DOI: 10.1145/3095713.3095751

Harvesting Deep Models for Cross-Lingual Image Annotation

Published: 19 June 2017

Abstract

This paper considers cross-lingual image annotation: harvesting deep visual models trained for one language to annotate images with labels from another language. The task cannot be accomplished by machine translation alone, as labels can be ambiguous and a translated vocabulary leaves limited freedom to annotate images with appropriate labels. Given non-overlapping vocabularies between the two languages, we formulate cross-lingual image annotation as a zero-shot learning problem. For cross-lingual label matching, we adapt zero-shot learning by replacing the standard monolingual semantic embedding space with a bilingual alternative. To reduce both label ambiguity and redundancy, we propose a simple yet effective approach called label-enhanced zero-shot learning. Using three state-of-the-art deep visual models, i.e., ResNet-152, GoogleNet-Shuffle, and OpenImages, experiments on the test set of Flickr8k-CN demonstrate the viability of the proposed approach for cross-lingual image annotation.
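The bilingual zero-shot formulation described above can be sketched in a few lines. The sketch below is a hypothetical illustration, not the paper's code: it follows the convex-combination-of-semantic-embeddings idea of Norouzi et al. [14], substituting toy random vectors for a real bilingual word embedding (such as one trained with MultiVec [1] or the method of Luong et al. [10]); all names, dimensions, and data are assumptions.

```python
import numpy as np

# Toy bilingual embedding space (illustrative only): English source labels
# and Chinese target labels mapped into one shared vector space.
rng = np.random.default_rng(0)
dim = 4
en_embed = {w: rng.standard_normal(dim) for w in ["dog", "cat", "ball"]}
zh_embed = {w: rng.standard_normal(dim) for w in ["狗", "猫", "球"]}

def l2_normalize(v):
    return v / np.linalg.norm(v)

def image_embedding(en_probs, k=2):
    """ConSE-style image embedding: a convex combination of the embeddings
    of the top-k English labels, weighted by classifier probabilities."""
    top = sorted(en_probs.items(), key=lambda kv: -kv[1])[:k]
    total = sum(p for _, p in top)
    vec = sum((p / total) * en_embed[w] for w, p in top)
    return l2_normalize(vec)

def annotate(en_probs, k=2):
    """Rank target-language (Chinese) labels by cosine similarity
    to the image embedding in the shared bilingual space."""
    img = image_embedding(en_probs, k)
    scores = {w: float(img @ l2_normalize(v)) for w, v in zh_embed.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Pretend an English deep visual model scored an image:
print(annotate({"dog": 0.7, "cat": 0.2, "ball": 0.1}))
```

With a real bilingual embedding, semantically related labels across languages lie close together, so the top-ranked Chinese label would track the dominant English prediction; the toy random vectors only demonstrate the mechanics.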

References

[1] Alexandre Bérard, Christophe Servan, Olivier Pietquin, and Laurent Besacier. 2016. MultiVec: A multilingual and multilevel representation learning toolkit for NLP. In LREC.
[2] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In CVPR.
[3] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR.
[4] Micah Hodosh, Peter Young, and Julia Hockenmaier. 2013. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR 47 (2013), 853--899.
[5] Alireza Koochali, Sebastian Kalkowski, Andreas Dengel, Damian Borth, and Christian Schulze. 2016. Which languages do people speak on Flickr? A language and geo-location study of the YFCC100m dataset. In MMCommons.
[6] I. Krasin, T. Duerig, N. Alldrin, A. Veit, S. Abu-El-Haija, S. Belongie, D. Cai, Z. Feng, V. Ferrari, V. Gomes, et al. 2016. OpenImages: A public dataset for large-scale multi-label and multiclass image classification. https://github.com/openimages.
[7] Xirong Li, Weiyu Lan, Jianfeng Dong, and Hailong Liu. 2016. Adding Chinese captions to images. In ICMR.
[8] Xirong Li, Shuai Liao, Weiyu Lan, Xiaoyong Du, and Gang Yang. 2015. Zero-shot image tagging by hierarchical semantic embedding. In SIGIR.
[9] Xirong Li, Tiberio Uricchio, Lamberto Ballan, Marco Bertini, Cees G. M. Snoek, and Alberto Del Bimbo. 2016. Socializing the semantic gap: A comparative survey on image tag assignment, refinement, and retrieval. CSUR 49, 1 (2016), 14:1--14:39.
[10] Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Bilingual word representations with monolingual quality in mind. In NAACL Workshop.
[11] Pascal Mettes, Dennis Koelma, and Cees Snoek. 2016. The ImageNet Shuffle: Reorganized pre-training for video event detection. In ICMR.
[12] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In ICLR.
[13] Takashi Miyazaki and Nobuyuki Shimizu. 2015. Cross-lingual image caption generation. In ACL.
[14] Mohammad Norouzi, Tomas Mikolov, Samy Bengio, Yoram Singer, Jonathon Shlens, Andrea Frome, Greg Corrado, and Jeffrey Dean. 2014. Zero-shot learning by convex combination of semantic embeddings. In ICLR.
[15] Xiaoguang Rui, Nenghai Yu, Mingjing Li, and Lei Wu. 2009. On cross-language image annotations. In ICME.

Cited By

  • (2019) Deep Learning for Video Retrieval by Natural Language. In Proceedings of the 1st International Workshop on Fairness, Accountability, and Transparency in MultiMedia, 2--3. DOI: 10.1145/3347447.3350565. Online publication date: 15 October 2019.
  • (2019) COCO-CN for Cross-Lingual Image Tagging, Captioning, and Retrieval. IEEE Transactions on Multimedia 21, 9, 2347--2360. DOI: 10.1109/TMM.2019.2896494. Online publication date: September 2019.


Published In

CBMI '17: Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing
June 2017
237 pages
ISBN:9781450353335
DOI:10.1145/3095713
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Cross-lingual image annotation
  2. English-Chinese
  3. zero-shot learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited


