research-article

Domain-Specific Embedding Network for Zero-Shot Recognition

Authors:

Yongdong Zhang

MM '19: Proceedings of the 27th ACM International Conference on Multimedia

Pages 2070 - 2078

https://doi.org/10.1145/3343031.3351092

Published: 15 October 2019

Abstract

Zero-Shot Learning (ZSL) seeks to recognize samples from either the seen or the unseen domain by projecting image data and semantic labels into a joint embedding space. However, most existing methods directly transfer a projection trained on one domain to the other, ignoring the serious bias caused by the differences between the domains. To address this issue, we propose a novel Domain-Specific Embedding Network (DSEN) that applies a specific projection to each domain for unbiased embedding and imposes several domain constraints. In contrast to previous methods, DSEN decomposes the domain-shared projection function into one domain-invariant and two domain-specific sub-functions to explore the similarities and differences between the two domains. To prevent the two domain-specific projections from breaking the semantic relationship, we propose a semantic reconstruction constraint that applies the same decoder function to both projections in a cycle-consistent way. Furthermore, a domain division constraint directly penalizes the margin between real and pseudo image features in the seen and unseen domains respectively, which can enlarge the inter-domain difference of visual features. Extensive experiments on four public benchmarks demonstrate the effectiveness of DSEN, with an average improvement of 9.2% in terms of harmonic mean. The code is available at https://github.com/mboboGO/DSEN-for-GZSL.
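To make the abstract's moving parts concrete, here is a toy, dependency-free Python sketch. It is not the DSEN implementation: it assumes the decomposition is additive (seen projection = shared domain-invariant map + seen-specific map, and likewise for unseen), that every map is linear, and that the domain division constraint takes a hinge form on squared Euclidean distance. The names `project`, `domain_division_loss`, `W_INV`, `W_SEEN` and `W_UNSEEN` are hypothetical. The semantic reconstruction constraint is not modeled. The `harmonic_mean` function is the standard generalized ZSL metric H = 2 * S * U / (S + U) over seen (S) and unseen (U) accuracies.

```python
"""Toy sketch of a decomposed (invariant + domain-specific) projection.

Assumptions (not taken from the DSEN code): the decomposition is additive,
all maps are linear, and the domain division constraint is a hinge on the
squared Euclidean distance. All names here are hypothetical.
"""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def project(x: Vector, w_inv: Matrix, w_specific: Matrix) -> Vector:
    """Embed x with the shared domain-invariant map plus one domain-specific map."""
    shared = matvec(w_inv, x)
    specific = matvec(w_specific, x)
    return [s + d for s, d in zip(shared, specific)]


def squared_distance(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def domain_division_loss(real_seen: Vector, pseudo_unseen: Vector, margin: float) -> float:
    """Hinge penalty (assumed form): nonzero only while a real seen-domain
    feature and a pseudo unseen-domain feature are closer than `margin`."""
    return max(0.0, margin - squared_distance(real_seen, pseudo_unseen))


def harmonic_mean(acc_seen: float, acc_unseen: float) -> float:
    """Standard GZSL metric H = 2 * S * U / (S + U); 0 when both are 0."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)


if __name__ == "__main__":
    # Hypothetical 2-D weights: identity as the shared invariant part,
    # plus a small per-domain adjustment on one coordinate each.
    W_INV = [[1.0, 0.0], [0.0, 1.0]]
    W_SEEN = [[0.5, 0.0], [0.0, 0.0]]
    W_UNSEEN = [[0.0, 0.0], [0.0, 0.5]]
    x = [2.0, 4.0]
    print(project(x, W_INV, W_SEEN))    # [3.0, 4.0]
    print(project(x, W_INV, W_UNSEEN))  # [2.0, 6.0]
    print(domain_division_loss([0.0, 0.0], [1.0, 1.0], margin=3.0))  # 1.0
    print(harmonic_mean(0.5, 0.5))  # 0.5
```

Because both projections reuse `w_inv`, whatever the two domains share is learned once, while `w_specific` absorbs the per-domain differences. The harmonic mean explains why balanced performance matters in this setting: a model with 90% seen and 10% unseen accuracy scores only H = 0.18, whereas a balanced 50%/50% model scores H = 0.5.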



Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object recognition
      2. Computer vision representations
        Image representations
  2. Machine learning
    1. Machine learning approaches
      1. Learning latent representations
      2. Neural networks


Information

Published In

MM '19: Proceedings of the 27th ACM International Conference on Multimedia

October 2019

2794 pages

ISBN:9781450368896

DOI:10.1145/3343031

General Chairs:
Laurent Amsaleg, CNRS-IRISA, France
Benoit Huet, EURECOM, France
Martha Larson, Radboud University and TU Delft, Netherlands

Program Chairs:
Guillaume Gravier, CNRS-IRISA, France
Hayley Hung, Delft University of Technology, Netherlands
Chong-Wah Ngo, City University of Hong Kong, Hong Kong
Wei Tsang Ooi, National University of Singapore, Singapore

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Defense Science and Technology Fund for Distinguished Young Scholars
National Natural Science Foundation of China
Youth Innovation Promotion Association, Chinese Academy of Sciences
National Postdoctoral Programme for Innovative Talents
National Key Research and Development Program of China

Conference

MM '19

Sponsor:

SIGMM

MM '19: The 27th ACM International Conference on Multimedia

October 21 - 25, 2019

Nice, France

Acceptance Rates

MM '19 Paper Acceptance Rate: 252 of 936 submissions, 27%

Overall Acceptance Rate: 995 of 4,171 submissions, 24%

Bibliometrics

Article Metrics

Total Citations: 12
Total Downloads: 267
Downloads (last 12 months): 14
Downloads (last 6 weeks): 1

Reflects downloads up to 18 Aug 2024.

Citations

Cited By

Tian Y, Zhang Y, Huang Y, Xu W, Ding Z (2024). Differential Refinement Network for Zero-Shot Learning. IEEE Transactions on Neural Networks and Learning Systems 35(3), 4164-4178. Online publication date: Mar-2024.
https://doi.org/10.1109/TNNLS.2022.3201883
Yin W, Xie H, Zhang L, Ge J, Li P, Liu C, Zhang Y, El Saddik A, Mei T, Cucchiara R, Bertini M, Tobon Vallejo D, Atrey P, Hossain M (2023). Frequency-based Zero-Shot Learning with Phase Augmentation. Proceedings of the 31st ACM International Conference on Multimedia, 3181-3189. Online publication date: 26-Oct-2023.
https://dl.acm.org/doi/10.1145/3581783.3611990
Zhang H, Tian L, Wang Z, Xu Y, Cheng P, Bai K, Chen B (2023). Multiscale Visual-Attribute Co-Attention for Zero-Shot Image Recognition. IEEE Transactions on Neural Networks and Learning Systems 34(9), 6003-6014. Online publication date: Sep-2023.
https://doi.org/10.1109/TNNLS.2021.3132366
Ren Y, Cong Y, Dong J, Sun G (2023). Uni3DA: Universal 3D Domain Adaptation for Object Recognition. IEEE Transactions on Circuits and Systems for Video Technology 33(1), 379-392. Online publication date: Jan-2023.
https://doi.org/10.1109/TCSVT.2022.3202213
Xu Z, Wang G, Wong Y, Kankanhalli M (2022). Relation-Aware Compositional Zero-Shot Learning for Attribute-Object Pair Recognition. IEEE Transactions on Multimedia 24, 3652-3664. Online publication date: 1-Jan-2022.
https://dl.acm.org/doi/10.1109/TMM.2021.3104411
An Y, Kim S, Liang Y, Zimmermann R, Kim D, Kim J (2022). Content-Attribute Disentanglement for Generalized Zero-Shot Learning. IEEE Access 10, 58320-58331. Online publication date: 2022.
https://doi.org/10.1109/ACCESS.2022.3178800
Min S, Yao H, Xie H, Zha Z, Zhang Y (2021). Domain-Oriented Semantic Embedding for Zero-Shot Learning. IEEE Transactions on Multimedia 23, 3919-3930. Online publication date: 1-Jan-2021.
https://dl.acm.org/doi/10.1109/TMM.2020.3033124
Xu Q, Fang F, Gauthier N, Liang W, Wu Y, Li L, Lim J (2021). Towards Efficient Multiview Object Detection with Adaptive Action Prediction. 2021 IEEE International Conference on Robotics and Automation (ICRA), 13423-13429. Online publication date: 30-May-2021.
https://doi.org/10.1109/ICRA48506.2021.9561388
Gune O, Banerjee B, Chaudhuri S, Cuzzolin F, Wen Chen C, Cucchiara R, Hua X, Qi G, Ricci E, Zhang Z, Zimmermann R (2020). Generalized Zero-Shot Learning using Generated Proxy Unseen Samples and Entropy Separation. Proceedings of the 28th ACM International Conference on Multimedia, 4262-4270. Online publication date: 12-Oct-2020.
https://dl.acm.org/doi/10.1145/3394171.3413657
Wang X, Wu F, Wang J (2020). Self-Adaptive Embedding For Few-Shot Classification By Hierarchical Attention. 2020 IEEE International Conference on Multimedia and Expo (ICME), 1-6. Online publication date: Jul-2020.
https://doi.org/10.1109/ICME46284.2020.9102830
