Dual Projective Zero-Shot Learning Using Text Descriptions

Published: 05 January 2023

Abstract

Zero-shot learning (ZSL) aims to recognize image instances of unseen classes solely from the semantic descriptions of those classes. Within this field, Generalized Zero-Shot Learning (GZSL) is a challenging problem in which images of both seen and unseen classes are mixed in the testing phase. Existing methods formulate GZSL as a semantic-visual correspondence problem and apply generative models such as Generative Adversarial Networks and Variational Autoencoders to solve it. However, these methods suffer from the bias problem: images of unseen classes are often misclassified into seen classes. In this work, we propose a novel model, the Dual Projective model for Zero-Shot Learning (DPZSL), which learns from text descriptions. To alleviate the bias problem, we leverage two autoencoders to project the visual and semantic features into a latent space and evaluate the embeddings with a visual-semantic correspondence loss function. A novel classifier is also introduced to ensure the discriminability of the embedded features. Our method focuses on the more challenging inductive ZSL setting, in which only labeled data from seen classes are used during training. Experimental results on two popular datasets, Caltech-UCSD Birds-200-2011 (CUB) and North America Birds (NAB), show that the proposed DPZSL model significantly outperforms state-of-the-art methods in both the inductive ZSL and GZSL settings. In particular, in the GZSL setting, our model yields an improvement of up to 15.2% over the state-of-the-art CANZSL on the CUB and NAB datasets under two splits.
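The dual-projection idea described in the abstract can be pictured as two autoencoders whose latent codes are pulled together by a visual-semantic correspondence loss while a classifier keeps the codes discriminative. The PyTorch sketch below illustrates that structure only; the feature dimensions (2048-d visual features, 7551-d TF-IDF text features), layer sizes, seen-class count, and unit loss weights are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of a dual-autoencoder latent alignment for ZSL.
# Feature dimensions, layer sizes, class count, and loss weights are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Autoencoder(nn.Module):
    def __init__(self, in_dim: int, latent_dim: int = 1024):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU())
        self.decoder = nn.Linear(latent_dim, in_dim)

    def forward(self, x):
        z = self.encoder(x)           # projection into the shared latent space
        return z, self.decoder(z)     # latent code and reconstruction


class DualProjectiveSketch(nn.Module):
    def __init__(self, vis_dim=2048, sem_dim=7551, latent_dim=1024, n_seen=150):
        super().__init__()
        self.vis_ae = Autoencoder(vis_dim, latent_dim)    # visual branch
        self.sem_ae = Autoencoder(sem_dim, latent_dim)    # semantic (text) branch
        self.classifier = nn.Linear(latent_dim, n_seen)   # keeps latent codes discriminative

    def loss(self, x_vis, x_sem, labels):
        z_v, rec_v = self.vis_ae(x_vis)
        z_s, rec_s = self.sem_ae(x_sem)
        rec = F.mse_loss(rec_v, x_vis) + F.mse_loss(rec_s, x_sem)   # reconstruction terms
        align = F.mse_loss(z_v, z_s)                                # visual-semantic correspondence
        cls = F.cross_entropy(self.classifier(z_v), labels)         # discriminability of embeddings
        return rec + align + cls                                    # weights omitted (assumed 1.0)


# Usage with random stand-in features for a batch of 8 seen-class images.
model = DualProjectiveSketch()
x_vis = torch.randn(8, 2048)
x_sem = torch.randn(8, 7551)
labels = torch.randint(0, 150, (8,))
print(model.loss(x_vis, x_sem, labels).item())
```

In this sketch, classifying an unseen-class image would amount to encoding its visual features and comparing the latent code against the encoded text descriptions of candidate classes; that inference step, like everything above, is an assumption about how such a model is typically used rather than a description of the published method.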

References

[1]
Zeynep Akata, Mateusz Malinowski, Mario Fritz, and Bernt Schiele. 2016. Multi-cue zero-shot learning with strong supervision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Las Vegas, Nevada, USA, 59–68.
[2]
Zeynep Akata, Florent Perronnin, Zaid Harchaoui, and Cordelia Schmid. 2015. Label-embedding for image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 7 (2015), 1425–1438.
[3]
Zeynep Akata, Scott Reed, Daniel Walter, Honglak Lee, and Bernt Schiele. 2015. Evaluation of output embeddings for fine-grained image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Boston, Massachusetts, USA, 2927–2936.
[4]
Yashas Annadani and Soma Biswas. 2018. Preserving semantic relations for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, Utah, USA, 7603–7612.
[5]
Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. 2017. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 12 (2017), 2481–2495.
[6]
Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, and Fei Sha. 2016. Synthesized classifiers for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Las Vegas, Nevada, USA, 5327–5336.
[7]
Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, and Fei Sha. 2020. Classifier and exemplar synthesis for zero-shot learning. International Journal of Computer Vision 128, 1 (2020), 166–201.
[8]
Wei-Lun Chao, Soravit Changpinyo, Boqing Gong, and Fei Sha. 2016. An empirical study and analysis of generalized zero-shot learning for object recognition in the wild. In European Conference on Computer Vision. Springer, Amsterdam, the Netherlands, 52–68.
[9]
Zhi Chen, Jingjing Li, Yadan Luo, Zi Huang, and Yang Yang. 2020. CANZSL: Cycle-consistent adversarial networks for zero-shot learning from natural language. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. IEEE, Snowmass Village, Colorado, USA, 874–883.
[10]
Yu-Ying Chou, Hsuan-Tien Lin, and Tyng-Luh Liu. 2020. Adaptive and generative zero-shot learning. In International Conference on Learning Representations. IEEE, Vienna, Austria.
[11]
Peng Cui, Shaowei Liu, and Wenwu Zhu. 2017. General knowledge embedded image representation learning. IEEE Transactions on Multimedia 20, 1 (2017), 198–207.
[12]
Mohamed Elhoseiny and Mohamed Elfeki. 2019. Creativity inspired zero-shot learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision. IEEE, Seoul, Korea (South), 5784–5793.
[13]
Mohamed Elhoseiny, Ahmed Elgammal, and Babak Saleh. 2016. Write a classifier: Predicting visual classifiers from unstructured text. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 12 (2016), 2539–2553.
[14]
Mohamed Elhoseiny, Yizhe Zhu, Han Zhang, and Ahmed Elgammal. 2017. Link the head to the “beak”: Zero shot learning from noisy text description at part precision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Honolulu, Hawaii, USA, 5640–5649.
[15]
Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc’Aurelio Ranzato, and Tomas Mikolov. 2013. DeViSE: A deep visual-semantic embedding model. Advances in Neural Information Processing Systems 26 (2013).
[16]
Ross Girshick. 2015. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, Santiago, Chile, 1440–1448.
[17]
Zhangxuan Gu, Siyuan Zhou, Li Niu, Zihan Zhao, and Liqing Zhang. 2020. Context-aware feature generation for zero-shot semantic segmentation. In Proceedings of the 28th ACM International Conference on Multimedia. ACM, New York, NY, USA, 1921–1929.
[18]
Xintong Han, Bharat Singh, Vlad I. Morariu, and Larry S. Davis. 2017. VRFP: On-the-fly video retrieval using web images and fast Fisher vector products. IEEE Transactions on Multimedia 19, 7 (2017), 1583–1595.
[19]
He Huang, Changhu Wang, Philip S. Yu, and Chang-Dong Wang. 2019. Generative dual adversarial network for generalized zero-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach, California, 801–810.
[20]
Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR’15). San Diego, CA, USA. arXiv preprint arXiv:1412.6980.
[21]
Elyor Kodirov, Tao Xiang, Zhenyong Fu, and Shaogang Gong. 2015. Unsupervised domain adaptation for zero-shot learning. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, Santiago, Chile, 2452–2460.
[22]
Elyor Kodirov, Tao Xiang, and Shaogang Gong. 2017. Semantic autoencoder for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Honolulu, Hawaii, USA, 3174–3183.
[23]
Christoph H. Lampert, Hannes Nickisch, and Stefan Harmeling. 2013. Attribute-based classification for zero-shot visual object categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 3 (2013), 453–465.
[24]
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.
[25]
Jingjing Li, Mengmeng Jing, Ke Lu, Lei Zhu, Yang Yang, and Zi Huang. 2019. Alleviating feature confusion for generative zero-shot learning. In Proceedings of the 27th ACM International Conference on Multimedia. ACM, New York, NY, USA, 1587–1595.
[26]
Kai Li, Martin Renqiang Min, and Yun Fu. 2019. Rethinking zero-shot learning: A conditional visual classification perspective. In Proceedings of the IEEE/CVF International Conference on Computer Vision. IEEE, Seoul, South Korea, 3583–3592.
[27]
Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, and Zeynep Akata. 2021. Open world compositional zero-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Nashville, TN, USA, 5222–5230.
[28]
Ashish Mishra, Shiva Krishna Reddy, Anurag Mittal, and Hema A. Murthy. 2018. A generative model for zero shot learning using conditional variational autoencoders. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. IEEE, Salt Lake City, Utah, USA, 2188–2196.
[29]
Pedro Morgado and Nuno Vasconcelos. 2017. Semantically consistent regularization for zero-shot recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Honolulu, HI, USA, 6060–6069.
[30]
Arghya Pal and Vineeth N. Balasubramanian. 2019. Zero-shot task transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach, California, USA, 2189–2198.
[31]
Yuxin Peng and Jinwei Qi. 2019. CM-GANs: Cross-modal generative adversarial networks for common representation learning. ACM Transactions on Multimedia Computing, Communications, and Applications 15, 1 (2019), 1–24.
[32]
Ruizhi Qiao, Lingqiao Liu, Chunhua Shen, and Anton Van Den Hengel. 2016. Less is more: Zero-shot learning from online textual documents with noise suppression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Las Vegas, Nevada, USA, 2249–2257.
[33]
Shafin Rahman, Salman Khan, and Nick Barnes. 2019. Deep0Tag: Deep multiple instance learning for zero-shot image tagging. IEEE Transactions on Multimedia 22, 1 (2019), 242–255.
[34]
Bernardino Romera-Paredes and Philip Torr. 2015. An embarrassingly simple approach to zero-shot learning. In International Conference on Machine Learning. JMLR.org, Lille, France, 2152–2161.
[35]
Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management 24, 5 (1988), 513–523.
[36]
Edgar Schonfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, and Zeynep Akata. 2019. Generalized zero- and few-shot learning via aligned variational autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach, CA, USA, 8247–8255.
[37]
Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[38]
Jie Song, Chengchao Shen, Yezhou Yang, Yang Liu, and Mingli Song. 2018. Transductive unbiased embedding for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, Utah, USA, 1024–1033.
[39]
Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 11 (2008).
[40]
Grant Van Horn, Steve Branson, Ryan Farrell, Scott Haber, Jessie Barry, Panos Ipeirotis, Pietro Perona, and Serge Belongie. 2015. Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Boston, Massachusetts, USA, 595–604.
[41]
Vinay Kumar Verma, Dhanajit Brahma, and Piyush Rai. 2020. Meta-learning for generalized zero-shot learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. New York, USA, 6062–6069.
[42]
Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. 2011. The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001. California Institute of Technology.
[43]
Wei Wang, Vincent W. Zheng, Han Yu, and Chunyan Miao. 2019. A survey of zero-shot learning: Settings, methods, and applications. ACM Transactions on Intelligent Systems and Technology 10, 2 (2019), 1–37.
[44]
Zheng Wang, Ruimin Hu, Chao Liang, Yi Yu, Junjun Jiang, Mang Ye, Jun Chen, and Qingming Leng. 2015. Zero-shot person re-identification via cross-view consistency. IEEE Transactions on Multimedia 18, 2 (2015), 260–272.
[45]
Junyuan Xie, Ross Girshick, and Ali Farhadi. 2016. Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning. PMLR, New York, USA, 478–487.
[46]
Wenju Xu, Shawn Keshmiri, and Guanghui Wang. 2019. Adversarially approximated autoencoder for image generation and manipulation. IEEE Transactions on Multimedia 21, 9 (2019), 2387–2396.
[47]
Xinchen Yan, Jimei Yang, Kihyuk Sohn, and Honglak Lee. 2016. Attribute2image: Conditional image generation from visual attributes. In European Conference on Computer Vision. Springer, Amsterdam, The Netherlands, 776–791.
[48]
Lei Zhang, Peng Wang, Lingqiao Liu, Chunhua Shen, Wei Wei, Yanning Zhang, and Anton Van Den Hengel. 2020. Towards effective deep embedding for zero-shot learning. IEEE Transactions on Circuits and Systems for Video Technology 30, 9 (2020), 2843–2852.
[49]
Li Zhang, Tao Xiang, and Shaogang Gong. 2017. Learning a deep embedding model for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Honolulu, Hawaii, USA, 2021–2030.
[50]
Ziming Zhang and Venkatesh Saligrama. 2015. Zero-shot learning via semantic similarity embedding. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, Santiago, Chile, 4166–4174.
[51]
Ziming Zhang and Venkatesh Saligrama. 2016. Zero-shot learning via joint latent similarity embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Las Vegas, Nevada, USA, 6034–6042.
[52]
Yizhe Zhu, Mohamed Elhoseiny, Bingchen Liu, Xi Peng, and Ahmed Elgammal. 2018. A generative adversarial approach for zero-shot learning from noisy texts. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, UT, USA, 1004–1013.

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 19, Issue 1
January 2023
505 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3572858
  • Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 January 2023
Online AM: 29 July 2022
Accepted: 25 January 2022
Revised: 22 October 2021
Received: 30 May 2021
Published in TOMM Volume 19, Issue 1

Author Tags

  1. Zero-shot learning
  2. generalized zero-shot learning
  3. autoencoder
  4. inductive zero-shot learning

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • Science and Technology Project of Sichuan
  • National Natural Science Foundation of China

Article Metrics

  • Downloads (Last 12 months)153
  • Downloads (Last 6 weeks)19
Reflects downloads up to 10 Nov 2024

Cited By

  • (2024) A comprehensive review on zero-shot-learning techniques. Intelligent Decision Technologies 18, 2, 1001–1028. DOI: 10.3233/IDT-240297. Online publication date: 1-Jan-2024.
  • (2024) A comprehensive review on zero-shot-learning techniques. Intelligent Decision Technologies, 1–28. DOI: 10.3233/IDT-24027. Online publication date: 17-Apr-2024.
  • (2024) An intelligent compound fault diagnosis method using generalized zero-shot model of bearing. Measurement Science and Technology 35, 9, 096134. DOI: 10.1088/1361-6501/ad5900. Online publication date: 28-Jun-2024.
  • (2024) Cross-domain zero-shot learning for enhanced fault diagnosis in high-voltage circuit breakers. Neural Networks, 106681. DOI: 10.1016/j.neunet.2024.106681. Online publication date: Aug-2024.
  • (2024) A novel mechanical fault diagnosis for high-voltage circuit breakers with zero-shot learning. Expert Systems with Applications 245, 123133. DOI: 10.1016/j.eswa.2023.123133. Online publication date: Jul-2024.
  • (2023) Zero-shot Scene Graph Generation via Triplet Calibration and Reduction. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 1, 1–21. DOI: 10.1145/3604284. Online publication date: 8-Jun-2023.
  • (2023) A review on multimodal zero-shot learning. WIREs Data Mining and Knowledge Discovery 13, 2. DOI: 10.1002/widm.1488. Online publication date: 20-Jan-2023.
