research-article

Just Accepted

Contextual Interaction via Primitive-based Adversarial Training for Compositional Zero-shot Learning

Authors:

Haofeng Zhang

ACM Transactions on Multimedia Computing, Communications, and Applications

Accepted on 11 January 2025

https://doi.org/10.1145/3712596

Online AM: 17 January 2025

Abstract

Compositional Zero-shot Learning (CZSL) aims to recognize novel compositions by learning from known attribute-object pairs. The primary challenge in CZSL lies in the large visual discrepancies introduced by the complex interaction between the attribute and object primitives, which degrade classification performance on novel compositions. Prior work has mainly addressed this issue either through disentangling strategies or by using object-based conditional probabilities to constrain the selection space of attributes. Few studies, however, have approached the problem by modeling the mechanism of visual primitive interactions. Inspired by the success of vanilla adversarial learning in Cross-Domain Few-Shot Learning, we take a step further and devise a model-agnostic Primitive-Based Adversarial training (PBadv) method to address this problem. In addition, recent studies highlight that hard compositions remain poorly perceived even under data-balanced conditions. To this end, we propose a novel over-sampling strategy with object-similarity guidance to augment the target compositional training data. We perform detailed quantitative analysis and retrieval experiments on well-established datasets, including UT-Zappos50K, MIT-States, and C-GQA, to validate the effectiveness of the proposed method, which achieves state-of-the-art (SOTA) performance. The code is available at https://github.com/lisuyi/PBadv_czsl.
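For readers unfamiliar with the "vanilla adversarial learning" the abstract builds on, the following is a minimal sketch of a generic fast gradient sign method (FGSM) step applied to a single feature vector, such as an attribute or object embedding. It is not the PBadv method, whose details are not given on this page. The linear softmax classifier, the toy weights, the feature values, and every function name below are assumptions made purely for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_for(weights, feature):
    # One logit per class: dot product of that class's weight row with the feature.
    return [sum(w * x for w, x in zip(row, feature)) for row in weights]

def cross_entropy(weights, feature, label):
    return -math.log(softmax(logits_for(weights, feature))[label])

def input_gradient(weights, feature, label):
    # Gradient of the loss w.r.t. the feature for a linear softmax
    # classifier: W^T (p - onehot(label)).
    probs = softmax(logits_for(weights, feature))
    residual = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return [sum(residual[k] * weights[k][j] for k in range(len(weights)))
            for j in range(len(feature))]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_perturb(weights, feature, label, epsilon):
    # Move each coordinate by epsilon in the direction that increases the loss.
    grad = input_gradient(weights, feature, label)
    return [x + epsilon * sign(g) for x, g in zip(feature, grad)]

# Hypothetical toy setup: 2 attribute classes, 3-dimensional feature.
attr_weights = [[1.0, -0.5, 0.2], [-0.3, 0.8, 0.1]]
attr_feature = [0.6, 0.1, -0.4]
label = 0
epsilon = 0.1

adv_feature = fgsm_perturb(attr_weights, attr_feature, label, epsilon)
clean_loss = cross_entropy(attr_weights, attr_feature, label)
adv_loss = cross_entropy(attr_weights, adv_feature, label)
print(f"clean loss {clean_loss:.4f}, adversarial loss {adv_loss:.4f}")
```

In a generic adversarial training loop, the model would then be updated on both the clean and the perturbed features. How PBadv constructs and uses its perturbations is described in the paper and the linked repository, not here.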

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Scene understanding
        Visual content-based indexing and retrieval

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Just Accepted

EISSN:1551-6865

Copyright © 2025 Copyright held by the owner/author(s).

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Online AM: 17 January 2025

Accepted: 11 January 2025

Revised: 27 October 2024

Received: 14 June 2024
