research-article
Free access
Just Accepted

Contextual Interaction via Primitive-based Adversarial Training for Compositional Zero-shot Learning

Online AM: 17 January 2025

Abstract

Compositional Zero-shot Learning (CZSL) aims to recognize novel compositions by recombining the attributes and objects seen in known attribute-object pairs. The primary challenge in CZSL lies in the significant visual discrepancies introduced by the complex interaction between the attribute and object primitives, which degrade classification performance on novel compositions. Previous works have primarily addressed this issue through disentangling strategies or by using object-based conditional probabilities to constrain the selection space of attributes. However, few studies have approached the problem by modeling the mechanism of visual primitive interactions. Inspired by the success of vanilla adversarial learning in Cross-Domain Few-Shot Learning, we devise a model-agnostic Primitive-Based Adversarial training (PBadv) method to address this problem. In addition, recent studies highlight that models perceive hard compositions poorly even under data-balanced conditions. To this end, we propose a novel over-sampling strategy with object-similarity guidance to augment the target compositional training data. We conduct detailed quantitative analysis and retrieval experiments on well-established datasets, including UT-Zappos50K, MIT-States, and C-GQA, to validate the effectiveness of the proposed method, which achieves state-of-the-art (SOTA) performance. The code is available at https://github.com/lisuyi/PBadv_czsl.
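
For readers unfamiliar with the vanilla adversarial learning that the abstract cites as inspiration, the following is a minimal sketch of a single fast-gradient-sign (FGSM-style) perturbation applied to a feature vector. It is not the PBadv method itself. The linear softmax classifier, the toy weights, the feature values, and all function names are assumptions chosen only for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logits_for(weights, x):
    # Linear classifier: one weight row per class (hypothetical toy model).
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def cross_entropy(weights, x, label):
    return -math.log(softmax(logits_for(weights, x))[label])

def input_gradient(weights, x, label):
    # For softmax cross-entropy on a linear model:
    # d loss / d x = W^T (p - onehot(label)).
    probs = softmax(logits_for(weights, x))
    residual = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return [
        sum(residual[k] * weights[k][j] for k in range(len(weights)))
        for j in range(len(x))
    ]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm_perturb(weights, x, label, epsilon):
    # Move each feature by epsilon in the direction that increases the loss.
    grad = input_gradient(weights, x, label)
    return [v + epsilon * sign(g) for v, g in zip(x, grad)]

# Toy example: two classes, two-dimensional feature (all values assumed).
weights = [[1.0, 0.0], [0.0, 1.0]]
feature = [2.0, 1.0]
label = 0

adversarial = fgsm_perturb(weights, feature, label, epsilon=0.5)
clean_loss = cross_entropy(weights, feature, label)
adv_loss = cross_entropy(weights, adversarial, label)
print(adversarial, clean_loss < adv_loss)
```

In adversarial training, perturbed inputs of this kind are fed back into the training loss alongside the clean ones. How PBadv constructs and applies its perturbations to attribute and object primitives is described in the paper and the linked repository, not here.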



      Published In

      ACM Transactions on Multimedia Computing, Communications, and Applications (Just Accepted)
      EISSN:1551-6865
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Online AM: 17 January 2025
      Accepted: 11 January 2025
      Revised: 27 October 2024
      Received: 14 June 2024


      Author Tags

      1. Adversarial Training
      2. Image Classification
      3. Zero-shot learning
      4. Compositional Zero-shot Learning
      5. Data Augmentation

      Qualifiers

      • Research-article

