Abstract
Although fine-tuning a pre-trained large-scale model has become an effective approach to domain generalization, domain shifts still pose a major challenge when transferring models to unseen test domains. In this paper, we study how to effectively adapt pre-trained vision Transformers for domain generalization in image classification. To this end, we propose a novel Common-Specific Visual Prompt Tuning (CSVPT) method for transferring large-scale vision Transformer models to unknown test domains. Unlike existing methods, which learn a fixed set of visual prompts for each task, CSVPT jointly learns domain-common prompts that capture the task context and sample-specific prompts that capture information about the data distribution; the latter are generated for each sample by a trainable prompt-generating module (PGM). By combining the domain-common and sample-specific prompts, the visual prompts learned by CSVPT are conditioned on each input sample rather than fixed once learned, which facilitates out-of-distribution generalization. Extensive experiments demonstrate the effectiveness of CSVPT, and CSVPT with a ViT-L/14 backbone achieves state-of-the-art (SOTA) performance on five widely used benchmark datasets.
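A minimal PyTorch sketch of the idea described above, assuming a frozen pre-trained ViT encoder that operates on token sequences: the domain-common prompts are learned directly as shared parameters, while a small prompt-generating module (PGM) produces sample-specific prompts from a pooled summary of each input's patch embeddings. All module names, dimensions, and the additive combination of the two prompt types are illustrative assumptions rather than the authors' exact implementation.

import torch
import torch.nn as nn

class PromptGeneratingModule(nn.Module):
    # Hypothetical PGM: maps a per-sample summary vector to sample-specific prompt tokens.
    def __init__(self, embed_dim, num_prompts, hidden_dim=256):
        super().__init__()
        self.num_prompts = num_prompts
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_prompts * embed_dim),
        )

    def forward(self, summary):
        # summary: (B, D) -> sample-specific prompts of shape (B, P, D)
        return self.net(summary).view(-1, self.num_prompts, summary.size(-1))

class CSVPT(nn.Module):
    # Sketch: frozen Transformer encoder plus common and sample-specific visual prompts.
    def __init__(self, encoder, embed_dim=768, num_prompts=4):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():        # keep the pre-trained backbone frozen
            p.requires_grad_(False)
        self.common_prompts = nn.Parameter(torch.empty(num_prompts, embed_dim))
        nn.init.normal_(self.common_prompts, std=0.02)
        self.pgm = PromptGeneratingModule(embed_dim, num_prompts)

    def forward(self, patch_tokens):
        # patch_tokens: (B, N, D) patch embeddings of a batch of images
        B = patch_tokens.size(0)
        specific = self.pgm(patch_tokens.mean(dim=1))                  # (B, P, D), input-conditioned
        common = self.common_prompts.unsqueeze(0).expand(B, -1, -1)    # (B, P, D), shared across samples
        prompts = common + specific        # additive combination is an assumption
        # prepend the prompts to the patch tokens and run the frozen encoder
        return self.encoder(torch.cat([prompts, patch_tokens], dim=1))

Because the prompts depend on a pooled summary of the input, the token sequence fed to the encoder varies with each sample at test time, which is the property the abstract credits for improved out-of-distribution generalization; only the common prompts and the PGM (plus a classification head, omitted here) would be trained.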
Acknowledgements
This work was supported in part by NSFC under contracts No. U20B2070 and No. 61976199 to Dr. Liansheng Zhuang.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, A., Zhuang, L., Fan, S., Wang, S. (2023). Learning Common and Specific Visual Prompts for Domain Generalization. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13846. Springer, Cham. https://doi.org/10.1007/978-3-031-26351-4_35
DOI: https://doi.org/10.1007/978-3-031-26351-4_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26350-7
Online ISBN: 978-3-031-26351-4
eBook Packages: Computer Science, Computer Science (R0)