Learning Common and Specific Visual Prompts for Domain Generalization

  • Conference paper
Computer Vision – ACCV 2022 (ACCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13846)

Abstract

Although fine-tuning a pre-trained large-scale model has become an effective method for domain generalization, domain shifts still pose a major challenge for transferring models to unseen test domains. In this paper, we study how to effectively adapt pre-trained vision Transformers to domain generalization problems in image classification. To this end, we propose a novel Common-Specific Visual Prompt Tuning (CSVPT) method for transferring large-scale vision Transformer models to unknown test domains. Unlike existing methods, which learn fixed visual prompts for each task, CSVPT jointly learns domain-common prompts, which capture the task context, and sample-specific prompts, which capture information about the data distribution and are generated for each sample by a trainable prompt-generating module (PGM). By combining the domain-common and sample-specific prompts, the visual prompts learned by CSVPT are conditioned on each input sample rather than fixed once learned, which helps out-of-distribution generalization. Extensive experimental results show the effectiveness of CSVPT, and CSVPT with a ViT-L/14 backbone achieves state-of-the-art (SOTA) performance on five widely used benchmark datasets.
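
The idea of combining shared and per-sample prompts can be illustrated with a minimal sketch. The abstract does not specify the internals of the PGM or where prompts are inserted, so everything below is an assumption made for illustration: the `PromptGenerator` class, the mean-pooling step, the one-linear-map-per-prompt design, the token ordering, and the random values standing in for learned weights and patch embeddings are all hypothetical and not the authors' implementation.

```python
import random


def make_matrix(rows, cols, rng, scale=0.02):
    """Return a rows x cols matrix of random floats (stand-in for learned weights)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def mean_pool(tokens):
    """Average a list of equal-length token vectors into a single vector."""
    count = len(tokens)
    return [sum(column) / count for column in zip(*tokens)]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class PromptGenerator:
    """Hypothetical PGM: one linear map per sample-specific prompt (an assumption)."""

    def __init__(self, dim, num_specific, rng):
        self.maps = [make_matrix(dim, dim, rng) for _ in range(num_specific)]

    def __call__(self, patch_tokens):
        # Summarize the sample, then project the summary into each prompt slot.
        summary = mean_pool(patch_tokens)
        return [matvec(weights, summary) for weights in self.maps]


def build_input_sequence(cls_token, common_prompts, generator, patch_tokens):
    """Assumed ordering: [CLS] + common prompts + sample-specific prompts + patches."""
    specific_prompts = generator(patch_tokens)
    return [cls_token] + common_prompts + specific_prompts + patch_tokens


rng = random.Random(0)
dim, num_patches, num_common, num_specific = 8, 4, 2, 3

cls_token = [0.0] * dim
common_prompts = make_matrix(num_common, dim, rng)    # shared by every sample
generator = PromptGenerator(dim, num_specific, rng)   # shared weights, per-sample output

# Two fake "images", each a list of patch embeddings.
image_a = make_matrix(num_patches, dim, rng, scale=1.0)
image_b = make_matrix(num_patches, dim, rng, scale=1.0)

seq_a = build_input_sequence(cls_token, common_prompts, generator, image_a)
seq_b = build_input_sequence(cls_token, common_prompts, generator, image_b)

print(len(seq_a))                  # total number of tokens fed to the Transformer
print(seq_a[1:3] == seq_b[1:3])    # common prompts are identical across samples
print(seq_a[3:6] == seq_b[3:6])    # sample-specific prompts differ across samples
```

In this sketch the common prompts are the same for both inputs, while the generated prompts change with the input, which is the sense in which the combined prompts are conditioned on each sample rather than fixed.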

Acknowledgements

This work was supported in part by NSFC under contracts No. U20B2070 and No. 61976199, awarded to Dr. Liansheng Zhuang.

Author information

Corresponding author

Correspondence to Liansheng Zhuang.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, A., Zhuang, L., Fan, S., Wang, S. (2023). Learning Common and Specific Visual Prompts for Domain Generalization. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13846. Springer, Cham. https://doi.org/10.1007/978-3-031-26351-4_35

  • DOI: https://doi.org/10.1007/978-3-031-26351-4_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26350-7

  • Online ISBN: 978-3-031-26351-4

  • eBook Packages: Computer Science, Computer Science (R0)
