Learning cross-domain representations by vision transformer for unsupervised domain adaptation

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Unsupervised Domain Adaptation (UDA) is a popular machine learning technique for reducing the distribution discrepancy between domains. Most UDA methods use a deep Convolutional Neural Network (CNN) together with a domain discriminator to learn a domain-invariant representation, but domain invariance alone does not guarantee a discriminative domain-specific representation. Transformers, which have proved more robust to domain shift than CNNs, have gradually become a powerful alternative to CNNs for feature representation. At the same time, the domain shift between the labeled source data and the unlabeled target data produces a significant amount of label noise, which calls for a more robust connection between the source and target domains. This paper proposes a simple yet effective UDA method for learning cross-domain representations with vision Transformers in a self-training manner. Unlike the conventional practice of dividing an image into multiple non-overlapping patches, we propose a novel method that aggregates labeled source-domain patches and pseudo-labeled target-domain patches. In addition, a cross-domain alignment loss is proposed to match the centroids of the labeled source patches and the pseudo-labeled target patches. Extensive experiments show that our proposed method achieves state-of-the-art (SOTA) results on several standard UDA benchmarks (e.g., 90.5% on ImageCLEF-DA and Office-31) with a plain Transformer baseline model and no extra assistant networks.
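To make the self-training and centroid-matching ideas concrete, the following is a minimal PyTorch sketch (the experiments use PyTorch, see note 2) of a confidence-filtered cross-domain alignment loss. The function name, tensor shapes, and the 0.9 confidence threshold are illustrative assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cross_domain_alignment_loss(feat_s, y_s, feat_t, logits_t,
                                num_classes, threshold=0.9):
    """Hedged sketch: align per-class centroids of labeled source
    features with those of confidently pseudo-labeled target features.

    feat_s:   (Ns, D) source features from a shared ViT backbone
    y_s:      (Ns,)   ground-truth source labels
    feat_t:   (Nt, D) target features from the same backbone
    logits_t: (Nt, C) target classifier outputs used for pseudo-labels
    """
    conf, pseudo = F.softmax(logits_t, dim=1).max(dim=1)
    keep = conf >= threshold      # self-training: keep only confident targets
    loss = feat_s.new_zeros(())
    matched = 0
    for c in range(num_classes):
        src_c = feat_s[y_s == c]
        tgt_c = feat_t[keep & (pseudo == c)]
        if src_c.numel() == 0 or tgt_c.numel() == 0:
            continue              # class missing from a domain in this batch
        loss = loss + F.mse_loss(src_c.mean(dim=0), tgt_c.mean(dim=0))
        matched += 1
    return loss / max(matched, 1)

# Illustrative usage with random tensors standing in for ViT features:
feat_s, y_s = torch.randn(32, 768), torch.randint(0, 12, (32,))
feat_t, logits_t = torch.randn(32, 768), torch.randn(32, 12)
print(cross_domain_alignment_loss(feat_s, y_s, feat_t, logits_t, num_classes=12))
```

In a sketch like this, the full training objective would typically combine a standard cross-entropy loss on the source labels with the alignment term, which is one common way centroid-matching losses are deployed in UDA.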


Data availability

The data that support the findings of this study are openly available at the following websites:

ImageCLEF-DA dataset is available at: https://www.imageclef.org/2014/adaptation

Office-31 dataset is available at: https://www.amazon.com (as mentioned in the paper)

Office-Home dataset is available at: https://www.hemanthdv.org/officeHomeDataset.html.

Notes

  1. https://www.imageclef.org/2014/adaptation.

  2. https://pytorch.org.


Author information


Corresponding author

Correspondence to Jing Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Yifan Ye and Shuai Fu contributed equally to this article.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ye, Y., Fu, S. & Chen, J. Learning cross-domain representations by vision transformer for unsupervised domain adaptation. Neural Comput & Applic 35, 10847–10860 (2023). https://doi.org/10.1007/s00521-023-08269-7
