Abstract
Unsupervised Domain Adaptation (UDA) is a popular machine learning technique for reducing the distribution discrepancy between domains. Most UDA methods use deep Convolutional Neural Networks (CNNs) together with a domain discriminator to learn a domain-invariant representation, but a domain-invariant representation is not necessarily a discriminative domain-specific one. Transformers, which have been shown to be more robust to domain shift than CNNs, have gradually become a powerful alternative to CNNs for feature representation. On the other hand, the domain shift between the labeled source data and the unlabeled target data produces a significant amount of label noise, which calls for a more robust connection between the source and target domains. This paper proposes a simple yet effective UDA method for learning cross-domain representations with a vision Transformer in a self-training manner. Unlike the conventional practice of dividing a single image into multiple non-overlapping patches, we propose a novel method that aggregates both labeled source-domain patches and pseudo-labeled target-domain patches. In addition, a cross-domain alignment loss is proposed to match the centroids of labeled source patches and pseudo-labeled target patches. Extensive experiments show that our proposed method achieves state-of-the-art (SOTA) results on several standard UDA benchmarks (e.g., 90.5% on ImageCLEF-DA and Office-31) with a plain Transformer baseline model and without any extra assistant networks.
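To make the self-training pipeline described above concrete, the sketch below illustrates two of its ingredients: confidence-thresholded pseudo-labeling of target samples, and a centroid-matching alignment loss between labeled source features and pseudo-labeled target features. This is a minimal PyTorch sketch under our own assumptions; the confidence threshold, the squared-Euclidean centroid distance, and the helper names (pseudo_label, centroid_alignment_loss) are illustrative choices, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def pseudo_label(target_logits: torch.Tensor, threshold: float = 0.9):
    """Keep only target predictions whose max softmax probability clears the threshold."""
    probs = F.softmax(target_logits, dim=1)
    confidence, labels = probs.max(dim=1)
    mask = confidence >= threshold
    return labels[mask], mask


def centroid_alignment_loss(source_feat: torch.Tensor,
                            source_labels: torch.Tensor,
                            target_feat: torch.Tensor,
                            target_pseudo: torch.Tensor,
                            num_classes: int) -> torch.Tensor:
    """Average squared distance between per-class centroids of source features
    and pseudo-labeled target features, over classes present in both batches."""
    loss = source_feat.new_zeros(())
    matched = 0
    for c in range(num_classes):
        s_c = source_feat[source_labels == c]
        t_c = target_feat[target_pseudo == c]
        if s_c.numel() == 0 or t_c.numel() == 0:
            continue  # class absent from one of the two mini-batches
        loss = loss + (s_c.mean(dim=0) - t_c.mean(dim=0)).pow(2).sum()
        matched += 1
    return loss / max(matched, 1)


# Example with random stand-ins for ViT features (e.g., 768-d [CLS] tokens) and logits.
feat_s, y_s = torch.randn(32, 768), torch.randint(0, 12, (32,))
feat_t, logits_t = torch.randn(32, 768), torch.randn(32, 12)
y_t, mask = pseudo_label(logits_t, threshold=0.9)
align = centroid_alignment_loss(feat_s, y_s, feat_t[mask], y_t, num_classes=12)
```

Gating the target features on the confidence mask before computing centroids is one simple way to keep the label noise discussed above out of the alignment term: only target samples the model is already confident about influence the class centroids.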
Data availability
The data that support the findings of this study are openly available at the following websites:
The ImageCLEF-DA dataset is available at: https://www.imageclef.org/2014/adaptation
The Office-31 dataset is available at: https://www.amazon.com (as mentioned in the paper)
The Office-Home dataset is available at: https://www.hemanthdv.org/officeHomeDataset.html
Ethics declarations
Conflict of interest
All authors disclosed no relevant relationships.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Yifan Ye and Shuai Fu contributed equally to this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ye, Y., Fu, S. & Chen, J. Learning cross-domain representations by vision transformer for unsupervised domain adaptation. Neural Comput & Applic 35, 10847–10860 (2023). https://doi.org/10.1007/s00521-023-08269-7