Abstract
Domain adaptation uses labeled source-domain data to learn a model that performs well in a target domain where labels are limited or missing. Several domain adaptation methods that combine image translation and feature alignment have recently been proposed. However, such methods have two primary drawbacks. First, most of them assume that synthetic target images have the same distribution as real target images, so only the synthetic target images are used to train the target classifier, which makes the model’s performance heavily dependent on the quality of the generated images. Second, most of them blindly align the discriminative content information by merging spatial and channel-wise information, thereby ignoring the relationships among channels. To address these issues, this study proposes J2WSA, a two-step approach that joins a two-stream Wasserstein auto-encoder (WAE) with selective attention (SA) alignment. In the pre-training step, the two-stream WAE maps the four domains to a shared, well-structured manifold by minimizing the Wasserstein distance between the distribution of each domain and the corresponding prior distribution. In the fine-tuning step, the SA alignment model, initialized by the two-stream WAE, uses the SA block to automatically select the style part of the channels for alignment while suppressing alignment of the content part. Extensive experiments indicate that combining the two models achieves state-of-the-art performance on the Office-31 and digit domain adaptation benchmarks.
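The pre-training idea of pulling encoded samples toward a prior can be illustrated with a small sketch. It assumes a maximum mean discrepancy (MMD) penalty with an RBF kernel and a squared-error reconstruction cost, which is one common way to train a WAE; the abstract does not state which penalty the authors use, and the two-stream structure is not reproduced here. The names `rbf_kernel`, `mmd2`, `wae_loss`, `lam`, and `sigma` are hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_kernel(a, b, sigma=1.0):
    """RBF kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances, clipped at zero for numerical safety.
    sq = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def mmd2(codes, prior, sigma=1.0):
    """Biased estimate of the squared MMD between two sample sets."""
    k_cc = rbf_kernel(codes, codes, sigma)
    k_pp = rbf_kernel(prior, prior, sigma)
    k_cp = rbf_kernel(codes, prior, sigma)
    return k_cc.mean() + k_pp.mean() - 2.0 * k_cp.mean()


def wae_loss(x, x_rec, codes, prior, lam=10.0, sigma=1.0):
    """Reconstruction cost plus a weighted latent penalty (assumed form)."""
    # Squared-error reconstruction cost, averaged over the batch.
    recon = np.mean(np.sum((x - x_rec) ** 2, axis=1))
    # Penalty that shrinks as the encoded codes match the prior samples.
    return recon + lam * mmd2(codes, prior, sigma)
```

In a setup like this, each domain's encoder output would be passed as `codes` together with samples drawn from that domain's prior, so that minimizing the loss draws the latent distributions toward their priors.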







Acknowledgements
This work was supported by the Opening Foundation of the State Key Laboratory (No. 2014KF06) and the National Science and Technology Major Project (No. 2013ZX03005013).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Chen, Z., Chen, C., Jin, X. et al. Deep joint two-stream Wasserstein auto-encoder and selective attention alignment for unsupervised domain adaptation. Neural Comput & Applic 32, 7489–7502 (2020). https://doi.org/10.1007/s00521-019-04262-1
DOI: https://doi.org/10.1007/s00521-019-04262-1