Abstract
Object detection based on deep learning has advanced rapidly in recent years. However, applying detectors trained on a label-rich domain to an unseen domain causes a performance drop due to domain shift. To address this problem, we propose a novel unsupervised domain adaptation method that adapts from a labeled source domain to an unlabeled target domain. Recent approaches based on adversarial learning are somewhat effective at aligning the feature distributions of different domains, but for the complex detection task the decision boundary becomes strongly source-biased when the model is trained only with source labels and aligned over the entire feature distribution. In this paper, we use image translation to generate translated images of the source and target domains, filling in the large domain gap and enabling paired adaptation. We propose a hierarchical contrastive adaptation method between the original and translated domains that encourages the detector to learn domain-invariant yet discriminative features. To emphasize foreground instances and handle the noise in translated images, we further propose foreground attention reweighting for instance-aware adaptation. Experiments on three cross-domain detection scenarios show that our method achieves state-of-the-art results compared with other approaches, demonstrating its effectiveness.
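The contrastive adaptation between original and translated domains can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the paper's implementation: it uses a standard InfoNCE-style loss in which feature row i from each domain is treated as a positive pair (the original image and its translated counterpart) and all other rows act as negatives; the function names, the temperature value, and the use of NumPy are all choices made here for exposition.

```python
import numpy as np

def info_nce_loss(feats_a, feats_b, temperature=0.1):
    """InfoNCE-style contrastive loss between two sets of features.

    feats_a, feats_b: (N, D) L2-normalized feature matrices; row i of each
    is assumed to describe the same instance (e.g. an image and its
    translated version), forming a positive pair, while every other row
    serves as a negative.
    """
    # Cosine-similarity logits between every cross-domain pair.
    logits = feats_a @ feats_b.T / temperature          # shape (N, N)
    # Softmax cross-entropy with the diagonal as the positive targets.
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
src = l2_normalize(rng.normal(size=(8, 16)))            # stand-in source features
# A well-aligned translated domain yields features close to the originals.
aligned = l2_normalize(src + 0.01 * rng.normal(size=src.shape))
shuffled = aligned[::-1]                                # deliberately mispaired

loss_aligned = info_nce_loss(src, aligned)
loss_shuffled = info_nce_loss(src, shuffled)
```

Minimizing such a loss pulls each original/translated pair together while pushing other instances apart, which is the general mechanism that contrastive adaptation relies on to keep aligned features discriminative.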
Cite this article
Deng, Z., Kong, Q., Akira, N. et al. Hierarchical contrastive adaptation for cross-domain object detection. Machine Vision and Applications 33, 62 (2022). https://doi.org/10.1007/s00138-022-01317-7