research-article

Open-Scenario Domain Adaptive Object Detection in Autonomous Driving

Authors:

Heng Tao ShenAuthors Info & Claims

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

Pages 8453 - 8462

https://doi.org/10.1145/3581783.3611854

Published: 27 October 2023 Publication History

Abstract

Existing domain adaptive object detection algorithms (DAOD) have demonstrated their effectiveness in discriminating and localizing objects across scenarios. However, these algorithms typically assume a single source and target domain for adaptation, which is not representative of the more complex data distributions in practice. To address this issue, we propose a novel Open-Scenario Domain Adaptive Object Detection (OSDA), which leverages multiple source and target domains for more practical and effective domain adaptation. We are the first to increase the granularity of the background category by building the foundation model using contrastive vision-language pre-training in an open-scenario setting for better distinguishing foreground and background, which is under-explored in previous studies. The performance gains by introducing the pre-training have been observed and have validated the model's ability to detect objects across domains. To further fine-tune the model for domain-specific object detection, we propose a hierarchical feature alignment strategy to obtain a better common feature space among the various source and target domains. In the case of multi-source domains, the cross-reconstruction framework is introduced for learning more domain invariances. The proposed method is able to alleviate knowledge forgetting without any additional computational costs. Extensive experiments across different scenarios demonstrate the effectiveness of the proposed model.

Supplemental Material

MP4 File

Presentation video - short version

Download
155.14 MB

References

[1]

Manuele Barraco, Marcella Cornia, Silvia Cascianelli, Lorenzo Baraldi, and Rita Cucchiara. 2022. The unreasonable effectiveness of CLIP features for image captioning: an experimental analysis. In proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4662--4670.

[2]

Chaoqi Chen, Zebiao Zheng, Xinghao Ding, Yue Huang, and Qi Dou. 2020. Harmonizing transferability and discriminability for adapting object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8869--8878.

[3]

Yuhua Chen, Wen Li, Christos Sakaridis, Dengxin Dai, and Luc Van Gool. 2018. Domain adaptive faster r-cnn for object detection in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3339--3348.

[4]

Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, and Bernt Schiele. 2016. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3213--3223.

[5]

Samyak Datta, Karan Sikka, Anirban Roy, Karuna Ahuja, Devi Parikh, and Ajay Divakaran. 2019. Align2ground: Weakly supervised phrase grounding guided by image-caption alignment. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 2601--2610.

[6]

Jinhong Deng, Wen Li, Yuhua Chen, and Lixin Duan. 2021. Unbiased mean teacher for cross-domain object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4091--4101.

[7]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).

[8]

Yaroslav Ganin and Victor Lempitsky. 2015. Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning. PMLR, 1180--1189.

[9]

Andreas Geiger, Philip Lenz, Christoph Stiller, and Raquel Urtasun. 2013. Vision meets robotics: The kitti dataset. The International Journal of Robotics Research 32, 11 (2013), 1231--1237.

Digital Library

[10]

Kaixiong Gong, Shuang Li, Shugang Li, Rui Zhang, Chi Harold Liu, and Qiang Chen. 2022. Improving Transferability for Domain Adaptive Detection Transformers. In Proceedings of the 30th ACM International Conference on Multimedia. 1543--1551.

Digital Library

[11]

Lijun Gou, Jinrong Yang, Hangcheng Yu, Pan Wang, Xiaoping Li, and Chao Deng. 2022. A Semantic Consistency Feature Alignment Object Detection Model Based on Mixed-Class Distribution Metrics. arXiv preprint arXiv:2206.05765 (2022).

[12]

Junlin Han, Mehrdad Shoeiby, Lars Petersson, and Mohammad Ali Armin. 2021. Dual contrastive learning for unsupervised image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 746--755.

[13]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770--778.

[14]

Mengzhe He, Yali Wang, Jiaxi Wu, Yiru Wang, Hanqing Li, Bo Li, Weihao Gan, Wei Wu, and Yu Qiao. 2022. Cross domain object detection by target-perceived dual branch distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9570--9580.

[15]

Xiaowei Hu, Chi-Wing Fu, Lei Zhu, and Pheng-Ann Heng. 2019. Depth-attentional features for single-image rain removal. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 8022--8031.

[16]

Jiaxing Huang, Dayan Guan, Aoran Xiao, Shijian Lu, and Ling Shao. 2022. Category contrast for unsupervised domain adaptation in visual tasks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 1203--1214.

[17]

Wei-Jie Huang, Yu-Lin Lu, Shih-Yao Lin, Yusheng Xie, and Yen-Yu Lin. 2022. AQT: Adversarial Query Transformers for Domain Adaptive Object Detection. In 31st International Joint Conference on Artificial Intelligence, IJCAI 2022. International Joint Conferences on Artificial Intelligence, 972--979.

[18]

Zi-Rong Jin, Liang-Jian Deng, Tian-Jing Zhang, and Xiao-Xu Jin. 2021. BAM: Bilateral activation mechanism for image fusion. In Proceedings of the 29th ACM International Conference on Multimedia. 4315--4323.

Digital Library

[19]

Madhu Kiran, Marco Pedersoli, Jose Dolz, Louis-Antoine Blais-Morin, Eric Granger, et al. 2022. Incremental multi-target domain adaptation for object detection with efficient domain transfer. Pattern Recognition 129 (2022), 108771.

Digital Library

[20]

Shuang Li, Chi Harold Liu, Binhui Xie, Limin Su, Zhengming Ding, and Gao Huang. 2019. Joint adversarial domain adaptation. In Proceedings of the 27th ACM International Conference on Multimedia. 729--737.

Digital Library

[21]

Wuyang Li, Xinyu Liu, Xiwen Yao, and Yixuan Yuan. 2022. SCAN: Cross Domain Object Detection with Semantic Conditioned Adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 6. 7.

[22]

Xiaoxu Li, Jijie Wu, Zhuo Sun, Zhanyu Ma, Jie Cao, and Jing-Hao Xue. 2020. BSNet: Bi-similarity network for few-shot fine-grained image classification. IEEE Transactions on Image Processing 30 (2020), 1318--1331.

Digital Library

[23]

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vision. Springer, 740--755.

[24]

Risheng Liu, Zhu Liu, Jinyuan Liu, and Xin Fan. 2021. Searching a hierarchically aggregated fusion architecture for fast multi-modality image fusion. In Proceedings of the 29th ACM International Conference on Multimedia. 1600--1608.

Digital Library

[25]

Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. 2016. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision. Springer, 21--37.

[26]

Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. 2008. Domain adaptation with multiple sources. Advances in Neural Information Processing Systems 21 (2008).

[27]

Taesung Park, Alexei A Efros, Richard Zhang, and Jun-Yan Zhu. 2020. Contrastive learning for unpaired image-to-image translation. In Proceedings of the European Conference on Computer Vision. Springer, 319--345.

Digital Library

[28]

Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, and Bo Wang. 2019. Moment matching for multi-source domain adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1406--1415.

[29]

Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, and Bo Wang. 2019. Moment matching for multi-source domain adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 1406--1415.

[30]

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning. PMLR, 8748--8763.

[31]

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster r-cnn: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems 28 (2015).

[32]

Farzaneh Rezaeianaran, Rakshith Shetty, Rahaf Aljundi, Daniel Olmeda Reino, Shanshan Zhang, and Bernt Schiele. 2021. Seeking similarities over differences: Similarity-based domain alignment for adaptive object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 9204--9213.

[33]

Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, and Kate Saenko. 2019. Strong-weak distribution alignment for adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 6956--6965.

[34]

Christos Sakaridis, Dengxin Dai, and Luc Van Gool. 2018. Semantic foggy scene understanding with synthetic data. International Journal of Computer Vision 126, 9 (2018), 973--992.

Digital Library

[35]

Zhiqiang Shen, Harsh Maheshwari, Weichen Yao, and Marios Savvides. 2019. Scl: Towards accurate domain adaptive object detection via gradient detach based stacked complementary losses. arXiv preprint arXiv:1911.02559 (2019).

[36]

Hengcan Shi, Munawar Hayat, Yicheng Wu, and Jianfei Cai. 2022. Proposal-CLIP: unsupervised open-category object proposal generation via exploiting clip cues. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9611--9620.

[37]

Qian Sun, Rita Chattopadhyay, Sethuraman Panchanathan, and Jieping Ye. 2011. A two-stage weighting framework for multi-source domain adaptation. Advances in Neural Information Processing Systems 24 (2011).

[38]

Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. 2019. Fcos: Fully convolutional one-stage object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 9627--9636.

[39]

Naveen Venkat, Jogendra Nath Kundu, Durgesh Singh, Ambareesh Revanur, et al. 2020. Your classifier can secretly suffice multi-source domain adaptation. Advances in Neural Information Processing Systems 33 (2020), 4647--4659.

[40]

Vibashan VS, Vikram Gupta, Poojan Oza, Vishwanath A Sindagi, and Vishal M Patel. 2021. Mega-cda: Memory guided attention for category-aware unsupervised domain adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 4516--4526.

[41]

Wen Wang, Yang Cao, Jing Zhang, Fengxiang He, Zheng-Jun Zha, Yonggang Wen, and Dacheng Tao. 2021. Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers. In Proceedings of the 29th ACM International Conference on Multimedia. 1730--1738.

Digital Library

[42]

Jiwei Wei, Xing Xu, Zheng Wang, and Guoqing Wang. 2021. Meta self-paced learning for cross-modal matching. In Proceedings of the 29th ACM international conference on multimedia. 3835--3843.

Digital Library

[43]

Jiwei Wei, Xing Xu, Yang Yang, Yanli Ji, Zheng Wang, and Heng Tao Shen. 2020. Universal weighting metric learning for cross-modal matching. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 13005--13014.

[44]

Jiwei Wei, Yang Yang, Xing Xu, Jingkuan Song, Guoqing Wang, and Heng Tao Shen. 2023. Less is Better: Exponential Loss for Cross-Modal Matching. IEEE Transactions on Circuits and Systems for Video Technology (2023). https://doi.org/ 10.1109/TCSVT.2023.3249754

Digital Library

[45]

Jiwei Wei, Yang Yang, Xing Xu, Xiaofeng Zhu, and Heng Tao Shen. 2022. Universal Weighting Metric Learning for Cross-Modal Retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 10 (2022), 6534--6545. https://doi. org/10.1109/TPAMI.2021.3088863

Digital Library

[46]

Xing Wei, Shaofan Liu, Yaoci Xiang, Zhangling Duan, Chong Zhao, and Yang Lu. 2020. Incremental learning based multi-domain adaptation for object detection. Knowledge-Based Systems 210 (2020), 106420.

[47]

Aming Wu, Yahong Han, Linchao Zhu, and Yi Yang. 2021. Instance-invariant domain adaptive object detection via progressive disentanglement. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 8 (2021), 4178--4193.

[48]

Jiaxi Wu, Jiaxin Chen, Mengzhe He, Yiru Wang, Bo Li, Bingqi Ma, Weihao Gan, Wei Wu, Yali Wang, and Di Huang. 2022. Target-relevant knowledge preservation for multi-source domain adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5301--5310.

[49]

Chang-Dong Xu, Xing-Ran Zhao, Xin Jin, and Xiu-Shen Wei. 2020. Exploring categorical regularization for domain adaptive object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11724--11733.

[50]

Han Xu, Xinya Wang, and Jiayi Ma. 2021. DRF: Disentangled representation for visible and infrared image fusion. IEEE Transactions on Instrumentation and Measurement 70 (2021), 1--13.

[51]

Minghao Xu, Hang Wang, Bingbing Ni, Qi Tian, and Wenjun Zhang. 2020. Cross-domain detection via graph-induced prototype alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 12355--12364.

[52]

Yahui Xu, Yi Bin, Jiwei Wei, Yang Yang, Guoqing Wang, and Heng Tao Shen. 2023. Multi-Modal Transformer with Global-Local Alignment for Composed Query Image Retrieval. IEEE Transactions on Multimedia (2023).

Digital Library

[53]

Xingxu Yao, Sicheng Zhao, Pengfei Xu, and Jufeng Yang. 2021. Multi-source domain adaptation for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 3273--3282.

[54]

Fisher Yu, Haofeng Chen, Xin Wang, Wenqi Xian, Yingying Chen, Fangchen Liu, Vashisht Madhavan, and Trevor Darrell. 2020. Bdd100k: A diverse driving dataset for heterogeneous multitask learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2636--2645.

[55]

Jin Yuan, Feng Hou, Yangzhou Du, Zhongchao Shi, Xin Geng, Jianping Fan, and Yong Rui. 2022. Self-supervised graph neural network for multi-source domain adaptation. In Proceedings of the 30th ACM International Conference on Multimedia. 3907--3916.

Digital Library

[56]

Alireza Zareian, Kevin Dela Rosa, Derek Hao Hu, and Shih-Fu Chang. 2021. Open-vocabulary object detection using captions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14393--14402.

[57]

Hui Zhang, Junkun Tang, Yihong Cao, Yurong Chen, Yaonan Wang, and QM Jonathan Wu. 2022. Cycle Consistency Based Pseudo Label and Fine Alignment for Unsupervised Domain Adaptation. IEEE Transactions on Multimedia (2022).

Digital Library

[58]

Jingyi Zhang, Jiaxing Huang, Zichen Tian, and Shijian Lu. 2022. Spectral unsu-pervised domain adaptation for visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 9829--9840.

[59]

Han Zhao, Shanghang Zhang, Guanhang Wu, José MF Moura, Joao P Costeira, and GeoffreyJ Gordon. 2018. Adversarial multiple source domain adaptation. Advances in Neural Information Processing Systems 31 (2018).

[60]

Lifan Zhao, Yunlong Meng, and Lin Xu. 2022. OA-FSUI2IT: A Novel Few-Shot Cross Domain Object Detection Framework with Object-Aware Few-Shot Unsupervised Image-to-Image Translation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 3426--3435.

[61]

Liang Zhao and Limin Wang. 2022. Task-specific Inconsistency Alignment for Domain Adaptive Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 14217--14226.

[62]

Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, Chunyuan Li, Noel Codella, Liunian Harold Li, Luowei Zhou, Xiyang Dai, Lu Yuan, Yin Li, et al. 2022. Regionclip: Region-based language-image pretraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 16793--16803.

[63]

Qianyu Zhou, Qiqi Gu, Jiangmiao Pang, Xuequan Lu, and Lizhuang Ma. 2023. Self-adversarial disentangling for specific domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence (2023).

Digital Library

[64]

Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. 2017. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision. 2223--2232.

[65]

Xinge Zhu, Jiangmiao Pang, Ceyuan Yang, Jianping Shi, and Dahua Lin. 2019. Adapting object detectors via selective cross-domain alignment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 687--696.

Cited By

Wang XRen WChen XFan HTang YHan ZCai JKankanhalli MPrabhakaran BBoll SSubramanian RZheng LSingh VCesar PXie LXu D(2024)Uni-YOLO: Vision-Language Model-Guided YOLO for Robust and Fast Universal Detection in the Open WorldProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3681212(1991-2000)Online publication date: 28-Oct-2024
https://dl.acm.org/doi/10.1145/3664647.3681212

Index Terms

Open-Scenario Domain Adaptive Object Detection in Autonomous Driving
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems

Recommendations

Rethinking Open-World Object Detection in Autonomous Driving Scenarios
MM '22: Proceedings of the 30th ACM International Conference on Multimedia

Existing object detection models have been demonstrated to successfully discriminate and localize the predefined object categories under the seen or similar situations. However, the open-world object detection as required by autonomous driving ...
Improving Transferability for Domain Adaptive Detection Transformers
MM '22: Proceedings of the 30th ACM International Conference on Multimedia

DETR-style detectors stand out amongst in-domain scenarios, but their properties in domain shift settings are under-explored. This paper aims to build a simple but effective baseline with a DETR-style detector on domain shift settings based on two ...
Region Feature Disentanglement for Domain Adaptive Object Detection
Artificial Neural Networks and Machine Learning – ICANN 2023
Abstract
In recent years, deep learning based object detection has shown impressive results. However, applying an object detector learned from one data domain to another one often faces performance degradation due to the domain shift problem. To improve ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

MM '23: Proceedings of the 31st ACM International Conference on Multimedia

October 2023

9913 pages

ISBN:9798400701085

DOI:10.1145/3581783

General Chairs:
Abdulmotaleb El Saddik
University of Ottawa, Canada & MBZUAI, UAE
,
Tao Mei
HiDream.ai, China
,
Rita Cucchiara
University of Modena and Reggio Emilia, Italy
,
Program Chairs:
Marco Bertini
University of Florence, Italy
,
Diana Patricia Tobon Vallejo
Unversidad de Medellin, Colombia
,
Pradeep K. Atrey
University at Albany, State University of New York, USA
,
M. Shamim Hossain
M. Shamim Hossain (King Saud University, KSA

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 October 2023

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

Key-Area Research and Development Program of Guangdong Province
National Natural Science Foundation of China
China Postdoctoral Science Foundation with Project

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29 - November 3, 2023

Ottawa ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
212
Total Downloads

Downloads (Last 12 months)97
Downloads (Last 6 weeks)9

Reflects downloads up to 01 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Wang XRen WChen XFan HTang YHan ZCai JKankanhalli MPrabhakaran BBoll SSubramanian RZheng LSingh VCesar PXie LXu D(2024)Uni-YOLO: Vision-Language Model-Guided YOLO for Robust and Fast Universal Detection in the Open WorldProceedings of the 32nd ACM International Conference on Multimedia10.1145/3664647.3681212(1991-2000)Online publication date: 28-Oct-2024
https://dl.acm.org/doi/10.1145/3664647.3681212

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten