research-article

DoubleAUG: Single-domain Generalized Object Detector in Urban via Color Perturbation and Dual-style Memory

Authors:

Xin Geng

ACM Transactions on Multimedia Computing, Communications and Applications, Volume 20, Issue 5

Article No.: 126, Pages 1 - 20

https://doi.org/10.1145/3634683

Published: 11 January 2024

Abstract

Object detection in urban scenarios is crucial for autonomous driving in intelligent traffic systems. However, unlike conventional object detection tasks, urban-scene images vary greatly in style. For example, images taken on sunny days differ significantly from those taken on rainy days. Therefore, models trained on sunny-day images may not generalize well to rainy-day images. In this article, we aim to solve the single-domain generalizable object detection task in urban scenarios, meaning that a model trained on images from one weather condition should perform well on images from any other weather condition. To address this challenge, we propose a novel Double AUGmentation (DoubleAUG) method that includes image- and feature-level augmentation schemes. In the image-level augmentation, we consider the variation in color information across different weather conditions and propose a Color Perturbation (CP) method that randomly exchanges the RGB channels to generate various images. In the feature-level augmentation, we propose to utilize a Dual-Style Memory (DSM) to explore the diverse style information on the entire dataset, further enhancing the model’s generalization capability. Extensive experiments demonstrate that our proposed method outperforms state-of-the-art methods. Furthermore, ablation studies confirm the effectiveness of each module in our proposed method. Moreover, our method is plug-and-play and can be integrated into existing methods to further improve model performance.
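The abstract describes Color Perturbation only as randomly exchanging the RGB channels. A minimal sketch of that idea follows, assuming images are stored as H x W x 3 NumPy arrays; the function name `color_perturbation`, the uniform choice over all six channel orderings, and the returned `order` value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def color_perturbation(image, rng=None):
    """Return a copy of an H x W x 3 image with its channels in random order.

    Hypothetical helper: the paper's exact sampling scheme is not given in
    the abstract, so a uniform random permutation is assumed here.
    """
    if image.ndim != 3 or image.shape[-1] != 3:
        raise ValueError("expected an H x W x 3 array")
    rng = np.random.default_rng() if rng is None else rng
    # A random ordering of the channel indices 0, 1, 2; the identity
    # ordering is one of the six possible outcomes.
    order = rng.permutation(3)
    # Indexing the last axis with an index array yields a new array, so the
    # input image is left untouched.
    return image[..., order], order


# Small synthetic "image" so the example runs without any image files.
image = np.arange(2 * 4 * 3, dtype=np.uint8).reshape(2, 4, 3)
augmented, order = color_perturbation(image, np.random.default_rng(0))
```

Because only the channel order changes and no pixel moves, bounding-box annotations for the original image remain valid for the perturbed one.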

Index Terms

Computing methodologies
  Artificial intelligence
    Computer vision
      Computer vision problems
        Object detection

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 20, Issue 5

May 2024

650 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3613634

Editor:
Abdulmotaleb El Saddik
Mohamed Bin Zayed University of Artificial Intelligence, UAE and University of Ottawa, Canada

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 January 2024

Online AM: 18 December 2023

Accepted: 21 November 2023

Revised: 14 October 2023

Received: 16 March 2023

Published in TOMM Volume 20, Issue 5

Qualifiers

Research-article

Funding Sources

NSFC Program
Jiangsu Natural Science Foundation Project

Article metrics (reflecting downloads up to 27 Jul 2024): 0 total citations, 278 total downloads, 278 downloads in the last 12 months, 22 downloads in the last 6 weeks.