DOI: 10.1145/3571600.3571621

Overcoming Label Noise for Source-free Unsupervised Video Domain Adaptation

Published: 12 May 2023

Abstract

Despite the progress in classification methods, current approaches for handling videos with distribution shifts between source and target domains remain source-dependent, as they require access to the source data during the adaptation stage. In this paper, we present a self-training based source-free video domain adaptation approach (without bells and whistles) that addresses this challenge by bridging the gap between the source and target domains. We use the source pre-trained model to generate pseudo-labels for the target domain samples, which are inevitably noisy. We therefore treat source-free video domain adaptation as a problem of learning from noisy labels and argue that the samples with correct pseudo-labels can help during the adaptation stage. To this end, we leverage the cross-entropy loss as an indicator of pseudo-label correctness and use the resulting small-loss samples from the target domain to fine-tune the model. Extensive experimental evaluations show that our method, termed CleanAdapt, achieves gains over the source-only model and outperforms state-of-the-art approaches on various open datasets.
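The small-loss selection step described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes the source model's softmax outputs for the target videos are available as a NumPy array, and the function name and `keep_ratio` parameter are hypothetical.

```python
import numpy as np

def select_small_loss_samples(probs, pseudo_labels, keep_ratio=0.5):
    """Keep the target samples whose pseudo-labels incur the smallest
    cross-entropy loss, treating them as the likely-correct ('clean') ones."""
    # Per-sample cross-entropy of the pseudo-label under the model's softmax output.
    losses = -np.log(probs[np.arange(len(pseudo_labels)), pseudo_labels] + 1e-12)
    # Retain the fraction of samples with the smallest loss.
    n_keep = max(1, int(keep_ratio * len(losses)))
    # Indices of the small-loss samples, to be used for fine-tuning the model.
    return np.argsort(losses)[:n_keep]

# Pseudo-labels themselves come from the source pre-trained model's predictions,
# e.g. pseudo_labels = probs.argmax(axis=1).
```

Selecting by loss rather than by a fixed confidence threshold is what connects the adaptation step to the noisy-label literature: low-loss samples are the ones the model already fits well, and are therefore more likely to carry correct pseudo-labels.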

Supplementary Material

Additional Results (supple.pdf)



Published In

ICVGIP '22: Proceedings of the Thirteenth Indian Conference on Computer Vision, Graphics and Image Processing
December 2022
506 pages
ISBN:9781450398220
DOI:10.1145/3571600

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. action recognition
  2. domain adaptation
  3. transfer learning

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • Google India PhD Fellowship

Conference

ICVGIP'22

Acceptance Rates

Overall Acceptance Rate 95 of 286 submissions, 33%
