Diversifying Spatial-Temporal Perception for Video Domain Generalization

Lin, Kun-Yu; Du, Jia-Run; Gao, Yipeng; Zhou, Jiaming; Zheng, Wei-Shi

arXiv:2310.17942v1 (cs)

[Submitted on 27 Oct 2023]

Abstract: Video domain generalization aims to learn video classification models that generalize to unseen target domains by training on a source domain. A critical challenge is avoiding heavy reliance on domain-specific cues extracted from the source domain when recognizing target videos. To this end, we propose to perceive diverse spatial-temporal cues in videos, aiming to discover potential domain-invariant cues in addition to domain-specific cues. We contribute a novel model named Spatial-Temporal Diversification Network (STDN), which improves diversity along both the spatial and temporal dimensions of video data. First, STDN discovers various types of spatial cues within individual frames through spatial grouping. Then, STDN explicitly models spatial-temporal dependencies between video contents at multiple space-time scales through spatial-temporal relation modeling. Extensive experiments on three benchmarks of different types demonstrate the effectiveness and versatility of our approach.
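
The two ideas named in the abstract, spatial grouping within a frame and relation modeling across several temporal scales, can be illustrated with a toy sketch. This is not the STDN implementation: the abstract does not specify either operation, so the softmax-based soft assignment, the use of averaged frame differences as a "relation", and every name below (`spatial_grouping`, `temporal_relations`, `prototypes`, `strides`) are assumptions chosen only for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def spatial_grouping(patches: List[Vector], prototypes: List[Vector]) -> List[Vector]:
    """Soft-assign the patch features of one frame to K group prototypes.

    Assumption: groups are defined by hypothetical prototype vectors and
    dot-product similarity. Returns one aggregated feature per group.
    """
    dim = len(patches[0])
    # weights[i][g]: how strongly patch i belongs to group g (each row sums to 1).
    weights = [softmax([dot(p, c) for c in prototypes]) for p in patches]
    groups = []
    for g in range(len(prototypes)):
        mass = sum(w[g] for w in weights)  # always > 0 because softmax is positive
        groups.append([
            sum(w[g] * p[d] for w, p in zip(weights, patches)) / mass
            for d in range(dim)
        ])
    return groups


def temporal_relations(frames: List[Vector], strides: List[int]) -> Dict[int, Vector]:
    """For each stride s, average the feature change between frames t and t + s.

    Assumption: a plain feature difference stands in for a learned relation.
    """
    dim = len(frames[0])
    relations = {}
    for s in strides:
        if s < 1 or s >= len(frames):
            raise ValueError(f"stride {s} is invalid for {len(frames)} frames")
        pairs = [(frames[t], frames[t + s]) for t in range(len(frames) - s)]
        relations[s] = [
            sum(b[d] - a[d] for a, b in pairs) / len(pairs) for d in range(dim)
        ]
    return relations


# Usage: two patches, two prototypes, and a four-frame clip with 1-D features.
groups = spatial_grouping([[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
relations = temporal_relations([[0.0], [1.0], [3.0], [6.0]], strides=[1, 2])
```

In this sketch each group feature is a weighted average of the patches, so different prototypes pull out different spatial cues from the same frame, and each stride summarizes change at a different temporal scale. A learned model would replace both the fixed prototypes and the difference operator with trainable components.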

Comments:	Accepted to NeurIPS 2023. Code is available at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2310.17942 [cs.CV]
	(or arXiv:2310.17942v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2310.17942

Submission history

From: Yipeng Gao
[v1] Fri, 27 Oct 2023 07:36:36 UTC (423 KB)
