1st Place Solution for the 5th LSVOS Challenge: Video Instance Segmentation

Zhang, Tao; Tian, Xingye; Zhou, Yikang; Wu, Yu; Ji, Shunping; Yan, Cilin; Wang, Xuebo; Tao, Xin; Zhang, Yuan; Wan, Pengfei

arXiv:2308.14392 (cs)

[Submitted on 28 Aug 2023]

Abstract: Video instance segmentation (VIS) is a challenging task that serves as the cornerstone of numerous downstream applications, including video editing and autonomous driving. In this report, we present further improvements to the state-of-the-art VIS method, DVIS. First, we introduce a denoising training strategy for the trainable tracker, allowing it to achieve more stable and accurate object tracking in complex and long videos. Additionally, we explore the role of visual foundation models in video instance segmentation. By utilizing a frozen ViT-L model pre-trained with DINOv2, DVIS demonstrates remarkable performance improvements. With these enhancements, our method achieves 57.9 AP and 56.0 AP in the development and test phases, respectively, and ranks 1st in the VIS track of the 5th LSVOS Challenge. The code will be available at this https URL.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2308.14392 [cs.CV]
	(or arXiv:2308.14392v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2308.14392

Submission history

From: Tao Zhang
[v1] Mon, 28 Aug 2023 08:15:43 UTC (437 KB)
