SwinTrack: A Simple and Strong Baseline for Transformer Tracking

Lin, Liting; Fan, Heng; Zhang, Zhipeng; Xu, Yong; Ling, Haibin

arXiv:2112.00995 (cs)

[Submitted on 2 Dec 2021 (v1), last revised 13 Oct 2022 (this version, v3)]

Abstract: Recently, the Transformer has been widely explored in tracking and has shown state-of-the-art (SOTA) performance. However, existing efforts mainly focus on fusing and enhancing features generated by convolutional neural networks (CNNs). The potential of the Transformer in representation learning remains under-explored. In this paper, we aim to further unleash the power of the Transformer by proposing a simple yet efficient fully-attentional tracker, dubbed SwinTrack, within the classic Siamese framework. In particular, both representation learning and feature fusion in SwinTrack leverage the Transformer architecture, enabling better feature interactions for tracking than pure CNN or hybrid CNN-Transformer frameworks. In addition, to further enhance robustness, we present a novel motion token that embeds the historical target trajectory and improves tracking by providing temporal context. The motion token is lightweight, adding negligible computation, but brings clear gains. In our thorough experiments, SwinTrack exceeds existing approaches on multiple benchmarks. In particular, on the challenging LaSOT benchmark, SwinTrack sets a new record with a SUC score of 0.713. It also achieves SOTA results on other benchmarks. We expect SwinTrack to serve as a solid baseline for Transformer tracking and to facilitate future research. Our code and results are released at this https URL.
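For readers unfamiliar with the SUC (success) score quoted in the abstract, the following is a minimal sketch of how such a score is commonly computed in single-object tracking: the per-frame overlap (IoU) between predicted and ground-truth boxes is thresholded at evenly spaced values in [0, 1], and the resulting success rates are averaged, which approximates the area under the success curve. The function names `iou_xywh` and `success_auc`, the (x, y, w, h) box format, the 21 thresholds, and the strict inequality are assumptions made for illustration. They are not taken from the paper or from the official LaSOT toolkit, whose details may differ.

```python
def iou_xywh(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Mean success rate over evenly spaced IoU thresholds in [0, 1].

    Assumption: a frame counts as a success when its IoU is strictly
    greater than the threshold; benchmark toolkits may differ here.
    """
    ious = [iou_xywh(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    if not ious:
        raise ValueError("at least one frame is required")
    steps = num_thresholds - 1
    thresholds = [i / steps for i in range(num_thresholds)]
    rates = [sum(iou > t for iou in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Hypothetical two-frame sequence: one exact prediction, one shifted box.
    gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
    pred = [(0, 0, 10, 10), (5, 0, 10, 10)]
    print(f"SUC = {success_auc(pred, gt):.3f}")
```

Under these assumptions, a tracker that matches the ground truth on every frame scores 20/21 rather than 1.0, because no IoU can strictly exceed the final threshold of 1.0. This is one reason scores from different toolkits are not always directly comparable.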

Comments:	22 pages, 10 figures
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2112.00995 [cs.CV]
	(or arXiv:2112.00995v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2112.00995
Journal reference:	Advances in Neural Information Processing Systems, 2022

Submission history

From: Liting Lin
[v1] Thu, 2 Dec 2021 05:56:03 UTC (344 KB)
[v2] Wed, 8 Dec 2021 15:22:38 UTC (158 KB)
[v3] Thu, 13 Oct 2022 11:31:03 UTC (970 KB)
