
Spatial feature embedding for robust visual object tracking

Published: 20 December 2023

Abstract

Recently, the offline-trained Siamese pipeline has drawn wide attention due to its outstanding tracking performance. However, existing Siamese trackers rely on offline training to extract 'universal' features, which are insufficient to effectively distinguish the target from fluctuating interference when embedding the information of the two branches, leading to inaccurate classification and localisation. In addition, Siamese trackers crop the search candidate region at a pre-defined scale based on the previous frame's result, which easily introduces redundant background noise (clutter, similar objects etc.), affecting the tracker's robustness. To solve these problems, the authors propose two novel sub-networks for spatial feature embedding for robust object tracking. Specifically, the proposed spatial remapping (SRM) network enhances the feature discrepancy between target and distractor categories by online remapping, improving the discriminative ability of the tracker in the embedding space. Model-agnostic meta-learning (MAML) is used to optimise the SRM network to ensure its adaptability to complex tracking scenarios. Moreover, a temporal information proposal-guided (TPG) network is introduced that utilises a GRU model to dynamically predict the search scale from temporal motion states, reducing potential background interference. The two proposed networks are integrated into two popular trackers, SiamFC++ and TransT, yielding SiamSRMC and SiamSRMT, respectively, which achieve superior performance on six challenging benchmarks: OTB100, VOT2019, UAV123, GOT10K, TrackingNet and LaSOT. Moreover, the proposed trackers obtain competitive performance compared with state-of-the-art trackers on the background-clutter and similar-object attributes, validating the effectiveness of the method.
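
To make the TPG idea above concrete, the following is a minimal sketch, not the authors' implementation: a small GRU that maps a short history of inter-frame motion states to a strictly positive scale factor for cropping the next search region. All names, dimensions and the softplus head are illustrative assumptions.

```python
# Hypothetical sketch of a GRU-based search-scale predictor in the spirit
# of the TPG network; all names and dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScalePredictorGRU(nn.Module):
    def __init__(self, state_dim=4, hidden_dim=64):
        super().__init__()
        # Each motion state could be, e.g., (dx, dy, dw, dh) between frames.
        self.gru = nn.GRU(state_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, motion_states):
        # motion_states: (batch, T, state_dim) history over the last T frames.
        _, h = self.gru(motion_states)           # h: (1, batch, hidden_dim)
        # Softplus keeps the predicted crop scale strictly positive.
        return F.softplus(self.head(h[-1]))      # (batch, 1)

# Usage: predict the next crop scale from a 10-frame motion history.
model = ScalePredictorGRU()
history = torch.randn(1, 10, 4)
scale = model(history)
```

At inference time, such a predicted factor would replace the fixed, pre-defined crop scale that the abstract identifies as a source of redundant background noise.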

Graphical Abstract

The anchor-free Siamese tracking method is prone to 'target-like' classification responses in areas with background clutter and similar distractors, affecting tracking accuracy. The authors propose a spatial remapping network that provides more discriminative metric features for accurate classification and localisation of similar objects, enhancing the tracker's ability to handle distractor regions.
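
As an illustration of the spatial remapping idea, here is a minimal, hypothetical sketch (not the authors' implementation) of an online remapping layer adapted with a single MAML-style gradient step so that pooled target and distractor embeddings are pushed apart; the 1x1 convolution, the margin of 1.0 and the labelling scheme are all assumptions made for this example.

```python
# Hypothetical sketch of online feature remapping with one MAML-style
# inner-loop step; not the paper's actual SRM network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialRemap(nn.Module):
    """1x1 conv that remaps backbone features into a new embedding space."""
    def __init__(self, channels=256):
        super().__init__()
        self.remap = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, feats):
        return self.remap(feats)

def inner_adapt(remap, feats, labels, lr=0.1):
    """One inner step: push the distractor prototype away from the target
    prototype in the remapped space, then update the remap weights."""
    emb = remap(feats).mean(dim=(2, 3))           # (N, C) pooled embeddings
    target = emb[labels == 1].mean(0)             # target prototype
    distractor = emb[labels == 0].mean(0)         # distractor prototype
    # Margin loss: prototypes closer than the margin incur a penalty.
    loss = F.relu(1.0 - (target - distractor).norm())
    grads = torch.autograd.grad(loss, list(remap.parameters()))
    with torch.no_grad():
        for p, g in zip(remap.parameters(), grads):
            p -= lr * g
    return remap

# Usage: adapt on a few labelled feature crops from early frames.
remap = SpatialRemap()
feats = torch.randn(8, 256, 7, 7)                 # 8 candidate regions
labels = torch.tensor([1, 1, 0, 0, 0, 0, 0, 0])   # 1 = target, 0 = distractor
remap = inner_adapt(remap, feats, labels)
```

In the meta-learning view, an outer loop (omitted here) would train the initial remap weights so that this single inner step already yields a well-separated embedding, which is what MAML optimises for.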



Published In

IET Computer Vision, Volume 18, Issue 4
June 2024
122 pages
EISSN: 1751-9640
DOI: 10.1049/cvi2.v18.4
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Publisher

John Wiley & Sons, Inc.

United States


Author Tags

  1. computer vision
  2. distance learning
  3. image motion analysis
  4. object tracking

Qualifiers

  • Research-article
