Research Article

SiamCCF: Siamese visual tracking via cross‐layer calibration fusion

Published: 04 May 2023

Abstract

Siamese networks have attracted wide attention in visual tracking due to their competitive accuracy and speed. However, existing Siamese trackers usually leverage a fixed linear aggregation of feature maps, which does not effectively fuse features from different layers with attention. Moreover, most Siamese trackers compute the similarity between the template and the search region through a cross‐correlation operation between the features of the last blocks of the two branches, which might introduce redundant noise. To address these problems, this study proposes a novel Siamese visual tracking method via cross‐layer calibration fusion, termed SiamCCF. An attention‐based feature fusion module employs local attention and non‐local attention to fuse features from the deep and shallow layers, so as to capture both local details and high‐level semantic information. Moreover, a cross‐layer calibration module uses the fused features to calibrate the features of the last network blocks and to build cross‐layer long‐range spatial and inter‐channel dependencies around each spatial location. Extensive experiments demonstrate that the proposed method achieves competitive tracking performance compared with state‐of‐the‐art trackers on challenging benchmarks, including OTB100, OTB2013, UAV123, UAV20L, and LaSOT.
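To make the fusion idea concrete, below is a minimal NumPy sketch of combining shallow and deep feature maps with a "local" channel-attention gate and a "non-local" spatial self-attention block. This is an illustrative toy, not the paper's actual modules: the function names, the pooling-based channel gate, and the unweighted dot-product affinities are all simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_attention(feat):
    """Channel gate from global average pooling (a simple 'local' attention)."""
    # feat: (C, H, W); one sigmoid weight per channel.
    w = feat.mean(axis=(1, 2))
    gate = 1.0 / (1.0 + np.exp(-w))
    return feat * gate[:, None, None]

def non_local_attention(feat):
    """Self-attention over all spatial positions (non-local block, no learned weights)."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                      # (C, N) with N = H*W
    attn = softmax(x.T @ x / np.sqrt(c), axis=-1)   # (N, N) pairwise affinities
    out = x @ attn.T                                # aggregate context from every position
    return feat + out.reshape(c, h, w)              # residual connection

def fuse(shallow, deep):
    """Fuse shallow (detail) and deep (semantic) features with both attentions."""
    return local_attention(shallow) + non_local_attention(deep)

rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 4, 4))
deep = rng.standard_normal((8, 4, 4))
fused = fuse(shallow, deep)
print(fused.shape)  # (8, 4, 4)
```

The non-local step is what gives every spatial location access to context from the whole map, while the channel gate keeps fine local detail from the shallow branch weighted by its global response.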

Graphical Abstract

This study proposes a novel Siamese visual tracking method via cross‐layer calibration fusion, termed SiamCCF. We first employ an attention‐based feature fusion module (FFM) that uses local attention and non‐local attention to fuse features from the deep and shallow layers, so as to capture both local details and high‐level semantic information. Moreover, a cross‐layer calibration module (CCM) uses the fused features to calibrate the features of the last network blocks and builds long‐range spatial and inter‐channel dependencies around each spatial location across the different layers.
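The calibration-then-matching pipeline can be sketched in a few lines of NumPy: fused features gate the last-block features through a sigmoid, and the calibrated template and search features are matched by depthwise (per-channel) cross-correlation, the standard similarity operation in Siamese trackers. The sigmoid gating and the tensor sizes here are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def calibrate(last_block, fused):
    """Gate last-block features with a calibration map built from the fused features."""
    # Both (C, H, W); the fused map carries cross-layer spatial/channel context.
    return last_block * sigmoid(fused)

def depthwise_xcorr(search, template):
    """Per-channel cross-correlation of the search region with the template kernel."""
    c, hs, ws = search.shape
    _, ht, wt = template.shape
    oh, ow = hs - ht + 1, ws - wt + 1
    out = np.zeros((c, oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = search[:, i:i + ht, j:j + wt]
            out[:, i, j] = (patch * template).sum(axis=(1, 2))
    return out

rng = np.random.default_rng(1)
template = calibrate(rng.standard_normal((4, 3, 3)), rng.standard_normal((4, 3, 3)))
search = calibrate(rng.standard_normal((4, 7, 7)), rng.standard_normal((4, 7, 7)))
score = depthwise_xcorr(search, template)
print(score.shape)  # (4, 5, 5)
```

Calibrating both branches before correlation is what lets the matching step operate on features already conditioned on multi-layer context, rather than on the raw last-block responses alone.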



        Published In

IET Computer Vision, Volume 17, Issue 8
December 2023
179 pages
EISSN: 1751-9640
DOI: 10.1049/cvi2.v17.8
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

        Publisher

        John Wiley & Sons, Inc.

        United States


        Author Tags

        1. computer vision
        2. object tracking

