Research Article

SiamCCF: Siamese visual tracking via cross‐layer calibration fusion

Published: 04 May 2023

Abstract

Siamese networks have attracted wide attention in visual tracking due to their competitive accuracy and speed. However, existing Siamese trackers usually leverage a fixed linear aggregation of feature maps, which does not effectively fuse features from different layers with attention. Moreover, most Siamese trackers compute the similarity between the template and the search region through a cross‐correlation operation between the features of the last blocks of the two branches, which might introduce redundant noise. To address these problems, this study proposes a novel Siamese visual tracking method via cross‐layer calibration fusion, termed SiamCCF. An attention‐based feature fusion module employs local attention and non‐local attention to fuse features from the deep and shallow layers, so as to capture both local details and high‐level semantic information. Moreover, a cross‐layer calibration module uses the fused features to calibrate the features of the last network blocks and to build cross‐layer long‐range spatial and inter‐channel dependencies around each spatial location. Extensive experiments demonstrate that the proposed method achieves competitive tracking performance compared with state‐of‐the‐art trackers on challenging benchmarks, including OTB100, OTB2013, UAV123, UAV20L, and LaSOT.
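To make the fusion idea concrete, below is a minimal NumPy sketch of combining shallow and deep feature maps with a "local" channel-attention gate and a "non-local" spatial self-attention block. This is an illustrative toy, not the paper's actual modules: the function names, the pooling-based channel gate, and the unweighted dot-product affinities are all simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def local_attention(feat):
    """Channel gate from global average pooling (a simple 'local' attention)."""
    # feat: (C, H, W); one sigmoid weight per channel.
    w = feat.mean(axis=(1, 2))
    gate = 1.0 / (1.0 + np.exp(-w))
    return feat * gate[:, None, None]

def non_local_attention(feat):
    """Self-attention over all spatial positions (non-local block, no learned weights)."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                      # (C, N) with N = H*W
    attn = softmax(x.T @ x / np.sqrt(c), axis=-1)   # (N, N) pairwise affinities
    out = x @ attn.T                                # aggregate context from every position
    return feat + out.reshape(c, h, w)              # residual connection

def fuse(shallow, deep):
    """Fuse shallow (detail) and deep (semantic) features with both attentions."""
    return local_attention(shallow) + non_local_attention(deep)

rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 4, 4))
deep = rng.standard_normal((8, 4, 4))
fused = fuse(shallow, deep)
print(fused.shape)  # (8, 4, 4)
```

The non-local step is what gives every spatial location access to context from the whole map, while the channel gate keeps fine local detail from the shallow branch weighted by its global response.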

Graphical Abstract

This study proposes a novel Siamese visual tracking method via cross‐layer calibration fusion, termed SiamCCF. We first employ an attention‐based feature fusion module (FFM) that uses local attention and non‐local attention to fuse features from the deep and shallow layers, so as to capture both local details and high‐level semantic information. Moreover, a cross‐layer calibration module (CCM) uses the fused features to calibrate the features of the last network blocks and builds long‐range spatial and inter‐channel dependencies around each spatial location across the different layers.
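The calibration-then-matching pipeline can be sketched in a few lines of NumPy: fused features gate the last-block features through a sigmoid, and the calibrated template and search features are matched by depthwise (per-channel) cross-correlation, the standard similarity operation in Siamese trackers. The sigmoid gating and the tensor sizes here are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def calibrate(last_block, fused):
    """Gate last-block features with a calibration map built from the fused features."""
    # Both (C, H, W); the fused map carries cross-layer spatial/channel context.
    return last_block * sigmoid(fused)

def depthwise_xcorr(search, template):
    """Per-channel cross-correlation of the search region with the template kernel."""
    c, hs, ws = search.shape
    _, ht, wt = template.shape
    oh, ow = hs - ht + 1, ws - wt + 1
    out = np.zeros((c, oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = search[:, i:i + ht, j:j + wt]
            out[:, i, j] = (patch * template).sum(axis=(1, 2))
    return out

rng = np.random.default_rng(1)
template = calibrate(rng.standard_normal((4, 3, 3)), rng.standard_normal((4, 3, 3)))
search = calibrate(rng.standard_normal((4, 7, 7)), rng.standard_normal((4, 7, 7)))
score = depthwise_xcorr(search, template)
print(score.shape)  # (4, 5, 5)
```

Calibrating both branches before correlation is what lets the matching step operate on features already conditioned on multi-layer context, rather than on the raw last-block responses alone.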



        Published In

IET Computer Vision, Volume 17, Issue 8
December 2023
179 pages
EISSN: 1751-9640
DOI: 10.1049/cvi2.v17.8
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

        Publisher

        John Wiley & Sons, Inc.

        United States


        Author Tags

        1. computer vision
        2. object tracking

