research-article

Visual tracking via dynamic weighting with pyramid-redetection based Siamese networks

Published: 01 December 2019

Highlights

An improved end-to-end Siamese network is proposed for visual tracking.
The dynamic weighting module is used for both offline and online learning.
The residual architecture is exploited for better prediction.
An online pyramid-redetection module is used to re-track the target object.
Experiments on both short-term and long-term tracking show excellent results.

Abstract

Siamese-network-based similarity learning is currently a significant branch of visual tracking. However, most existing deep Siamese networks depend heavily on offline-trained knowledge and assume that different prediction views are equally important. In this paper, we first introduce a dynamic weighting module into the Siamese framework, which lets the offline-trained network adapt to the current circumstances and weight predictive response maps discriminatively. The idea rests on the observation that different maps have different predictive preferences and should not be treated equally. Second, to focus more on the accurate preference, we introduce a residual structure to form the residual dynamic weighting module. Third, we construct a simple online pyramid-redetection module to avoid relying only on local search and to also consider the global viewpoint. Extensive experiments on both short-term and long-term tracking demonstrate that the proposed tracker performs competitively against many mainstream state-of-the-art trackers.
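
As a rough illustration of the ideas in the abstract, the sketch below fuses several response maps with per-map weights, optionally adds a residual (uniform-average) path, and widens the search region when confidence is low. It is not the paper's implementation: the abstract does not give the formulation, so the peak-based softmax weights, the form of the residual path, the confidence threshold, and the names `fuse`, `redetection_scales`, `levels`, and `step` are all assumptions made for illustration. In the paper the weights come from a learned module rather than a hand-written rule.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def peak(response):
    """Largest value in a 2-D response map given as a list of rows."""
    return max(max(row) for row in response)


def fuse(responses, residual=True):
    """Fuse K same-sized response maps with per-map weights.

    Assumption: weights are a softmax over each map's peak value, used
    here only as a stand-in for a learned weighting module.
    Assumption: the residual path adds the uniform average (1/K per map)
    on top of the weighted sum.
    """
    weights = softmax([peak(r) for r in responses])
    k = len(responses)
    rows, cols = len(responses[0]), len(responses[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for w, r in zip(weights, responses):
        coeff = (1.0 / k + w) if residual else w
        for i in range(rows):
            for j in range(cols):
                fused[i][j] += coeff * r[i][j]
    return fused, weights


def redetection_scales(confidence, threshold=0.5, levels=3, step=2.0):
    """Return search-region scale factors for the next frame.

    Hypothetical rule: keep the local search region (scale 1.0) while
    confidence is high; otherwise search a pyramid of growing regions.
    """
    if confidence >= threshold:
        return [1.0]
    return [step ** i for i in range(levels)]


if __name__ == "__main__":
    maps = [[[0.1, 0.2], [0.3, 0.4]], [[0.9, 0.1], [0.0, 0.2]]]
    fused, weights = fuse(maps)
    print(weights)
    print(redetection_scales(peak(fused)))
```

The residual path here means the weighting only has to model a correction to the plain average, which is one common motivation for residual designs; whether the paper's module works this way is not stated in the abstract.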


Published In

Journal of Visual Communication and Image Representation, Volume 65, Issue C
Dec 2019
271 pages

Publisher

Academic Press, Inc.
United States

Author Tags

1. Visual tracking
2. Siamese networks
3. Dynamic weighting
4. Residual structure
5. Convolutional neural networks
6. Pyramid-redetection
