Abstract
Object tracking is a fundamental problem in computer vision. Although it has been studied for decades, single object tracking remains unsolved because real-world scenes pose many challenges, such as object deformation, complex backgrounds and imperfect imaging. To address these challenges, we design a robust feature extraction network. Specifically, we propose a novel channel-wise feature attention mechanism, split-merge-excitation, and integrate it into the pipeline of MDNet, a well-known convolutional neural network based visual tracking algorithm. A robust object representation is crucial for tracking, and the more representative features produced by this mechanism improve tracking performance. We evaluate the proposed tracker on the OTB100, VOT2018, VOT2020 and VOT-TIR2015 datasets. Compared with the baseline algorithm, it achieves consistent improvements on every benchmark, with an absolute increase in tracking success score of up to 0.6 on OTB100 and absolute increases in EAO of up to 0.022, 0.007 and 0.008 on VOT2018, VOT2020 and VOT-TIR2015, respectively. The source code is publicly available.
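The abstract does not describe the internals of split-merge-excitation, so the sketch below is not the authors' module. As a point of reference, it shows the generic squeeze-and-excitation (SE) channel attention of Hu et al. (2020), which reweights each feature channel of a convolutional feature map by a learned gate in (0, 1). The function name `se_channel_attention`, the reduction ratio `r = 4`, the random weights and the 512-channel 3x3 feature map are all illustrative assumptions.

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def se_channel_attention(features, w1, b1, w2, b2):
    """Generic squeeze-and-excitation channel reweighting (not the SME module).

    features: array of shape (C, H, W), one convolutional feature map.
    w1: (C // r, C) and w2: (C, C // r) are the bottleneck weights.
    Returns the reweighted features and the per-channel gates.
    """
    # Squeeze: global average pooling over the spatial axes -> shape (C,)
    z = features.mean(axis=(1, 2))
    # Excitation: bottleneck MLP with ReLU, then a sigmoid gate per channel
    hidden = np.maximum(0.0, w1 @ z + b1)
    gates = sigmoid(w2 @ hidden + b2)
    # Scale: multiply every spatial position of channel c by gates[c]
    return features * gates[:, None, None], gates


# Hypothetical usage with random weights (illustrative values only)
rng = np.random.default_rng(0)
C, r = 512, 4  # channel count and reduction ratio are assumptions
feats = rng.standard_normal((C, 3, 3))
w1 = 0.01 * rng.standard_normal((C // r, C))
b1 = np.zeros(C // r)
w2 = 0.01 * rng.standard_normal((C, C // r))
b2 = np.zeros(C)

out, gates = se_channel_attention(feats, w1, b1, w2, b2)
print(out.shape, float(gates.min()), float(gates.max()))
```

The squeeze step discards spatial layout and keeps one statistic per channel, which is why SE-style gating is cheap: the two bottleneck layers cost about 2C²/r multiplications regardless of the feature map's height and width.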
References
Bertinetto L, Valmadre J, Henriques JF, Vedaldi A, Torr PHS (2016) Fully-convolutional Siamese networks for object tracking. In: European conference on computer vision 2016 workshops. https://doi.org/10.1007/978-3-319-48881-3_56, pp 850–865
Bhat G, Danelljan M, Van Gool L, Timofte R (2019) Learning discriminative model prediction for tracking. In: 2019 IEEE/CVF international conference on computer vision. https://doi.org/10.1109/ICCV.2019.00628, pp 6181–6190
Cao Y, Xu J, Lin S, Wei F, Hu H (2019) GCNet: non-local networks meet squeeze-excitation networks and beyond. In: 2019 IEEE/CVF international conference on computer vision workshop (ICCVW). https://doi.org/10.1109/ICCVW.2019.00246, pp 1971–1980
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: ECCV 2020 - 16th European conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part I, pp 213–229. https://doi.org/10.1007/978-3-030-58452-8_13
Danelljan M, Häger G, Khan FS, Felsberg M (2015) Learning spatially regularized correlation filters for visual tracking. In: 2015 IEEE international conference on computer vision (ICCV). https://doi.org/10.1109/ICCV.2015.490, pp 4310–4318
Danelljan M, Bhat G, Khan FS, Felsberg M (2017) ECO: efficient convolution operators for tracking. In: 2017 IEEE conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2017.733, pp 6931–6939
Danelljan M, Hager G, Khan FS, Felsberg M (2017) Discriminative scale space tracking. IEEE Trans Pattern Anal Mach Intell 39(8):1561–1575. https://doi.org/10.1109/TPAMI.2016.2609928
Danelljan M, Bhat G, Khan FS, Felsberg M (2019) ATOM: accurate tracking by overlap maximization. In: 2019 IEEE/CVF conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2019.00479, pp 4655–4664
Danelljan M, Van Gool L, Timofte R (2020) Probabilistic regression for visual tracking. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR42600.2020.00721. IEEE, pp 7181–7190
Fan H, Bai H, Lin L, Yang F, Chu P, Deng G, Yu S, Harshit HM, Liu J, Xu Y, Liao C, Yuan L, Ling H (2021) LaSOT: a high-quality large-scale single object tracking benchmark. Int J Comput Vis 129(2):439–461. https://doi.org/10.1007/s11263-020-01387-y
Felsberg M, Berg A, Hager G, Ahlberg J, Kristan M, Matas J, Pflugfelder R (2015) The thermal infrared visual object tracking VOT-TIR2015 challenge results. In: Proceedings of the IEEE international conference on computer vision workshops. https://doi.org/10.1109/ICCVW.2015.86, pp 76–88
Galoogahi HK, Fagg A, Lucey S (2017) Learning background-aware correlation filters for visual tracking. In: Proceedings of the IEEE international conference on computer vision. https://doi.org/10.1109/ICCV.2017.129, pp 1144–1152
Gao J, Zhang T, Xu C (2019) Graph convolutional tracking. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2019.00478. IEEE, pp 4644–4654
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: 2016 IEEE conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2016.90, pp 770–778
Henriques J, Caseiro R, Martins P, Batista J (2014) High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 37(3):583–596. https://doi.org/10.1109/TPAMI.2014.2345390
Hu H, Gu J, Zhang Z, Dai J, Wei Y (2018) Relation networks for object detection. In: 2018 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2018.00378. IEEE, pp 3588–3597
Hu J, Shen L, Albanie S, Sun G, Wu E (2020) Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell 42(8):2011–2023. https://doi.org/10.1109/TPAMI.2019.2913372
Huang L, Zhao X, Huang K (2021) GOT-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Trans Pattern Anal Mach Intell 43(5):1562–1577. https://doi.org/10.1109/TPAMI.2019.2957464
Jiang F, Kong B, Li J, Dashtipour K, Gogate M (2021) Robust visual saliency optimization based on bidirectional Markov chains. Cogn Comput 13(1):69. https://doi.org/10.1007/s12559-020-09724-6
Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Zajc LC et al (2018) The sixth visual object tracking VOT2018 challenge results, vol 11129 LNCS. Springer Verlag. https://doi.org/10.1007/978-3-030-11009-3_1
Kristan M, Leonardis A, Matas J, Felsberg M, Pflugfelder R, Kamarainen J-K, Zajc LC et al (2020) The eighth visual object tracking VOT2020 challenge results. In: European conference on computer vision, workshops ECCV 2020. Lecture notes in computer science. https://doi.org/10.1007/978-3-030-68238-5_39, vol 12539. Springer, Cham
Krizhevsky A, Sutskever I, Hinton GE (2017) ImageNet classification with deep convolutional neural networks. Commun ACM 60(6):84–90. https://doi.org/10.1145/3065386
Li B, Yan J, Wu W, Zhu Z, Hu X (2018) High performance visual tracking with siamese region proposal network. In: 2018 IEEE/CVF conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2018.00935, pp 8971–8980
Li F, Tian C, Zuo W, Zhang L, Yang M-H (2018) Learning spatial-temporal regularized correlation filters for visual tracking. In: 2018 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2018.00515. IEEE, pp 4904–4913
Li B, Wu W, Wang Q, Zhang F, Xing J, Yan J (2019) SiamRPN++: evolution of Siamese visual tracking with very deep networks. In: 2019 IEEE/CVF conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2019.00441, pp 4277–4286
Li X, Wang W, Hu X, Yang J (2019) Selective kernel networks. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2019.00060. IEEE, pp 510–519
Li X, Sun W, Wu T (2020) Attentive normalization. In: ECCV 2020. Lecture notes in computer science, vol 12362. Springer, Cham. https://doi.org/10.1007/978-3-030-58520-4_5
Müller M, Bibi A, Giancola S, Alsubaihi S, Ghanem B (2018) TrackingNet: a large-scale dataset and benchmark for object tracking in the wild. In: Lecture notes in computer science, vol 11205. https://doi.org/10.1007/978-3-030-01246-5_19, pp 310–327
Nam H, Han B (2016) Learning multi-domain convolutional neural networks for visual tracking. In: 2016 IEEE conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2016.465, pp 4293–4302
Park E, Berg AC (2018) Meta-tracker: fast and robust online adaptation for visual object trackers. In: Lecture notes in computer science, vol 11207. https://doi.org/10.1007/978-3-030-01219-9_35, pp 587–604
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S et al (2015) ImageNet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: International conference on learning representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, conference track proceedings. arXiv:1409.1556
Smeulders AWM, Chu DM, Cucchiara R, Calderara S, Dehghan A, Shah M (2014) Visual tracking: an experimental survey. IEEE Trans Pattern Anal Mach Intell 36(7):1442–1468. https://doi.org/10.1109/TPAMI.2013.230
Valmadre J, Bertinetto L, Henriques J, Vedaldi A, Torr PHS (2017) End-to-end representation learning for correlation filter based tracking. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2017.531, pp 5000–5008
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN et al (2017) Attention is all you need. In: Advances in neural information processing systems, pp 5999–6009. http://papers.nips.cc/paper/7181-attention-is-all-you-need
Wang Q, Teng Z, Xing J, Gao J, Hu W, Maybank S (2018) Learning attentions: residual attentional siamese network for high performance online visual tracking. In: 2018 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2018.00510. IEEE, pp 4854–4863
Wang X, Girshick R, Gupta A, He K (2018) Non-local neural networks. In: 2018 IEEE/CVF conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2018.00813, pp 7794–7803
Wang Q, Wu B, Zhu P, Li P, Zuo W, Hu Q (2020) ECA-Net: efficient channel attention for deep convolutional neural networks. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR42600.2020.01155. IEEE, pp 11531–11539
Woo S, Park J, Lee J-Y, Kweon IS (2018) CBAM: convolutional block attention module. In: ECCV 2018. Lecture notes in computer science, vol 11211. Springer, Cham. https://doi.org/10.1007/978-3-030-01234-2_1
Wu Y, Lim J, Yang MH (2015) Object tracking benchmark. IEEE Trans Pattern Anal Mach Intell 37(9):1834–1848. https://doi.org/10.1109/TPAMI.2014.2388226
Xu Y, Zhou X, Chen S, Li F (2019) Deep learning for multiple object tracking: a survey. IET Comput Vis 13(4):355–368. https://doi.org/10.1049/iet-cvi.2018.5598
Zhang Z, Peng H, Fu J, Li B, Hu W (2020) Ocean: object-aware anchor-free tracking. In: ECCV 2020. Lecture notes in computer science, vol 12366. Springer, Cham. https://doi.org/10.1007/978-3-030-58589-1_46
Zhao H, Jia J, Koltun V (2020) Exploring self-attention for image recognition. In: 2020 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR42600.2020.01009. IEEE, pp 10073–10082
Zhou X, Xie L, Zhang P, Zhang Y (2014) An ensemble of deep neural networks for object tracking. In: 2014 IEEE International conference on image processing (ICIP). https://doi.org/10.1109/ICIP.2014.7025169. IEEE, pp 843–847
Zhu Z, Wang Q, Li B, Wu W, Yan J, Hu W (2018) Distractor-aware Siamese networks for visual object tracking. In: European conference on computer vision. 15th European conference, Munich, Germany, September 8-14, 2018, Proceedings, Part IX. https://doi.org/10.1007/978-3-030-01240-3_7, pp 103–119
Zhu Z, Wu W, Zou W, Yan J (2018) End-to-end flow correlation tracking with spatial-temporal attention. In: 2018 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2018.00064. IEEE, pp 548–557
About this article
Cite this article
Wu, H., Liu, G. Split-merge-excitation: a robust channel-wise feature attention mechanism applied to MDNet tracking. Multimed Tools Appl 81, 40737–40754 (2022). https://doi.org/10.1007/s11042-022-12752-z
DOI: https://doi.org/10.1007/s11042-022-12752-z