Abstract
Visual object tracking aims to locate an object of interest in a sequence of consecutive video frames and is widely applied in high-level computer vision tasks such as intelligent video surveillance and robotics. Handling large target appearance variations caused by pose deformation, fast motion, occlusion, and the surrounding environment in real-time video remains highly challenging for tracking methods. In this paper, inspired by the cognitive saliency model of human attention, we propose a visual tracking method based on salient superpixels that integrates target appearance similarity with cognitive saliency to support location inference and appearance-model updating. Superpixel saliency is detected with a graph model and manifold ranking. We cluster the superpixels of the first four target boxes into a set corresponding to the object foreground and model the target appearance with color descriptors. During tracking, the relevance between each candidate superpixel and the target appearance set is computed. We also propose an iterative threshold segmentation method that separates foreground from background superpixels based on saliency and relevance. To increase the accuracy of location inference, we employ a particle filter in both the confidence estimation and sampling procedures. We compared our method with existing techniques on the OTB100 dataset in terms of precision based on center location error and success rate based on overlap, and the experimental results show that the proposed method achieves substantially better performance. These promising results indicate that the proposed salient-superpixel approach is robust to deformation, occlusion, and other challenges in object tracking.
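For readers unfamiliar with graph-based manifold ranking, the saliency scheme the abstract refers to (originating in Yang et al., CVPR 2013), the Python sketch below illustrates its closed-form ranking step on a superpixel affinity graph. It is a minimal illustration under assumed inputs (a precomputed affinity matrix W between superpixels and a set of query superpixel indices), not the authors' exact implementation.

    import numpy as np

    def manifold_ranking(W, query_idx, alpha=0.99):
        # Closed-form graph-based manifold ranking: f = (I - alpha * S)^(-1) y,
        # where S = D^(-1/2) W D^(-1/2) is the symmetrically normalized affinity.
        n = W.shape[0]
        d = W.sum(axis=1)                                # node degrees
        D_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))   # guard against zero degree
        S = D_inv_sqrt @ W @ D_inv_sqrt
        y = np.zeros(n)
        y[query_idx] = 1.0                               # indicator vector of query nodes
        f = np.linalg.solve(np.eye(n) - alpha * S, y)    # ranking score of every superpixel
        return f

In the cited two-stage scheme, superpixels on each image border first serve as background queries and the complement of the normalized scores gives a coarse saliency map; the binarized map then supplies foreground queries for a second ranking pass. The candidate-versus-target relevance mentioned in the abstract could be obtained analogously by ranking against the clustered target superpixels, though that detail is an assumption rather than a statement of the authors' design.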
Funding
This study was partly supported by the National Natural Science Foundation of China (61772144, 61672008, 61876045), the Foreign Science and Technology Cooperation Plan Project of the Guangzhou Science Technology and Innovation Commission (201807010059), the Guangdong Provincial Application-oriented Technical Research and Development Special Fund Project (2016B010127006), the Scientific and Technological Projects of Guangdong Province (2017A050501039), the Innovation Team Project (Natural Science) of the Education Department of Guangdong Province (2017KCXTD021), the National Natural Science Foundation of China Youth Fund (61602116), the Innovation Research Project (Natural Science) of the Education Department of Guangdong Province (2016KTSCX077), and the Zhujiang Science and Technology New Star Project of Guangzhou (201906010057).
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants performed by any of the authors.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Zhan, J., Zhao, H., Zheng, P. et al. Salient Superpixel Visual Tracking with Graph Model and Iterative Segmentation. Cogn Comput 13, 821–832 (2021). https://doi.org/10.1007/s12559-019-09662-y