Rapid communication

Modality-correlation-aware sparse representation for RGB-infrared object tracking

Published: 01 February 2020

Highlights

A modality-correlation-aware sparse representation model is proposed for RGB-infrared object tracking.
An effective and efficient learning algorithm is derived to obtain the optimal model parameters.
Extensive experiments demonstrate the effectiveness of the proposed method under large appearance variations, such as low-illumination conditions.

Abstract

To intelligently analyze and understand video content, a key step is to accurately perceive the motion of the objects of interest in a video. To this end, object tracking, which aims to determine the position and status of the object of interest in consecutive video frames, has received great research interest in the last decade. Although numerous algorithms have been proposed for object tracking in RGB videos, most of them may fail when the information from the RGB video is unreliable (e.g., in dim environments or under large illumination changes). To address this issue, and motivated by the growing popularity of dual-camera systems that capture RGB and infrared videos, this paper presents a feature representation and fusion model that combines the feature representations of the object in the RGB and infrared modalities for tracking. Specifically, the proposed model is able to (1) represent the object in each modality by exploiting the robustness of sparse representation, and (2) combine these representations by exploiting the correlation between the modalities. Extensive experiments demonstrate the effectiveness of the proposed method.
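The two ingredients described in the abstract (per-modality sparse coding plus a modality-correlation term) can be sketched as a pair of coupled lasso problems solved by proximal gradient (ISTA) iterations. This is only an illustrative sketch: the quadratic coupling term `mu * ||c_rgb - c_ir||^2` is a hypothetical stand-in for the paper's actual modality-correlation regularizer, and all names below are assumptions, not the authors' code.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the l1 norm: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def correlated_sparse_codes(x_rgb, x_ir, D_rgb, D_ir,
                            lam=0.1, mu=0.5, n_iter=500, step=None):
    """ISTA sketch for the coupled problem
        min_{c1,c2} 0.5||x_rgb - D_rgb c1||^2 + 0.5||x_ir - D_ir c2||^2
                    + lam (||c1||_1 + ||c2||_1) + 0.5 mu ||c1 - c2||^2
    where the last term is one simple way to encode modality correlation.
    """
    k = D_rgb.shape[1]
    c1, c2 = np.zeros(k), np.zeros(k)
    if step is None:
        # Lipschitz constant of the smooth part bounds the safe step size.
        L = max(np.linalg.norm(D_rgb, 2) ** 2,
                np.linalg.norm(D_ir, 2) ** 2) + 2 * mu
        step = 1.0 / L
    for _ in range(n_iter):
        # Gradient of the smooth terms (reconstruction + coupling).
        g1 = D_rgb.T @ (D_rgb @ c1 - x_rgb) + mu * (c1 - c2)
        g2 = D_ir.T @ (D_ir @ c2 - x_ir) + mu * (c2 - c1)
        # Gradient step followed by the l1 proximal (soft-thresholding) step.
        c1 = soft_threshold(c1 - step * g1, step * lam)
        c2 = soft_threshold(c2 - step * g2, step * lam)
    return c1, c2
```

Increasing `mu` pulls the two codes together (more cross-modal sharing), while `mu = 0` decouples the problem into two independent sparse representations, which mirrors the trade-off the abstract describes.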


Cited By

  • (2024) Review and Analysis of RGBT Single Object Tracking Methods: A Fusion Perspective. ACM Transactions on Multimedia Computing, Communications, and Applications 20(8), 1–27. DOI: 10.1145/3651308. Online publication date: 7 March 2024.
  • (2024) Efficient Image Classification via Structured Low-Rank Matrix Factorization Regression. IEEE Transactions on Information Forensics and Security 19, 1496–1509. DOI: 10.1109/TIFS.2023.3337717. Online publication date: 1 January 2024.
  • (2024) Knowledge Synergy Learning for Multi-Modal Tracking. IEEE Transactions on Circuits and Systems for Video Technology 34(7), 5519–5532. DOI: 10.1109/TCSVT.2024.3352573. Online publication date: 1 July 2024.
  • (2023) Robust RGB-T Tracking via Adaptive Modality Weight Correlation Filters and Cross-modality Learning. ACM Transactions on Multimedia Computing, Communications, and Applications 20(4), 1–20. DOI: 10.1145/3630100. Online publication date: 25 October 2023.
  • (2023) Dynamic Fusion Network for RGBT Tracking. IEEE Transactions on Intelligent Transportation Systems 24(4), 3822–3832. DOI: 10.1109/TITS.2022.3229830. Online publication date: 1 April 2023.
  • (2023) SwinEFT: a robust and powerful Swin Transformer based Event Frame Tracker. Applied Intelligence 53(20), 23564–23581. DOI: 10.1007/s10489-023-04763-6. Online publication date: 13 July 2023.
  • (2022) An End-to-end Heterogeneous Restraint Network for RGB-D Cross-modal Person Re-identification. ACM Transactions on Multimedia Computing, Communications, and Applications 18(4), 1–22. DOI: 10.1145/3506708. Online publication date: 4 March 2022.
  • (2022) M5L: Multi-Modal Multi-Margin Metric Learning for RGBT Tracking. IEEE Transactions on Image Processing 31, 85–98. DOI: 10.1109/TIP.2021.3125504. Online publication date: 1 January 2022.
  • (2022) A survey on deep multimodal learning for computer vision: advances, trends, applications, and datasets. The Visual Computer 38(8), 2939–2970. DOI: 10.1007/s00371-021-02166-7. Online publication date: 1 August 2022.

Published In

Pattern Recognition Letters, Volume 130, Issue C (February 2020), 386 pages

          Publisher

          Elsevier Science Inc.

          United States
