Abstract
Multi-Object Tracking, also known as Multi-Target Tracking, is an important area of computer vision with applications in many domains. The advent of deep learning has had a profound impact on the field, prompting researchers to explore new directions. Deep learning methods underpin today's state-of-the-art solutions and consistently deliver strong tracking results. However, their significant computational demands call for powerful hardware and often cannot meet real-time tracking requirements, which limits their applicability in real-world scenarios. There is therefore a need to combine robust deep learning strategies with conventional approaches to obtain more accessible, cost-effective solutions that run in real time. This paper presents a hybrid strategy for real-time multi-target tracking that combines a classical optical flow algorithm with a deep learning architecture tailored for human crowd tracking. The hybrid approach balances tracking accuracy and computational efficiency. In extensive experiments across various settings, the proposed architecture achieved a Multiple Object Tracking Accuracy (MOTA) of 0.608 on the MOT15 benchmark, making it the highest-ranking solution there and surpassing the previous state-of-the-art score of 0.549; it also consistently ranked among the top models on the MOT17 and MOT20 benchmarks. In addition, the optical flow phase nearly halved processing time while maintaining accuracy comparable to established techniques.
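For readers unfamiliar with the headline metric, the following is a minimal sketch of how MOTA is conventionally computed from error counts accumulated over all frames, using the standard CLEAR MOT definition MOTA = 1 - (FN + FP + IDSW) / GT. The function name `mota`, its parameter names, and the counts in the usage line are illustrative assumptions and are not taken from the paper or its repository.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Standard CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT.

    All counts are summed over every frame of the sequence.
    num_gt is the total number of ground-truth objects across frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


# Hypothetical counts for illustration only (not results from the paper).
score = mota(false_negatives=250, false_positives=100,
             id_switches=50, num_gt=1000)
print(f"MOTA = {score:.3f}")
```

Because the error count can exceed the number of ground-truth objects, MOTA is bounded above by 1 but can become negative for a very poor tracker.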
Data Availability
The datasets used for both training and testing purposes may be provided upon request.
Notes
The official GitHub repository of this work is freely available at https://github.com/Dantekk/A-hybrid-approach-to-Real-Time-Multi-Target-Tracking.
Author information
Contributions
All authors contributed equally to this work.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Scarrica, V.M., Panariello, C., Ferone, A. et al. A hybrid approach to real-time multi-target tracking. Neural Comput & Applic 36, 10055–10066 (2024). https://doi.org/10.1007/s00521-024-09799-4
DOI: https://doi.org/10.1007/s00521-024-09799-4