Article

BoT-FaceSORT: Bag-of-Tricks for Robust Multi-face Tracking in Unconstrained Videos

Published: 08 December 2024
DOI: 10.1007/978-981-96-0901-7_17

Abstract

Multi-face tracking (MFT) is a subtask of multi-object tracking (MOT) that focuses on detecting and tracking multiple faces across video frames. Modern MOT trackers adopt the Kalman filter (KF), a linear model that estimates the current motion of each target from previous observations. However, these KF-based trackers struggle to predict motion in unconstrained videos with frequent shot changes, occlusions, and appearance variations. To address these limitations, we propose BoT-FaceSORT, a novel MFT framework that integrates shot change detection, shared feature memory, and an adaptive cascade matching strategy for robust tracking. It detects shot changes by comparing the color histograms of adjacent frames and resets the KF states to handle the resulting discontinuities. Additionally, we introduce MovieShot, a new benchmark of challenging movie clips for evaluating MFT performance in unconstrained scenarios. Experiments on three benchmarks demonstrate that our method outperforms existing methods, and an ablation study validates the effectiveness of each component in handling unconstrained videos.
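The abstract states only that shot changes are detected by comparing the color histograms of adjacent frames and that KF states are then reset. The following is a minimal sketch of that idea, not the authors' implementation. Everything concrete in it is an assumption: per-channel 32-bin histograms, a symmetric chi-square distance (the paper's actual distance is not given here), a hypothetical cut threshold of 0.5, and a constant-velocity KF over box centres (the tracker's real KF state is not described here). Appearance features, shared feature memory, and cascade matching are not shown; detections are assumed to arrive already associated with track ids.

```python
import numpy as np


def color_histogram(frame: np.ndarray, bins: int = 32) -> np.ndarray:
    """Concatenated per-channel histogram of an HxWx3 uint8 frame, normalised to sum to 1."""
    hists = [np.histogram(frame[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    h = np.concatenate(hists).astype(np.float64)
    return h / h.sum()


def chi_square_distance(h1: np.ndarray, h2: np.ndarray, eps: float = 1e-10) -> float:
    """Symmetric chi-square distance; 0 for identical, 1 for disjoint normalised histograms."""
    return float(0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))


class ConstantVelocityKF:
    """Hypothetical minimal KF with state [cx, cy, vx, vy] and measurement [cx, cy]."""

    def __init__(self, cx: float, cy: float):
        self.x = np.array([cx, cy, 0.0, 0.0])
        self.P = np.diag([10.0, 10.0, 1000.0, 1000.0])  # velocity unknown at start
        self.F = np.array([[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * 0.01  # process noise (assumed)
        self.R = np.eye(2) * 1.0   # measurement noise (assumed)

    def predict(self) -> np.ndarray:
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, cx: float, cy: float) -> None:
        y = np.array([cx, cy]) - self.H @ self.x          # innovation
        S = self.H @ self.P @ self.H.T + self.R           # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)          # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P


class ShotAwareTracker:
    """Hypothetical wrapper: drops all KF states when adjacent-frame histograms differ sharply."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold  # hypothetical cut threshold on the chi-square distance
        self.prev_hist = None
        self.kfs: dict[int, ConstantVelocityKF] = {}

    def step(self, frame: np.ndarray, detections: dict[int, tuple[float, float]]) -> bool:
        """detections maps an already-associated track id to its box centre; returns True on a cut."""
        hist = color_histogram(frame)
        cut = self.prev_hist is not None and chi_square_distance(self.prev_hist, hist) > self.threshold
        self.prev_hist = hist
        if cut:
            # Motion before the cut says nothing about motion after it,
            # so discard every KF state and re-seed from the current detections.
            self.kfs = {}
        for tid, (cx, cy) in detections.items():
            if tid in self.kfs:
                self.kfs[tid].predict()
                self.kfs[tid].update(cx, cy)
            else:
                self.kfs[tid] = ConstantVelocityKF(cx, cy)
        return cut
```

A quick usage example with synthetic frames:

```python
dark = np.zeros((72, 128, 3), dtype=np.uint8)
bright = np.full((72, 128, 3), 255, dtype=np.uint8)
tracker = ShotAwareTracker()
print(tracker.step(dark, {1: (10.0, 20.0)}))     # False (first frame)
print(tracker.step(dark, {1: (12.0, 21.0)}))     # False (identical histograms)
print(tracker.step(bright, {1: (100.0, 50.0)}))  # True (all-black to all-white cut)
```

Resetting to zero velocity after a cut is one plausible choice: the track's position is re-seeded from the first post-cut detection, so the linear motion model never extrapolates across the discontinuity.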


Published In

Computer Vision – ACCV 2024: 17th Asian Conference on Computer Vision, Hanoi, Vietnam, December 8–12, 2024, Proceedings, Part II
Dec 2024
520 pages
ISBN: 978-981-96-0900-0
DOI: 10.1007/978-981-96-0901-7
Editors: Minsu Cho, Ivan Laptev, Du Tran, Angela Yao, Hongbin Zha

Publisher

Springer-Verlag, Berlin, Heidelberg


Author Tags

  1. Multi-Face Tracking
  2. SORT
  3. Kalman Filter
