An Efficient One-Shot Network and Robust Data Associations in Multi-pedestrian Tracking

He, Fuxiao; Xiao, Guoqiang

doi:10.1007/978-3-031-40286-9_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14118)

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

Abstract

Recently, one-shot trackers, which integrate multiple tasks into a unified network, have achieved good performance in multi-object video tracking and address its core challenge: balancing high accuracy against real-time performance. In this paper, we abandon the redundant backbones and feature fusion networks commonly used by one-shot trackers and propose a new one-shot model that is faster and lighter. We propose a new channel-spatial attention module to improve the detection and re-identification performance of the one-shot model for more robust tracking. Furthermore, to handle complex video tracking scenarios more robustly, we propose a new robust data association method that combines motion, appearance, and detection information. On the MOT20 test set, our one-shot model with robust associations, termed BFMOT, reduces the number of ID switches by 52.1% and improves tracking accuracy (i.e., MOTA) by 6.7% compared with the state-of-the-art tracker. BFMOT runs at close to 30 FPS on the MOT16 and MOT17 test sets, making it well suited to real-time tracking.
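
The abstract does not spell out how motion, appearance, and detection information are combined, so the following is only a minimal illustrative sketch of the general idea, not the BFMOT method. It assumes, hypothetically, that motion is scored by box IoU, appearance by cosine distance between re-identification embeddings, and detection information by the detector's confidence score. The names `fused_cost` and `associate`, the weight `w_motion`, and the threshold `max_cost` are all invented for illustration. An exhaustive search stands in for the Hungarian algorithm and is only practical for a handful of boxes.

```python
from itertools import permutations
from math import sqrt


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(y * y for y in v))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def fused_cost(track, det, w_motion=0.5):
    """Hypothetical fusion: weighted motion + appearance cost, scaled by score."""
    motion_cost = 1.0 - iou(track["box"], det["box"])
    appearance_cost = cosine_distance(track["emb"], det["emb"])
    fused = w_motion * motion_cost + (1.0 - w_motion) * appearance_cost
    # Assumption: a low detection score shrinks the similarity (1 - fused),
    # so uncertain detections are harder to match.
    return 1.0 - det["score"] * (1.0 - fused)


def associate(tracks, dets, max_cost=0.7):
    """Return (track_idx, det_idx) pairs minimising the total fused cost.

    Exhaustive search over assignments (a stand-in for the Hungarian
    algorithm); pairs whose cost exceeds max_cost are left unmatched.
    """
    n, m = len(tracks), len(dets)
    if n == 0 or m == 0:
        return []
    cost = [[fused_cost(t, d) for d in dets] for t in tracks]
    best, best_total = [], float("inf")
    if n <= m:
        # Each track picks a distinct detection.
        for perm in permutations(range(m), n):
            pairs = list(enumerate(perm))
            total = sum(cost[i][j] for i, j in pairs)
            if total < best_total:
                best_total, best = total, pairs
    else:
        # Each detection picks a distinct track.
        for perm in permutations(range(n), m):
            pairs = [(i, j) for j, i in enumerate(perm)]
            total = sum(cost[i][j] for i, j in pairs)
            if total < best_total:
                best_total, best = total, pairs
    return [(i, j) for i, j in best if cost[i][j] <= max_cost]


# Toy example: two tracks and two detections listed in swapped order.
tracks = [
    {"box": (0, 0, 10, 10), "emb": (1.0, 0.0)},
    {"box": (20, 20, 30, 30), "emb": (0.0, 1.0)},
]
dets = [
    {"box": (21, 20, 31, 30), "emb": (0.0, 1.0), "score": 0.9},
    {"box": (0, 1, 10, 11), "emb": (1.0, 0.0), "score": 0.8},
]
matches = associate(tracks, dets)
print(matches)
```

In this toy setup each track should pair with the detection that overlaps it and shares its embedding direction. A practical tracker would typically predict track boxes with a Kalman filter before computing IoU and would use a polynomial-time assignment solver.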

Author information

Authors and Affiliations

College of Computer and Information Science, Southwest University, Chongqing, China
Fuxiao He & Guoqiang Xiao

Corresponding author

Correspondence to Guoqiang Xiao.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Zhi Jin
South China Normal University, Guangzhou, China
Yuncheng Jiang
Babeș-Bolyai University, Cluj-Napoca, Romania
Robert Andrei Buchmann
Ulster University, Belfast, UK
Yaxin Bi
Babeș-Bolyai University, Cluj-Napoca, Romania
Ana-Maria Ghiran
South China Normal University, Guangzhou, China
Wenjun Ma

About this paper

Cite this paper

He, F., Xiao, G. (2023). An Efficient One-Shot Network and Robust Data Associations in Multi-pedestrian Tracking. In: Jin, Z., Jiang, Y., Buchmann, R.A., Bi, Y., Ghiran, AM., Ma, W. (eds) Knowledge Science, Engineering and Management. KSEM 2023. Lecture Notes in Computer Science, vol 14118. Springer, Cham. https://doi.org/10.1007/978-3-031-40286-9_10

DOI: https://doi.org/10.1007/978-3-031-40286-9_10
Published: 09 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40285-2
Online ISBN: 978-3-031-40286-9
eBook Packages: Computer Science, Computer Science (R0)
