Abstract
Recently, one-shot trackers, which integrate multi-tasks into a unified network, achieve good performances in multi-object video tracking and successfully handle the core challenge of multi-object tracking, that is, how to realize the trade-off between the high accuracy and real-time performance. In this paper, we abandon the traditional approach of redundant backbones and feature fusion networks commonly used by one-shot trackers, and propose a new one-shot model that is faster and lighter. We propose a new channel-spatial attention module to improve the detection and re-identification performance of the one-shot model for more robust tracking. Furthermore, in order to deal with complex video tracking scenarios more robust, we have made innovations in data association and proposed a new robust association method, which combines the advantages of the motion, appearance and the detection information to associate. On the MOT20 testing set, our proposed one-shot model with robust associations termed as BFMOT reduces the number of ID switches by 52.1% and improves the tracking accuracy (i.e. MOTA) by 6.7% compared with the state-of-the-art tracker. BFMOT runs close to 30 FPS on MOT16,17 testing sets, which is more oriented to real-time tracking.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Zhou, X., Wang, D., Krähenbühl, P.: Objects as points. arXiv preprint arXiv:1904.07850 (2019)
Zhou, X., Koltun, V., Krähenbühl, P.: Tracking objects as points. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12349, pp. 474–490. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58548-8_28
Zhang, Y., et al.: FairMOT: on the fairness of detection and re-identification in multiple object tracking. Int. J. Comput. Vis. 129(11), 3069–3087 (2021)
Wu, J., et al.: Track to detect and segment: an online multi-object tracker. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12352–12361 (2021)
Wang, Y., Kitani, K., Weng, X.: Joint object detection and multi-object tracking with graph neural networks. In: 2021 IEEE International Conference on Robotics and Automation (ICRA), pp. 13708–13715 (2021)
Mostafa, R., Baraka, H., Bayoumi, A.: LMOT: efficient light-weight detection and tracking in crowds. IEEE Access 10, 83085–83095 (2022)
Wojke, N., Bewley, A., Paulus, P.: Simple online and realtime tracking with a deep association metric. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 3645–3649 (2017)
Chen, L., et al.: Real-time multiple people tracking with deeply learned candidate selection and person re-identification. In: 2018 IEEE International Conference on Multimedia and Expo (ICME), pp. 1–6 (2018)
Zhang, Y., et al.: ByteTrack: multi-object tracking by associating every detection box. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision – ECCV 2022. LNCS, vol. 13682, pp. 1–21. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-20047-2_1
Hu, J., Li S., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)
Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11211, pp. 3–19. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01234-2_1
Kalman, R.E.: A new approach to linear filtering and prediction problems, pp. 35–45 (1960)
Lu, Z., et al.: RetinaTrack: online single stage joint detection and tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14668–14678 (2020)
Wang, Z., Zheng, L., Liu, Y., Li, Y., Wang, S.: Towards real-time multi-object tracking. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12356, pp. 107–122. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58621-8_7
Liang, C., et al.: Rethinking the competition between detection and Re-ID in Multi-Object Tracking. arXiv preprint arXiv:2010.12138v2 (2020)
Liang, C., et al.: One more check: making “fake background” be tracked again. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 2, pp. 1546–1554 (2022)
Wang, Q., et al.: Multiple object tracking with correlation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3876–3886 (2021)
Bewley, A., et al.: Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 3464–3468 (2016)
Wang, C., Alexey, B., Liao, H.: YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. Preprint arXiv:2207.02696 (2022)
Lin, T., et al.: Feature pyramid networks for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
Tan, M., Pang, R., Le, Q.: EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020)
Milan, A., Leal-Taixé, L., Reid, I., Roth, S., Schindler, K.: MOT16: a benchmark for multi-object tracking. arXiv preprint arXiv:1603.00831 (2016)
Dendorfer, P., et al.: MOTChallenge: a benchmark for single-camera multiple target tracking. Int. J. Comput. Vis. 129(4), 845–881 (2021)
Lin, T., et al.: Focal loss for dense object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 2980–2988 (2017)
Kuhn, H.: The Hungarian method for the assignment problem. Naval Res. Logistics Q. 2.1-2, 83–97 (1955)
Zhang, S., Benenson, R., Schiele, B.: CityPersons: a diverse dataset for pedestrian detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3213–3221 (2017)
Ess, A., et al.: A mobile vision system for robust multi-person tracking. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8 (2008)
Aharon, N., Orfaig R., Bobrovsky B.: BoT-SORT: robust associations multi-pedestrian tracking. arXiv preprint arXiv:2206.14651v2 (2022)
Dollár, P., et al.: Pedestrian detection: a benchmark. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 304–311 (2009)
Xiao, T., et al.: Joint detection and identification feature learning for person search. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3415–3424 (2017)
Zheng, L., et al.: Person re-identification in the wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1367–1376 (2017)
Shao, S., et al.: CrowdHuman: a benchmark for detecting human in a crowd. arXiv preprint arXiv:1805.00123 (2018)
Dendorfer, P., et al.: MOT20: a benchmark for multi object tracking in crowded scenes. arXiv preprint arXiv:2003.09003 (2020)
Bernardin, K., Stiefelhagen, R.: Evaluating multiple object tracking performance: the clear MOT metrics. EURASIP J. Image Video Process. 1–10 (2008)
Luiten, J., et al.: HOTA: a higher order metric for evaluating multi-object tracking. Int. J. Comput. Vis. 129(2), 548–578 (2021)
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Glenn, J.: https://github.com/ultralytics/yolov5/releases/tag/v6.1
Cai, J., et al.: MeMOT: multi-object tracking with memory. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8090–8100 (2022)
Meinhardt, T., et al.: TrackFormer: multi-object tracking with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8844–8854 (2022)
Zhou, X., et al.: Global tracking transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8771–8780 (2022)
Hyun, J., et al.: Detection recovery in online multi-object tracking with Sparse Graph Tracker. arXiv preprint arXiv:2205.00968 (2022)
Zhu, T., et al.: Looking beyond two frames: end-to-end multi-object tracking using spatial and temporal transformers. IEEE Trans. Pattern Anal. Mach. Intell. (2022)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
He, F., Xiao, G. (2023). An Efficient One-Shot Network and Robust Data Associations in Multi-pedestrian Tracking. In: Jin, Z., Jiang, Y., Buchmann, R.A., Bi, Y., Ghiran, AM., Ma, W. (eds) Knowledge Science, Engineering and Management. KSEM 2023. Lecture Notes in Computer Science(), vol 14118. Springer, Cham. https://doi.org/10.1007/978-3-031-40286-9_10
Download citation
DOI: https://doi.org/10.1007/978-3-031-40286-9_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40285-2
Online ISBN: 978-3-031-40286-9
eBook Packages: Computer ScienceComputer Science (R0)