DOI: 10.1007/978-3-031-20047-2_1

ByteTrack: Multi-object Tracking by Associating Every Detection Box

Published: 23 October 2022

Abstract

Multi-object tracking (MOT) aims at estimating the bounding boxes and identities of objects in videos. Most methods obtain identities by associating detection boxes whose scores are higher than a threshold. Objects with low detection scores, e.g. occluded objects, are simply thrown away, which causes a non-negligible number of true objects to be missed and fragments trajectories. To solve this problem, we present a simple, effective and generic association method that tracks by associating almost every detection box instead of only the high-score ones. For the low-score detection boxes, we use their similarities with tracklets to recover true objects and filter out background detections. When applied to 9 different state-of-the-art trackers, our method achieves consistent improvements in IDF1 score, ranging from 1 to 10 points. To advance the state-of-the-art performance of MOT, we design a simple and strong tracker named ByteTrack. For the first time, we achieve 80.3 MOTA, 77.3 IDF1 and 63.1 HOTA on the MOT17 test set at a running speed of 30 FPS on a single V100 GPU. ByteTrack also achieves state-of-the-art performance on the MOT20, HiEve and BDD100K tracking benchmarks. The source code, pre-trained models with deploy versions, and tutorials for applying the method to other trackers are released at https://github.com/ifzhang/ByteTrack.
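
The two-stage association described in the abstract can be illustrated with a minimal Python sketch. It is not the released implementation: the function names, the thresholds (`high_thresh`, `low_thresh`, `min_iou`) and the use of plain IoU as the similarity are assumptions made for illustration, and a greedy matcher is swapped in for an optimal assignment solver to keep the example dependency-free.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], dets: List[int],
                 boxes: List[Box], min_iou: float) -> Dict[int, int]:
    """Pair track ids with detection indices in order of descending IoU.

    Greedy matching is a simplification chosen for this sketch.
    """
    pairs = sorted(
        ((iou(tbox, boxes[d]), tid, d) for tid, tbox in tracks.items() for d in dets),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, d in pairs:
        if score < min_iou:
            break  # all remaining pairs are even less similar
        if tid in matches or d in used:
            continue
        matches[tid] = d
        used.add(d)
    return matches


def associate(tracks: Dict[int, Box], boxes: List[Box], scores: List[float],
              high_thresh: float = 0.6, low_thresh: float = 0.1,
              min_iou: float = 0.3):
    """Two-stage association; all threshold values are hypothetical."""
    high = [i for i, s in enumerate(scores) if s >= high_thresh]
    low = [i for i, s in enumerate(scores) if low_thresh <= s < high_thresh]

    # Stage 1: match tracklets against high-score detections.
    first = greedy_match(tracks, high, boxes, min_iou)

    # Stage 2: match the still-unmatched tracklets against low-score detections.
    remaining = {tid: b for tid, b in tracks.items() if tid not in first}
    second = greedy_match(remaining, low, boxes, min_iou)

    matched = {**first, **second}
    matched_dets = set(matched.values())
    # Unmatched high-score boxes may start new tracks; unmatched low-score
    # boxes are treated as background and dropped.
    new_tracks = [i for i in high if i not in matched_dets]
    lost = [tid for tid in tracks if tid not in matched]
    return matched, new_tracks, lost


tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
boxes = [(1, 1, 11, 11), (21, 21, 31, 31), (50, 50, 60, 60), (100, 100, 110, 110)]
scores = [0.9, 0.3, 0.8, 0.2]
matched, new_tracks, lost = associate(tracks, boxes, scores)
print(matched, new_tracks, lost)  # {1: 0, 2: 1} [2] []
```

In this toy frame, track 2 overlaps only a low-score box (score 0.3), as might happen under occlusion. A tracker that discards low-score boxes would lose it, while the second stage recovers it. The low-score box at index 3 overlaps no tracklet and is dropped as background.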



Published In

Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXII
Oct 2022, 827 pages
ISBN: 978-3-031-20046-5
DOI: 10.1007/978-3-031-20047-2

Publisher

Springer-Verlag, Berlin, Heidelberg


Author Tags

1. Multi-object tracking
2. Data association
3. Detection boxes
