Feature Disentanglement Network: Multi-Object Tracking Needs More Differentiated Features

Published: 10 November 2023

Abstract

To reduce computational redundancy in multi-object tracking (MOT), a common approach is to integrate detection and re-identification (Re-ID) into a single network, referred to as “tracking by detection.” Most previous research has focused on resolving the conflict between the detection and Re-ID branches, treating it as a simple coupling. In this work, we find that the entanglement between the detection and Re-ID tasks is considerably more complex than previously assumed and results in a form of competition that degrades performance. To address this issue, we propose a feature disentanglement network that deeply disentangles the intricately interwoven latent feature space and provides differentiated feature maps for each task. Furthermore, because the Re-ID branch requires shallow semantic features, we introduce a feature re-globalization module to enrich the shallow semantics. By integrating these two components into a one-shot online MOT method, we develop a robust MOT tracker named HDGTrack. Extensive experiments on several benchmarks demonstrate that our method significantly outperforms state-of-the-art MOT methods. HDGTrack is also efficient, running at 13.9 frames per second on MOT17 and 8.7 frames per second on MOT20.
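As a rough illustration of why a tracker needs Re-ID features alongside detections, the sketch below matches existing tracks to new detections by the cosine similarity of their appearance embeddings. It is a generic example and not the association procedure of HDGTrack, which the abstract does not describe. The function names, the greedy matching strategy, and the 0.5 similarity threshold are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def associate(track_embeddings, detection_embeddings, min_similarity=0.5):
    """Greedily match tracks to detections by descending similarity.

    Returns {track_index: detection_index}. Each track and each detection
    is used at most once. The threshold is a hypothetical value.
    """
    candidates = []
    for t, t_emb in enumerate(track_embeddings):
        for d, d_emb in enumerate(detection_embeddings):
            sim = cosine_similarity(t_emb, d_emb)
            if sim >= min_similarity:
                candidates.append((sim, t, d))

    # Consider the most similar pairs first.
    candidates.sort(key=lambda c: c[0], reverse=True)

    matches = {}
    used_detections = set()
    for sim, t, d in candidates:
        if t in matches or d in used_detections:
            continue
        matches[t] = d
        used_detections.add(d)
    return matches


# Toy embeddings: two existing tracks and three detections in a new frame.
tracks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
detections = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matches = associate(tracks, detections)
print(matches)
```

In this toy input, track 0 pairs with detection 1 and track 1 with detection 0, while detection 2 stays unmatched and would typically start a new track. The more discriminative the embeddings are, the more reliable this kind of matching becomes, which is the motivation for giving the Re-ID branch its own differentiated features.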

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 3
March 2024
665 pages
EISSN: 1551-6865
DOI: 10.1145/3613614
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 10 November 2023
Online AM: 09 October 2023
Accepted: 26 September 2023
Revised: 16 September 2023
Received: 05 June 2023

Author Tags

1. Multiple object tracking
2. Feature disentanglement network
3. One-shot tracking
4. Feature enhancement

Qualifiers

• Research-article

Funding Sources

• National Natural Science Foundation of China
