DOI: 10.1145/3663976.3663996

TandemFuse: An Intra- and Inter-Modal Fusion Strategy for RGB-T Tracking

Published: 27 June 2024

Abstract

Visual object tracking is a prominent task in computer vision, with significant potential in autonomous driving, human-computer interaction, and intelligent surveillance. Many studies have focused on tracking with single-modality data. Among these, the RGB modality is renowned for its rich color and detail capture, yet it is susceptible to motion blur, occlusion, and low-light conditions. Conversely, the TIR (thermal infrared) modality can overcome these issues but is limited by lower resolution and higher cost, making target recognition in complex scenes more challenging. With the rise of multi-modal learning, integrating the RGB and TIR modalities can significantly enhance the robustness of single-modality trackers across various scenarios. This paper proposes a comprehensive multi-modal learning strategy for fusing the RGB and TIR modalities in the tracking task. Two key modules are designed: intra-modal data fusion and inter-modal data fusion. For intra-modal data fusion, we use feature pyramid techniques to merge multi-scale representations within each modality. The features learned from both modalities are then sent to inter-modal data fusion for enhanced tracking. Our model is trained on the RGBT234 dataset and tested on the GTOT dataset, achieving a success rate of 0.454 and a precision rate of 0.438. The insights and methodologies derived from this research offer guidance for future multi-modal object tracking studies and underscore the critical role of intra-modal fusion in enhancing the efficiency of multi-modal integration.
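
The two-stage fusion described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual model: the function names, the nearest-neighbor upsampling, and the weighted-sum inter-modal step are all assumptions standing in for the paper's feature-pyramid and fusion operators.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def intra_modal_pyramid_fuse(feats):
    """Top-down feature-pyramid fusion within one modality.

    feats: list of (C, H, W) maps ordered fine -> coarse, each level
    half the spatial size of the previous one. Coarser levels are
    upsampled and summed into the finest-resolution map.
    """
    fused = feats[-1]
    for f in reversed(feats[:-1]):
        fused = f + upsample2x(fused)
    return fused

def inter_modal_fuse(rgb_feat, tir_feat, alpha=0.5):
    """Illustrative inter-modal step: a weighted sum of the two
    modality features (the paper's fusion module is not specified here)."""
    return alpha * rgb_feat + (1.0 - alpha) * tir_feat

# Toy example: 3-level pyramids for RGB and TIR, 8 channels each.
rng = np.random.default_rng(0)
rgb_pyr = [rng.standard_normal((8, 32 // 2**i, 32 // 2**i)) for i in range(3)]
tir_pyr = [rng.standard_normal((8, 32 // 2**i, 32 // 2**i)) for i in range(3)]

rgb_fused = intra_modal_pyramid_fuse(rgb_pyr)   # intra-modal stage, RGB
tir_fused = intra_modal_pyramid_fuse(tir_pyr)   # intra-modal stage, TIR
joint = inter_modal_fuse(rgb_fused, tir_fused)  # inter-modal stage
```

The key structural point the sketch captures is the tandem ordering: multi-scale features are first consolidated within each modality, and only the consolidated representations are exchanged across modalities.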


    Published In

    CVIPPR '24: Proceedings of the 2024 2nd Asia Conference on Computer Vision, Image Processing and Pattern Recognition
    April 2024
    373 pages
    ISBN:9798400716607
    DOI:10.1145/3663976

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Author Tags

    1. Multi-Modal Fusion
    2. RGB-T Tracking
    3. Visual Object Tracking


    Acceptance Rates

    Overall acceptance rate: 14 of 38 submissions (37%)
