Generative-Based Fusion Mechanism for Multi-Modal Tracking

Authors

  • Zhangyong Tang, School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, P.R. China
  • Tianyang Xu, School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, P.R. China
  • Xiaojun Wu, School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, P.R. China
  • Xue-Feng Zhu, School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, Jiangsu, P.R. China
  • Josef Kittler, Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, GU2 7XH, UK

DOI:

https://doi.org/10.1609/aaai.v38i6.28325

Keywords:

CV: Motion & Tracking, CV: Multi-modal Vision

Abstract

Generative models (GMs) have attracted growing research interest for their remarkable capacity for comprehensive understanding. However, their potential in multi-modal tracking has remained unexplored. We therefore investigate whether generative techniques can address information fusion, the critical challenge in multi-modal tracking. Specifically, we study two prominent GM techniques: Conditional Generative Adversarial Networks (CGANs) and Diffusion Models (DMs). Unlike the standard fusion process, where the features from each modality are fed directly into the fusion block, we combine these multi-modal features with random noise in the GM framework, effectively transforming the original training samples into harder instances. This design helps extract discriminative clues from the features, improving the final tracking performance. We conduct extensive experiments across two multi-modal tracking tasks, three baseline methods, and four challenging benchmarks. The results demonstrate that the proposed generative-based fusion mechanism achieves state-of-the-art performance, setting new records on GTOT, LasHeR and RGBD1K. Code will be available at https://github.com/Zhangyong-Tang/GMMT.
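
The abstract's central idea, mixing multi-modal features with random noise before fusion, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard DDPM-style forward noising step with a linear beta schedule, and every name in it (`linear_alpha_bar`, `add_noise`, `build_fusion_input`, the toy feature vectors) is hypothetical. It only shows how a harder, noised training input conditioned on the clean features might be assembled for a fusion block.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise(features, alpha_bar_t, rng):
    """DDPM-style forward step: sqrt(a) * x + sqrt(1 - a) * eps, eps ~ N(0, 1)."""
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in features]


def build_fusion_input(rgb_feat, aux_feat, t, alpha_bar, rng):
    """Hypothetical fusion-block input: noised features followed by the clean condition."""
    clean = rgb_feat + aux_feat  # concatenate the two modality feature vectors
    noisy = add_noise(clean, alpha_bar[t], rng)
    return noisy + clean


rng = random.Random(0)  # fixed seed so the sketch is reproducible
alpha_bar = linear_alpha_bar(num_steps=1000)
rgb_feat = [0.5, -1.0, 2.0]   # toy RGB features (illustrative values)
aux_feat = [1.5, 0.0, -0.5]   # toy auxiliary-modality features (e.g. thermal or depth)
fusion_input = build_fusion_input(rgb_feat, aux_feat, t=500, alpha_bar=alpha_bar, rng=rng)
```

In this sketch a larger step `t` gives a smaller `alpha_bar[t]`, so more of the signal is replaced by noise and the sample becomes harder; how the paper's CGAN and DM variants actually inject noise and condition the fusion block may differ.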

Published

2024-03-24

How to Cite

Tang, Z., Xu, T., Wu, X., Zhu, X.-F., & Kittler, J. (2024). Generative-Based Fusion Mechanism for Multi-Modal Tracking. Proceedings of the AAAI Conference on Artificial Intelligence, 38(6), 5189-5197. https://doi.org/10.1609/aaai.v38i6.28325

Section

AAAI Technical Track on Computer Vision V