Dual-modality space-time memory network for RGBT tracking

F Zhang, H Peng, L Yu, Y Zhao, B Chen
IEEE Transactions on Instrumentation and Measurement, 2023 - ieeexplore.ieee.org
RGBT tracking is developing rapidly because it exploits the complementary advantages of RGB and thermal frames. Existing high-accuracy methods run at lower speeds and do not make full use of the hierarchical information in feature extraction or the historical information in the sequences. To address these issues, a novel dual-modality space-time memory (DMSTM) network is proposed for robust RGBT tracking. DMSTM consists of three modules. The first is the dual-modality backbone, which uses both shallow and deep information by aggregating the feature maps at the dimensional changes that occur during downsampling. The second is the space-time memory reader with bimodal fusion, which aggregates features of historical and current frames to share information in the time domain. The third is the Siamese head network, which sums the prediction losses of the two modalities and backpropagates the total. This avoids the drop in tracking performance caused by sequence frame pairs whose training targets are not perfectly aligned. Extensive experiments on three RGBT benchmark datasets show that the performance and efficiency of the proposed DMSTM exceed those of state-of-the-art methods while running at 27.6 frames/s.
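The abstract does not give the internals of the memory reader, so the following is only a minimal sketch of a generic space-time memory read, assuming the common key/value attention formulation: each location in the current frame compares its key with the keys stored from historical frames and retrieves a softmax-weighted average of the stored values. The function names (`memory_read`, `fused_read`, `total_loss`), the concatenation used for bimodal fusion, and the toy vectors are all illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def memory_read(query_keys, memory_keys, memory_values):
    """Return one read vector per query location.

    query_keys:    key vectors for locations in the current frame
    memory_keys:   key vectors for locations in the stored (historical) frames
    memory_values: value vectors aligned one-to-one with memory_keys
    """
    dim = len(memory_values[0])
    read = []
    for q in query_keys:
        # Similarity of this query location to every memory location.
        weights = softmax([dot(q, k) for k in memory_keys])
        # Weighted average of the memory values.
        read.append([
            sum(w * v[d] for w, v in zip(weights, memory_values))
            for d in range(dim)
        ])
    return read


def fused_read(rgb_read, thermal_read):
    """Assumed fusion: concatenate the per-location reads of both modalities."""
    return [r + t for r, t in zip(rgb_read, thermal_read)]


def total_loss(loss_rgb, loss_thermal):
    """Sum of the two per-modality losses, as the abstract describes for the head."""
    return loss_rgb + loss_thermal


# Toy data: two memory locations with one-dimensional values.
memory_keys = [[1.0, 0.0], [0.0, 1.0]]
memory_values = [[10.0], [20.0]]

neutral = memory_read([[0.0, 0.0]], memory_keys, memory_values)   # equal weights
matched = memory_read([[50.0, 0.0]], memory_keys, memory_values)  # favours location 0
fused = fused_read(neutral, matched)
```

A query key that matches no memory location in particular retrieves the plain average of the stored values, while a query key strongly aligned with one memory key retrieves almost exactly that location's value. This is how a memory read lets the current frame reuse information from earlier frames.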