Fusing two-stream convolutional neural networks for RGB-T object tracking

C Li, X Wu, N Zhao, X Cao, J Tang - Neurocomputing, 2018 - Elsevier
Abstract
This paper investigates how to integrate the complementary information from RGB and thermal (RGB-T) sources for object tracking. We propose a novel Convolutional Neural Network (ConvNet) architecture, comprising a two-stream ConvNet and a FusionNet, to achieve adaptive fusion of different source data for robust RGB-T tracking. Both the RGB and thermal streams extract generic semantic information about the target object. In particular, the thermal stream is pre-trained on the ImageNet dataset to encode rich semantic information, and then fine-tuned on thermal images to capture the specific properties of thermal data. To fuse the modalities adaptively while avoiding redundant noise, the FusionNet selects the most discriminative feature maps from the outputs of the two-stream ConvNet, and is updated online to adapt to appearance variations of the target object. Finally, object locations are efficiently predicted by applying a multi-channel correlation filter to the fused feature maps. Extensive experiments on the recent public benchmark GTOT verify the effectiveness of the proposed approach against other state-of-the-art RGB-T trackers.
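The pipeline the abstract describes (two feature streams, selection of discriminative maps, then a multi-channel correlation filter) can be sketched in miniature. This is not the authors' FusionNet: the selection criterion here is a crude variance score standing in for the learned fusion weights, and the feature maps are random arrays standing in for ConvNet outputs. It only illustrates the data flow: select top-k maps from the concatenated streams, then compute a correlation response in the Fourier domain.

```python
import numpy as np

def select_discriminative_maps(rgb_maps, thermal_maps, k):
    """Toy stand-in for FusionNet: concatenate the two streams' feature
    maps (shape (C, H, W) each) and keep the k maps with the highest
    spatial variance, a crude proxy for discriminability."""
    maps = np.concatenate([rgb_maps, thermal_maps], axis=0)
    scores = maps.reshape(maps.shape[0], -1).var(axis=1)
    top = np.argsort(scores)[::-1][:k]
    return maps[top]

def correlation_response(feature_maps, filters):
    """Multi-channel correlation filter response: per-channel circular
    correlation in the Fourier domain, summed over channels."""
    F = np.fft.fft2(feature_maps, axes=(-2, -1))
    H = np.fft.fft2(filters, axes=(-2, -1))
    return np.fft.ifft2((np.conj(H) * F).sum(axis=0)).real

# Random arrays stand in for the two streams' ConvNet outputs.
rng = np.random.default_rng(0)
rgb = rng.normal(size=(4, 16, 16))
thermal = rng.normal(size=(4, 16, 16))

fused = select_discriminative_maps(rgb, thermal, k=3)
# Correlating the fused maps with themselves: the response peaks at
# zero shift, i.e. flat index 0 of the (16, 16) response map.
resp = correlation_response(fused, filters=fused)
print(fused.shape, resp.shape, np.argmax(resp))
```

In the actual tracker, the filters would be learned from the first frame and updated online, and the peak of the response map gives the predicted target translation.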