Authors:
Mohamed Amine Marnissi 1,2,3; Ikram Hattab 2,4; Hajer Fradi 4,3; Anis Sahbani 2 and Najoua Essoukri Ben Amara 5,3
Affiliations:
1 Ecole Nationale d’Ingénieurs de Sfax, Université de Sfax, 3038, Sfax, Tunisia
2 Enova Robotics, Novation City, 4000, Sousse, Tunisia
3 LATIS - Laboratory of Advanced Technology and Intelligent Systems, Université de Sousse, 4023, Sousse, Tunisia
4 Institut Supérieur des Sciences Appliquées et de Technologie, Université de Sousse, 4023, Sousse, Tunisia
5 Ecole Nationale d’Ingénieurs de Sousse, Université de Sousse, 4023, Sousse, Tunisia
Keyword(s):
Deep Learning, Object Detection, YOLO, Visible and Thermal Cameras, Robotic Vision, Saliency Map, Transformer, Features Fusion.
Abstract:
In this paper, we focus on the problem of automatic pedestrian detection for surveillance applications. In particular, the main goal is to perform real-time detection from both visible and thermal cameras, which provide complementary information. To this end, we propose a fusion network that combines features from both inputs and performs augmentation by means of a visual saliency transformation. This fusion process is incorporated into YOLO-v3 as the base architecture. The resulting detection model is trained in a paired setting in order to improve the results compared with detection from each single input. To demonstrate the effectiveness of the proposed fusion framework, several experiments are conducted on the KAIST multi-spectral dataset. The obtained results show that the proposed fusion outperforms both single-input detection and other fusion schemes. The proposed approach also has the advantage of a very low computational cost, which is quite important for real-time applications. To confirm this, additional tests on a security robot are presented as well.
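As a rough illustration of the kind of multi-spectral feature fusion the abstract describes, the sketch below concatenates visible and thermal feature maps and reduces them back to a single stream with a 1x1 convolution, so the fused tensor could drop into a YOLO-style backbone. This is a minimal sketch under stated assumptions, not the authors' exact architecture: the channel count, the `MidLevelFusion` name, and the concatenate-then-1x1-conv scheme are all illustrative choices, and the paper's saliency transformation is omitted.

```python
# Hedged sketch of mid-level visible/thermal feature fusion.
# NOT the paper's exact design: channel sizes, names, and the
# concat + 1x1-conv fusion are illustrative assumptions.
import torch
import torch.nn as nn


class MidLevelFusion(nn.Module):
    """Concatenate visible and thermal feature maps along the channel
    axis, then project back with a 1x1 convolution so the fused map
    has the same shape as a single-stream feature map."""

    def __init__(self, channels: int = 256):
        super().__init__()
        self.reduce = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, feat_visible: torch.Tensor,
                feat_thermal: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([feat_visible, feat_thermal], dim=1)
        return self.act(self.reduce(fused))


# Toy usage with random stand-ins for backbone feature maps.
vis = torch.randn(1, 256, 20, 20)
thm = torch.randn(1, 256, 20, 20)
out = MidLevelFusion(256)(vis, thm)
print(out.shape)  # torch.Size([1, 256, 20, 20])
```

Keeping the fused map at the original channel count is what lets such a block be spliced into an existing single-stream detector without changing any downstream layer shapes.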