Semantics-aware visual localization under challenging perceptual conditions

T Naseer, GL Oliveira, T Brox… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
2017 IEEE International Conference on Robotics and Automation (ICRA), 2017
Visual place recognition under difficult perceptual conditions remains a challenging problem due to changing weather conditions, illumination and seasons. Long-term visual navigation approaches for robot localization should be robust to these dynamics of the environment. Existing methods typically leverage feature descriptions of whole images or image regions from Deep Convolutional Neural Networks. Some approaches also exploit sequential information to alleviate the problem of spatially inconsistent and imperfect image matches. In this paper, we propose a novel approach for learning a discriminative holistic image representation which exploits the image content to create a dense and salient scene description. These salient descriptions are learnt over a variety of datasets under large perceptual changes. Such an approach enables us to precisely segment the regions of an image which are geometrically stable over large time lags. We combine features from these salient regions and an off-the-shelf holistic representation to form a more robust scene descriptor. We also introduce a semantically labeled dataset which captures extreme perceptual and structural scene dynamics over the course of 3 years. We evaluated our approach with extensive experiments on data collected over several kilometers in Freiburg and show that our learnt image representation outperforms off-the-shelf features from deep networks and hand-crafted features.
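The combination step described in the abstract (features from salient regions joined with an off-the-shelf holistic representation) can be illustrated with a minimal sketch. The abstract does not specify how the features are pooled, fused, or compared, so everything here is an assumption made for illustration: saliency-weighted average pooling, fusion by concatenation of L2-normalized vectors, and nearest-neighbour matching by cosine similarity. The function names (masked_average_pool, scene_descriptor, best_match) are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def masked_average_pool(feature_map, saliency_mask):
    """Average per-location feature vectors, weighted by a saliency mask.

    feature_map: list of feature vectors, one per image location.
    saliency_mask: list of weights in [0, 1], one per location
    (assumed here to mark geometrically stable regions).
    """
    dim = len(feature_map[0])
    total = sum(saliency_mask)
    if total == 0:
        return [0.0] * dim
    pooled = [0.0] * dim
    for feat, weight in zip(feature_map, saliency_mask):
        for i, value in enumerate(feat):
            pooled[i] += weight * value
    return [p / total for p in pooled]


def scene_descriptor(feature_map, saliency_mask, holistic):
    """Assumed fusion: concatenate normalized salient and holistic features."""
    salient = l2_normalize(masked_average_pool(feature_map, saliency_mask))
    return l2_normalize(salient + l2_normalize(holistic))


def cosine_similarity(a, b):
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


def best_match(query, database):
    """Return the index of the database descriptor most similar to the query."""
    scores = [cosine_similarity(query, d) for d in database]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch, a query image would be localized by computing its descriptor with scene_descriptor and passing it to best_match together with the descriptors of the reference traversal. Sequence-based matching, which the abstract mentions for other approaches, is deliberately left out.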