Abstract
Loop closure detection is one of the most important modules in Simultaneous Localization and Mapping (SLAM), because it enables a robot to recover the global topology among different places. A loop closure is detected when the current place is recognized to match a previously visited place. When SLAM is executed over a long period of time, loop closure detection faces additional challenges. Illumination, weather, and vegetation conditions can change significantly during life-long SLAM, resulting in severe perceptual aliasing and appearance variation problems for loop closure detection. To address these problems, we propose a new Robust Multimodal Sequence-based (ROMS) method for robust loop closure detection in long-term visual SLAM. In our ROMS method, a sequence of images is used to represent each place, and every image in the sequence is encoded by multiple feature modalities so that different places can be recognized discriminatively. We formulate robust place recognition as a convex optimization problem with structured sparsity regularization, based on the fact that only a small set of template places can match the query place. In addition, we develop a new algorithm that solves the formulated optimization problem efficiently and is theoretically guaranteed to converge to the global optimum. Our ROMS method is evaluated through extensive experiments on three large-scale benchmark datasets, which record scenes across different times of day, months, and seasons. Experimental results demonstrate that our ROMS method outperforms existing loop closure detection methods in long-term SLAM and achieves state-of-the-art performance.
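Concretely, the structured-sparsity formulation mentioned above can be read off from the smoothed objective given in the Notes below by letting \(\zeta \rightarrow 0\); the interpretation of the symbols here is inferred from the Notes and the Appendix rather than quoted from the main body:

\[
\min _{\mathbf {A}} \; \sum _{i=1}^{s}\Vert \mathbf {D}\mathbf {a}_i - \mathbf {b}_i\Vert _2 \;+\; \lambda _1 \sum _{i=1}^{n}\Vert \mathbf {a}^i\Vert _2 \;+\; \lambda _2 \sum _{i=1}^{s}\sum _{j=1}^{k} \Vert \mathbf {a}_i^j\Vert _2 ,
\]

where the columns of \(\mathbf {D}\) hold the multimodal features of the \(n\) scene templates, \(\mathbf {b}_i\) is the \(i\)-th image of the query sequence (\(i = 1, \ldots , s\)), \(\mathbf {a}_i\) is its coefficient vector (a column of \(\mathbf {A}\)), \(\mathbf {a}^i\) is the \(i\)-th row of \(\mathbf {A}\), and \(\mathbf {a}_i^j\) is the block of \(\mathbf {a}_i\) associated with the \(j\)-th of the \(k\) template groups.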
Notes
The template groups can be designed to have overlaps, e.g., using sliding window techniques. However, in the experiments, we found that groups with or without overlaps result in almost identical performance, as demonstrated by the example in Fig. 5c, since our method can activate highly similar scene templates outside of the selected group (and vice versa) to solve the sequence misalignment issue.
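To make the two grouping strategies concrete, the following sketch builds both non-overlapping template groups and sliding-window (overlapping) groups over template indices. It is only an illustration: the function names, group size, and stride are hypothetical and not taken from the paper.

```python
def disjoint_groups(n_templates, group_size):
    """Consecutive, non-overlapping groups of template indices."""
    return [list(range(start, min(start + group_size, n_templates)))
            for start in range(0, n_templates, group_size)]

def sliding_window_groups(n_templates, group_size, stride):
    """Overlapping groups from a sliding window; stride < group_size yields overlap."""
    return [list(range(start, start + group_size))
            for start in range(0, n_templates - group_size + 1, stride)]

# Example: 12 templates and groups of 4; a stride of 2 makes consecutive windows share half their indices.
disjoint_groups(12, 4)            # three disjoint groups covering indices 0-3, 4-7, 8-11
sliding_window_groups(12, 4, 2)   # five overlapping groups starting at indices 0, 2, 4, 6, 8
```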
When \(\mathbf {D}\mathbf {a}_i - \mathbf {b}_i = \mathbf {0}\), Eq. 8 is not differentiable. Following Gorodnitsky and Rao (1997) and Wang et al. (2013), we can regularize the i-th diagonal element of the matrix \(\mathbf {U}\) using \(u_{ii} = \frac{1}{2\sqrt{\Vert \mathbf {D}\mathbf {a}_i - \mathbf {b}_i\Vert ^2_2 + \zeta }}\). Similarly, when \(\mathbf {a}^i = \mathbf {0}\), the i-th diagonal element of the matrix \(\mathbf {V}\) can be regularized using \(\frac{1}{2\sqrt{\Vert \mathbf {a}^i\Vert _2^2 + \zeta }}\). When \(\mathbf {a}_i^j = \mathbf {0}\), we employ the same small perturbation to regularize the j-th diagonal block of \(\mathbf {W}^i\) as \(\frac{1}{2\sqrt{\Vert \mathbf {a}_i^j\Vert ^2_2 + \zeta }} \mathbf {I}_j\). Then, the derived algorithm can be proved to minimize the following function: \(\sum _{i=1}^{s}\sqrt{\Vert \mathbf {D}\mathbf {a}_i - \mathbf {b}_i\Vert _2^2 + \zeta } + \lambda _1 \sum _{i=1}^{n}\sqrt{\Vert \mathbf {a}^i\Vert _2^2+ \zeta } + \lambda _2 \sum _{i=1}^{s}\sum _{j=1}^{k} \sqrt{\Vert \mathbf {a}_i^j\Vert _2^2 + \zeta }\). It is easy to verify that this new problem reduces to the problem in Eq. 8 as \(\zeta \rightarrow 0\).
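The note above specifies how the diagonal weight matrices are regularized; the sketch below turns these definitions into a minimal iterative re-weighted solver. It is an illustration under stated assumptions, not the paper's Algorithm 1: in particular, the closed-form column update (obtained by setting the gradient of the weighted quadratic surrogate to zero) and all variable names are assumptions.

```python
import numpy as np

def reweighted_solver(D, B, groups, lam1, lam2, n_iters=50, zeta=1e-8):
    """Sketch of an iterative re-weighted solver for the structured-sparsity objective.

    D: (d, n) matrix of scene-template features, B: (d, s) query-sequence features,
    groups: list of index lists partitioning the n templates into k groups.
    The weights follow the zeta-regularized definitions in the note above; the
    per-column closed-form update is an assumption, not the paper's Algorithm 1.
    """
    d, n = D.shape
    s = B.shape[1]
    A = np.zeros((n, s))
    for _ in range(n_iters):
        # u_ii = 1 / (2 * sqrt(||D a_i - b_i||^2 + zeta)): one weight per query image
        u = 1.0 / (2.0 * np.sqrt(np.sum((D @ A - B) ** 2, axis=0) + zeta))
        # v_ii = 1 / (2 * sqrt(||a^i||^2 + zeta)): one weight per template (row of A)
        v = 1.0 / (2.0 * np.sqrt(np.sum(A ** 2, axis=1) + zeta))
        for i in range(s):
            # Block-diagonal W^i: one weight per template group, applied to column a_i
            w = np.empty(n)
            for g in groups:
                w[g] = 1.0 / (2.0 * np.sqrt(np.sum(A[g, i] ** 2) + zeta))
            # Assumed closed-form update: (u_ii D^T D + lam1 V + lam2 W^i) a_i = u_ii D^T b_i
            H = u[i] * (D.T @ D) + lam1 * np.diag(v) + lam2 * np.diag(w)
            A[:, i] = np.linalg.solve(H, u[i] * (D.T @ B[:, i]))
    return A
```

As the residual of a query image, a template row, or a group block shrinks, its weight grows, which drives entire rows and groups of \(\mathbf {A}\) toward zero and produces the structured sparsity described in the abstract.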
References
Angeli, A., Filliat, D., Doncieux, S., & Meyer, J. A. (2008). Fast and incremental method for loop-closure detection using bags of visual words. IEEE Transactions on Robotics, 24(5), 1027–1037.
Arroyo, R., Alcantarilla, P., Bergasa, L., & Romera, E. (2015). Towards life-long visual localization using an efficient matching of binary sequences from images. In IEEE international conference on robotics and automation.
Badino, H., Huber, D., & Kanade, T. (2012). Real-time topometric localization. In IEEE international conference on robotics and automation.
Cadena, C., Carlone, L., Carrillo, H., Latif, Y., Scaramuzza, D., Neira, J., et al. (2016). Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Transactions on Robotics, 32(6), 1309–1332.
Cadena, C., Gálvez-López, D., Tardós, J. D., & Neira, J. (2012). Robust place recognition with stereo sequences. IEEE Transactions on Robotics, 28(4), 871–885.
Chen, C., & Wang, H. (2006). Appearance-based topological Bayesian inference for loop-closing detection in a cross-country environment. The International Journal of Robotics Research, 25(10), 953–983.
Cummins, M., & Newman, P. (2008). FAB-MAP: Probabilistic localization and mapping in the space of appearance. The International Journal of Robotics Research, 27(6), 647–665.
Cummins, M., & Newman, P. (2009). Highly scalable appearance-only SLAM-FAB-MAP 2.0. In Robotics: Science and systems.
Estrada, C., Neira, J., & Tardós, J. D. (2005). Hierarchical SLAM: Real-time accurate mapping of large environments. IEEE Transactions on Robotics, 21(4), 588–596.
Gálvez-López, D., & Tardós, J. D. (2012). Bags of binary words for fast place recognition in image sequences. IEEE Transactions on Robotics, 28(5), 1188–1197.
Glover, A. J., Maddern, W. P., Milford M. J., & Wyeth, G. F. (2010). FAB-MAP + RatSLAM: Appearance-based SLAM for multiple times of day. In IEEE international conference on robotics and automation.
Glover, A., Maddern, W., Warren, M., Reid, S., Milford, M., & Wyeth, G. (2012). OpenFABMAP: An open source toolbox for appearance-based loop closure detection. In IEEE international conference on robotics and automation.
Goldberg, S. B., Maimone, M. W., & Matthies, L. (2002). Stereo vision and rover navigation software for planetary exploration. In IEEE aerospace conference proceedings.
Gorodnitsky, I. F., & Rao, B. D. (1997). Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm. IEEE Transactions on Signal Processing, 45(3), 600–616.
Gutmann, J. S., & Konolige, K. (1999). Incremental mapping of large cyclic environments. In IEEE international symposium on computational intelligence in robotics and automation.
Han, F., Wang, H., & Zhang, H. (2018). Learning of integrated holism-landmark representations for long-term loop closure detection. In AAAI conference on artificial intelligence.
Han, F., Yang, X., Deng, Y., Rentschler, M., Yang, D., & Zhang, H. (2017). SRAL: Shared representative appearance learning for long-term visual place recognition. IEEE Robotics and Automation Letters, 2(2), 1172–1179.
Hansen, P., & Browning, B. (2014). Visual place recognition using HMM sequence matching. In IEEE/RSJ international conference on intelligent robots and systems.
Henry, P., Krainin, M., Herbst, E., Ren, X., & Fox, D. (2012). RGB-D mapping: Using Kinect-style depth cameras for dense 3D modeling of indoor environments. The International Journal of Robotics Research, 31(5), 647–663.
Ho, K. L., & Newman, P. (2007). Detecting loop closure with scene sequences. International Journal of Computer Vision, 74(3), 261–286.
Johns, E., & Yang, G. Z. (2013). Feature co-occurrence maps: Appearance-based localisation throughout the day. In IEEE international conference on robotics and automation.
Kleiner, A., & Dornhege, C. (2007). Real-time localization and elevation mapping within urban search and rescue scenarios. Journal of Field Robotics, 24(8–9), 723–745.
Klopschitz, M., Zach, C., Irschara, A., & Schmalstieg, D. (2008). Generalized detection and merging of loop closures for video sequences. In 3D data processing, visualization, and transmission.
Labbe, M., & Michaud, F. (2013). Appearance-based loop closure detection for online large-scale and long-term operation. IEEE Transactions on Robotics, 29(3), 734–745.
Labbe, M., & Michaud, F. (2014). Online global loop closure detection for large-scale multi-session graph-based SLAM. In IEEE/RSJ international conference on intelligent robots and systems.
Latif, Y., Cadena, C., & Neira, J. (2013). Robust loop closing over time for pose graph SLAM. The International Journal of Robotics Research, 32, 1611–1626.
Latif, Y., Huang, G., Leonard, J., & Neira, J. (2014). An online sparsity-cognizant loop-closure algorithm for visual navigation. In Robotics: Science and systems conference.
Li, S., Huang, H., Zhang, Y., & Liu, M. (2015). An efficient multi-scale convolutional neural network for image classification based on PCA. In International conference on real-time computing and robotics.
Lowry, S., Sünderhauf, N., Newman, P., Leonard, J. J., Cox, D., Corke, P., et al. (2016). Visual place recognition: A survey. IEEE Transactions on Robotics, 32, 1.
Milford, M. J., & Wyeth, G. F. (2012). SeqSLAM: Visual route-based navigation for sunny summer days and stormy winter nights. In IEEE international conference on robotics and automation.
Milford, M. J., Wyeth, G. F., & Prasser, D. (2004). RatSLAM: A hippocampal model for simultaneous localization and mapping. In IEEE international conference on robotics and automation.
Mur-Artal, R., Montiel, J. M. M., & Tardos, J. D. (2015). ORB-SLAM: A versatile and accurate monocular SLAM system. IEEE Transactions on Robotics, 31(5), 1147–1163.
Mur-Artal, R., & Tardós, J. D. (2014). Fast relocalisation and loop closing in keyframe-based SLAM. In IEEE international conference on robotics and automation.
Naseer, T., Ruhnke, M., Stachniss, C., Spinello, L., & Burgard, W. (2015). Robust visual SLAM across seasons. In IEEE/RSJ international conference on intelligent robots and systems.
Naseer, T., Spinello, L., Burgard, W., & Stachniss, C. (2014). Robust visual robot localization across seasons using network flows. In AAAI conference on artificial intelligence.
Nie, F., Huang, H., Cai, X., & Ding, C. H. (2010). Efficient and robust feature selection via joint \(\ell _{2,1}\)-norms minimization. In Advances in neural information processing systems.
Pepperell, E., Corke, P., & Milford, M. J. (2014). All-environment visual place recognition with SMART. In IEEE international conference on robotics and automation.
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in neural information processing systems.
Santos, J. M., Couceiro, M. S., Portugal, D., & Rocha, R. P. (2015). A sensor fusion layer to cope with reduced visibility in SLAM. Journal of Intelligent & Robotic Systems, 80(3), 401–422.
Sünderhauf, N., Neubert, P., & Protzel, P. (2013). Are we there yet? Challenging SeqSLAM on a 3000 km journey across all four seasons. In Workshop on IEEE international conference on robotics and automation.
Sünderhauf, N., & Protzel, P. (2011). BRIEF-Gist—closing the loop by simple means. In IEEE/RSJ international conference on intelligent robots and systems.
Sünderhauf, N., Shirazi, S., Jacobson, A., Dayoub, F., Pepperell, E., Upcroft, B., & Milford, M. (2015). ConvNet landmarks: Viewpoint-robust, condition-robust, training-free. In Robotics: Science and systems.
Thrun, S., Burgard, W., & Fox, D. (2000). A real-time algorithm for mobile robot mapping with applications to multi-robot and 3D mapping. In IEEE international conference on robotics and automation.
Thrun, S., & Leonard, J. J. (2008). Simultaneous localization and mapping. In B. Siciliano & O. Khatib (Eds.), Springer handbook of robotics (pp. 871–889). Berlin: Springer.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B (Methodological), 58, 267–288.
Wang, H., Nie, F., & Huang, H. (2013). Multi-view clustering and feature learning via structured sparsity. In International conference on machine learning.
Zhang, H., Han, F., & Wang, H. (2016). Robust multimodal sequence-based loop closure detection via structured sparsity. In Robotics: Science and systems.
Acknowledgements
This work was partially supported by ARO W911NF-17-1-0447, NSF-IIS 1423591, and NSF-IIS 1652943.
Additional information
This is one of several papers published in Autonomous Robots comprising the “Special Issue on Robotics Science and Systems”.
Appendix

Proof of Lemma 1
For any vectors \(\tilde{\mathbf {v}}\) and \(\mathbf {v} \ne \mathbf {0}\), the following inequality holds: \(\Vert \tilde{\mathbf {v}}\Vert _2 - \frac{\Vert \tilde{\mathbf {v}}\Vert _2^2}{2\Vert \mathbf {v}\Vert _2} \le \Vert \mathbf {v}\Vert _2 - \frac{\Vert \mathbf {v}\Vert _2^2}{2\Vert \mathbf {v}\Vert _2} \).
Proof
Obviously, the inequality \(-(\Vert \tilde{\mathbf {v}}\Vert _2 - \Vert \mathbf {v}\Vert _2)^2 \le 0\) holds. Expanding the square and rearranging, we have
\[
2\Vert \tilde{\mathbf {v}}\Vert _2 \Vert \mathbf {v}\Vert _2 - \Vert \tilde{\mathbf {v}}\Vert _2^2 \le \Vert \mathbf {v}\Vert _2^2 .
\]
Dividing both sides by \(2\Vert \mathbf {v}\Vert _2 > 0\) yields
\[
\Vert \tilde{\mathbf {v}}\Vert _2 - \frac{\Vert \tilde{\mathbf {v}}\Vert _2^2}{2\Vert \mathbf {v}\Vert _2} \le \frac{\Vert \mathbf {v}\Vert _2}{2} = \Vert \mathbf {v}\Vert _2 - \frac{\Vert \mathbf {v}\Vert _2^2}{2\Vert \mathbf {v}\Vert _2} .
\]
This completes the proof. \(\square \)
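As a quick numerical sanity check of the lemma (not part of the original proof), take \(\tilde{\mathbf {v}} = (3, 4)^{\top }\) and \(\mathbf {v} = (1, 0)^{\top }\):
\[
\Vert \tilde{\mathbf {v}}\Vert _2 - \frac{\Vert \tilde{\mathbf {v}}\Vert _2^2}{2\Vert \mathbf {v}\Vert _2} = 5 - \frac{25}{2} = -7.5
\;\le\;
\Vert \mathbf {v}\Vert _2 - \frac{\Vert \mathbf {v}\Vert _2^2}{2\Vert \mathbf {v}\Vert _2} = 1 - \frac{1}{2} = 0.5 .
\]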
Proof of Theorem 1
Algorithm 1 monotonically decreases the objective value of the problem in Eq. 8 in each iteration.
Proof
Assume the update of \(\mathbf {A}\) is \(\tilde{\mathbf {A}}\). According to Step 6 in Algorithm 1, we know that:
\[
\tilde{\mathbf {A}} = \arg \min _{\mathbf {A}} \; Tr\big ((\mathbf {D}\mathbf {A}-\mathbf {B})\mathbf {U}(\mathbf {D}\mathbf {A}-\mathbf {B})^{\top }\big ) + \lambda _1 Tr(\mathbf {A}^{\top }\mathbf {V}\mathbf {A}) + \lambda _2 \sum _{i=1}^{s} \mathbf {a}_i^{\top }\mathbf {W}^i \mathbf {a}_i ,
\]
where \(Tr(\cdot )\) is the trace of a matrix. Thus, we can derive
\[
Tr\big ((\mathbf {D}\tilde{\mathbf {A}}-\mathbf {B})\mathbf {U}(\mathbf {D}\tilde{\mathbf {A}}-\mathbf {B})^{\top }\big ) + \lambda _1 Tr(\tilde{\mathbf {A}}^{\top }\mathbf {V}\tilde{\mathbf {A}}) + \lambda _2 \sum _{i=1}^{s} \tilde{\mathbf {a}}_i^{\top }\mathbf {W}^i \tilde{\mathbf {a}}_i
\le
Tr\big ((\mathbf {D}\mathbf {A}-\mathbf {B})\mathbf {U}(\mathbf {D}\mathbf {A}-\mathbf {B})^{\top }\big ) + \lambda _1 Tr(\mathbf {A}^{\top }\mathbf {V}\mathbf {A}) + \lambda _2 \sum _{i=1}^{s} \mathbf {a}_i^{\top }\mathbf {W}^i \mathbf {a}_i .
\]
According to the definition of \(\mathbf {U}\), \(\mathbf {V}\), and \(\mathbf {W}^i\), we have
\[
\sum _{i=1}^{s}\frac{\Vert \mathbf {D}\tilde{\mathbf {a}}_i-\mathbf {b}_i\Vert _2^2}{2\Vert \mathbf {D}\mathbf {a}_i-\mathbf {b}_i\Vert _2} + \lambda _1 \sum _{i=1}^{n}\frac{\Vert \tilde{\mathbf {a}}^i\Vert _2^2}{2\Vert \mathbf {a}^i\Vert _2} + \lambda _2 \sum _{i=1}^{s}\sum _{j=1}^{k}\frac{\Vert \tilde{\mathbf {a}}_i^j\Vert _2^2}{2\Vert \mathbf {a}_i^j\Vert _2}
\le
\sum _{i=1}^{s}\frac{\Vert \mathbf {D}\mathbf {a}_i-\mathbf {b}_i\Vert _2^2}{2\Vert \mathbf {D}\mathbf {a}_i-\mathbf {b}_i\Vert _2} + \lambda _1 \sum _{i=1}^{n}\frac{\Vert \mathbf {a}^i\Vert _2^2}{2\Vert \mathbf {a}^i\Vert _2} + \lambda _2 \sum _{i=1}^{s}\sum _{j=1}^{k}\frac{\Vert \mathbf {a}_i^j\Vert _2^2}{2\Vert \mathbf {a}_i^j\Vert _2} .
\]
According to Lemma 1, we can obtain the following inequalities:
\[
\sum _{i=1}^{s}\Big (\Vert \mathbf {D}\tilde{\mathbf {a}}_i-\mathbf {b}_i\Vert _2 - \frac{\Vert \mathbf {D}\tilde{\mathbf {a}}_i-\mathbf {b}_i\Vert _2^2}{2\Vert \mathbf {D}\mathbf {a}_i-\mathbf {b}_i\Vert _2}\Big ) \le \sum _{i=1}^{s}\Big (\Vert \mathbf {D}\mathbf {a}_i-\mathbf {b}_i\Vert _2 - \frac{\Vert \mathbf {D}\mathbf {a}_i-\mathbf {b}_i\Vert _2^2}{2\Vert \mathbf {D}\mathbf {a}_i-\mathbf {b}_i\Vert _2}\Big ),
\]
\[
\sum _{i=1}^{n}\Big (\Vert \tilde{\mathbf {a}}^i\Vert _2 - \frac{\Vert \tilde{\mathbf {a}}^i\Vert _2^2}{2\Vert \mathbf {a}^i\Vert _2}\Big ) \le \sum _{i=1}^{n}\Big (\Vert \mathbf {a}^i\Vert _2 - \frac{\Vert \mathbf {a}^i\Vert _2^2}{2\Vert \mathbf {a}^i\Vert _2}\Big ),
\]
\[
\sum _{i=1}^{s}\sum _{j=1}^{k}\Big (\Vert \tilde{\mathbf {a}}_i^j\Vert _2 - \frac{\Vert \tilde{\mathbf {a}}_i^j\Vert _2^2}{2\Vert \mathbf {a}_i^j\Vert _2}\Big ) \le \sum _{i=1}^{s}\sum _{j=1}^{k}\Big (\Vert \mathbf {a}_i^j\Vert _2 - \frac{\Vert \mathbf {a}_i^j\Vert _2^2}{2\Vert \mathbf {a}_i^j\Vert _2}\Big ).
\]
Computing the summation of the three inequalities above on both sides (weighted by \(1\), \(\lambda _1\), and \(\lambda _2\), respectively), and adding the result to the inequality derived from the definitions of \(\mathbf {U}\), \(\mathbf {V}\), and \(\mathbf {W}^i\), we obtain:
\[
\sum _{i=1}^{s}\Vert \mathbf {D}\tilde{\mathbf {a}}_i-\mathbf {b}_i\Vert _2 + \lambda _1 \sum _{i=1}^{n}\Vert \tilde{\mathbf {a}}^i\Vert _2 + \lambda _2 \sum _{i=1}^{s}\sum _{j=1}^{k}\Vert \tilde{\mathbf {a}}_i^j\Vert _2
\le
\sum _{i=1}^{s}\Vert \mathbf {D}\mathbf {a}_i-\mathbf {b}_i\Vert _2 + \lambda _1 \sum _{i=1}^{n}\Vert \mathbf {a}^i\Vert _2 + \lambda _2 \sum _{i=1}^{s}\sum _{j=1}^{k}\Vert \mathbf {a}_i^j\Vert _2 .
\]
Therefore, Algorithm 1 monotonically decreases the objective value in each iteration.\(\square \)
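The monotone decrease established by Theorem 1 can also be checked empirically. The snippet below is a minimal sanity check, not code from the paper; it assumes the hypothetical `reweighted_solver` sketch from the Notes above is in scope and evaluates the \(\zeta \)-regularized objective after increasing numbers of iterations.

```python
import numpy as np

# Assumes the hypothetical `reweighted_solver` sketch from the Notes above is in scope.
def smoothed_objective(D, B, A, groups, lam1, lam2, zeta=1e-8):
    """The zeta-regularized objective from the Notes, which the re-weighted updates decrease."""
    loss = np.sum(np.sqrt(np.sum((D @ A - B) ** 2, axis=0) + zeta))
    rows = np.sum(np.sqrt(np.sum(A ** 2, axis=1) + zeta))
    grps = sum(np.sqrt(np.sum(A[g, i] ** 2) + zeta)
               for i in range(A.shape[1]) for g in groups)
    return loss + lam1 * rows + lam2 * grps

rng = np.random.default_rng(0)
D = rng.standard_normal((20, 12))   # 12 scene templates with 20-dimensional features
B = rng.standard_normal((20, 3))    # a query sequence of 3 images
groups = [list(range(j, j + 4)) for j in range(0, 12, 4)]
vals = [smoothed_objective(D, B,
                           reweighted_solver(D, B, groups, 0.1, 0.1, n_iters=t),
                           groups, 0.1, 0.1)
        for t in range(1, 9)]
assert all(b <= a + 1e-6 for a, b in zip(vals, vals[1:]))  # objective never increases
```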
About this article
Cite this article
Han, F., Wang, H., Huang, G. et al. Sequence-based sparse optimization methods for long-term loop closure detection in visual SLAM. Auton Robot 42, 1323–1335 (2018). https://doi.org/10.1007/s10514-018-9736-3