An adaptive network fusing light detection and ranging height-sliced bird’s-eye view and vision for place recognition

Published: 07 January 2025

Abstract

Place recognition, a fundamental component of robotic perception, aims to identify previously visited locations within an environment. In this study, we present a novel global descriptor that fuses height-sliced Bird's Eye View (BEV) representations from Light Detection and Ranging (LiDAR) with vision images to facilitate high-recall place recognition in the autonomous driving field. Our descriptor generation network incorporates an adaptive weight generation branch that learns the relative weights of visual and LiDAR features, enhancing its adaptability to different environments. The generated descriptor exhibits excellent yaw invariance. The entire network is trained with a self-designed quadruplet loss, which sharpens inter-class boundaries and alleviates overfitting to any one modality. We evaluate our approach on three benchmarks derived from two public datasets and achieve the best performance on these evaluation sets. Our approach demonstrates excellent generalization ability and efficient runtime, indicating its practical viability in real-world scenarios. For those interested in applying this Artificial Intelligence contribution to engineering, the implementation of our approach is available at: https://github.com/Bryan-ZhengRui/LocFuse.
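
To make the approach concrete, the sketches below illustrate the three ingredients the abstract names: the height-sliced BEV input, the adaptive weight generation branch, and the quadruplet loss. They are minimal NumPy/PyTorch sketches under assumed parameters, not the LocFuse implementation; the authoritative code is in the linked repository.

A minimal height-sliced BEV rasterization, assuming an (N, 3) point cloud in the vehicle frame; the ranges, height bins, and resolution are illustrative values, not the paper's:

```python
import numpy as np

def height_sliced_bev(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0),
                      z_bins=(-2.0, 0.0, 2.0, 4.0), res=0.5):
    """Rasterize an (N, 3) LiDAR cloud into a multi-channel occupancy BEV,
    one channel per height slice (all grid parameters are assumptions)."""
    h = int((x_range[1] - x_range[0]) / res)
    w = int((y_range[1] - y_range[0]) / res)
    bev = np.zeros((len(z_bins) - 1, h, w), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / res).astype(int)  # x -> grid row
    iy = ((points[:, 1] - y_range[0]) / res).astype(int)  # y -> grid column
    iz = np.digitize(points[:, 2], z_bins) - 1            # z -> height slice
    keep = ((ix >= 0) & (ix < h) & (iy >= 0) & (iy < w)
            & (iz >= 0) & (iz < len(z_bins) - 1))
    bev[iz[keep], ix[keep], iy[keep]] = 1.0               # binary occupancy
    return bev
```

A hypothetical adaptive-weight fusion head in the spirit of the adaptive weight generation branch: a small network predicts per-sample weights for the image and LiDAR descriptors before fusion, so the model can lean on whichever modality is more reliable in the current environment (layer sizes and the softmax choice are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveFusion(nn.Module):
    """Predict per-sample modality weights, then fuse the re-weighted
    image and LiDAR descriptors into one global descriptor."""
    def __init__(self, dim=256):
        super().__init__()
        self.weight_branch = nn.Sequential(
            nn.Linear(2 * dim, dim),
            nn.ReLU(inplace=True),
            nn.Linear(dim, 2),
            nn.Softmax(dim=-1),  # weights for (image, lidar) sum to 1
        )

    def forward(self, f_img, f_lidar):  # each of shape (B, dim)
        w = self.weight_branch(torch.cat([f_img, f_lidar], dim=-1))
        fused = torch.cat([w[:, :1] * f_img, w[:, 1:] * f_lidar], dim=-1)
        return F.normalize(fused, dim=-1)  # unit-length descriptor, (B, 2*dim)
```

The paper's self-designed quadruplet loss is not reproduced here; for orientation, the classic quadruplet formulation adds a second negative to a triplet loss, which is what tightens inter-class boundaries (margin values are assumed):

```python
import torch.nn.functional as F

def quadruplet_loss(anchor, pos, neg, neg2, m1=0.5, m2=0.2):
    """Classic quadruplet margin loss over L2 distances; `neg2` comes
    from a different place than `neg` (margins are assumed values)."""
    d_ap = F.pairwise_distance(anchor, pos)
    d_an = F.pairwise_distance(anchor, neg)
    d_nn = F.pairwise_distance(neg, neg2)
    triplet_term = F.relu(d_ap - d_an + m1)  # positive closer than negative
    quad_term = F.relu(d_ap - d_nn + m2)     # separate distinct negatives
    return (triplet_term + quad_term).mean()
```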

Published In

Engineering Applications of Artificial Intelligence, Volume 137, Issue PB
Nov 2024, 1186 pages

Publisher

Pergamon Press, Inc.

United States

Author Tags

  1. Multi-modal place recognition
  2. Deep learning method
  3. Sensor fusion
  4. Autonomous driving

Qualifiers

  • Research-article
