Abstract
Semantic localization, i.e., robot self-localization from the semantic image modality, is critical in recently emerging embodied AI applications such as point-goal navigation, object-goal navigation, and vision-language navigation. However, most existing works on semantic localization have focused on passive vision tasks without viewpoint planning, or rely on additional rich modalities such as depth measurements. Thus, this problem largely remains unsolved. In this work, we explore a lightweight, entirely CPU-based, domain-adaptive semantic localization framework called Graph Neural Localizer. Our approach is inspired by two recently emerging technologies: (1) scene graphs, which combine the viewpoint- and appearance-invariance of local and global features, and (2) graph neural networks, which enable direct learning and recognition of graph data (i.e., non-vector data). Specifically, a graph convolutional neural network is first trained as a scene graph classifier for passive vision, and then its knowledge is transferred to a reinforcement-learning planner for active vision. The results of experiments on self-supervised learning and unsupervised domain adaptation scenarios with the photo-realistic Habitat simulator validate the effectiveness of the proposed method.
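The abstract's pipeline (embed a scene graph with a graph convolutional network, then classify the resulting descriptor by place) can be illustrated with a minimal, pure-Python sketch. This is not the authors' actual architecture: the feature dimensions, the single weight-free mean-aggregation layer, and the nearest-prototype classifier are all simplifying assumptions made here for illustration only.

```python
def gcn_layer(features, adjacency):
    """One mean-aggregation message-passing step (self-loop included)."""
    out = {}
    for node, feat in features.items():
        neigh = [features[n] for n in adjacency.get(node, [])] + [feat]
        out[node] = [sum(vals) / len(neigh) for vals in zip(*neigh)]
    return out

def graph_embedding(features):
    """Mean-pool node features into a single graph-level descriptor."""
    feats = list(features.values())
    return [sum(vals) / len(feats) for vals in zip(*feats)]

def classify(embedding, prototypes):
    """Nearest-prototype place classification by dot-product score."""
    def score(name):
        return sum(a * b for a, b in zip(embedding, prototypes[name]))
    return max(prototypes, key=score)

# Toy scene graph: three detected objects with 2-D semantic features
# (both the objects and the feature values are hypothetical).
feats = {"sofa": [1.0, 0.0], "tv": [0.8, 0.2], "lamp": [0.0, 1.0]}
adj = {"sofa": ["tv"], "tv": ["sofa", "lamp"], "lamp": ["tv"]}

h = gcn_layer(feats, adj)   # one message-passing round mixes relational context
g = graph_embedding(h)      # graph-level place descriptor
place = classify(g, {"living_room": [1.0, 0.3], "bedroom": [0.1, 1.0]})
print(place)                # -> living_room
```

In the paper's framework a trained classifier of this kind supplies the passive-vision knowledge that is then transferred to the reinforcement-learning viewpoint planner; a practical implementation would use a learned GCN (e.g., via the Deep Graph Library) rather than this weight-free aggregation.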
Supported by JSPS KAKENHI Grant Numbers 23K11270, 20K12008.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yoshida, M., Tanaka, K., Yamamoto, R., Iwata, D. (2023). Active Semantic Localization with Graph Neural Embedding. In: Lu, H., Blumenstein, M., Cho, SB., Liu, CL., Yagi, Y., Kamiya, T. (eds) Pattern Recognition. ACPR 2023. Lecture Notes in Computer Science, vol 14406. Springer, Cham. https://doi.org/10.1007/978-3-031-47634-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47633-4
Online ISBN: 978-3-031-47634-1
eBook Packages: Computer Science, Computer Science (R0)