Abstract
Semantic localization, i.e., robot self-localization from the semantic image modality, is critical in recently emerging embodied AI applications such as point-goal navigation, object-goal navigation, and vision-language navigation. However, most existing works on semantic localization have focused on passive vision tasks without viewpoint planning, or rely on additional rich modalities such as depth measurements. Thus, this problem largely remains unsolved. In this work, we explore a lightweight, entirely CPU-based, domain-adaptive semantic localization framework called Graph Neural Localizer. Our approach is inspired by two recently emerging technologies: (1) scene graphs, which combine the viewpoint- and appearance-invariance of local and global features, and (2) graph neural networks, which enable direct learning and recognition of graph data (i.e., non-vector data). Specifically, a graph convolutional neural network is first trained as a scene graph classifier for passive vision, and then its knowledge is transferred to a reinforcement-learning planner for active vision. The results of experiments on self-supervised learning and unsupervised domain adaptation scenarios with the photo-realistic Habitat simulator validate the effectiveness of the proposed method.
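The abstract's pipeline (embed a scene graph with a graph convolutional network, then classify the resulting descriptor by place) can be illustrated with a minimal, pure-Python sketch. This is not the authors' actual architecture: the feature dimensions, the single weight-free mean-aggregation layer, and the nearest-prototype classifier are all simplifying assumptions made here for illustration only.

```python
def gcn_layer(features, adjacency):
    """One mean-aggregation message-passing step (self-loop included)."""
    out = {}
    for node, feat in features.items():
        neigh = [features[n] for n in adjacency.get(node, [])] + [feat]
        out[node] = [sum(vals) / len(neigh) for vals in zip(*neigh)]
    return out

def graph_embedding(features):
    """Mean-pool node features into a single graph-level descriptor."""
    feats = list(features.values())
    return [sum(vals) / len(feats) for vals in zip(*feats)]

def classify(embedding, prototypes):
    """Nearest-prototype place classification by dot-product score."""
    def score(name):
        return sum(a * b for a, b in zip(embedding, prototypes[name]))
    return max(prototypes, key=score)

# Toy scene graph: three detected objects with 2-D semantic features
# (both the objects and the feature values are hypothetical).
feats = {"sofa": [1.0, 0.0], "tv": [0.8, 0.2], "lamp": [0.0, 1.0]}
adj = {"sofa": ["tv"], "tv": ["sofa", "lamp"], "lamp": ["tv"]}

h = gcn_layer(feats, adj)   # one message-passing round mixes relational context
g = graph_embedding(h)      # graph-level place descriptor
place = classify(g, {"living_room": [1.0, 0.3], "bedroom": [0.1, 1.0]})
print(place)                # -> living_room
```

In the paper's framework a trained classifier of this kind supplies the passive-vision knowledge that is then transferred to the reinforcement-learning viewpoint planner; a practical implementation would use a learned GCN (e.g., via the Deep Graph Library) rather than this weight-free aggregation.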
Supported by JSPS KAKENHI Grant Numbers 23K11270, 20K12008.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Yoshida, M., Tanaka, K., Yamamoto, R., Iwata, D. (2023). Active Semantic Localization with Graph Neural Embedding. In: Lu, H., Blumenstein, M., Cho, SB., Liu, CL., Yagi, Y., Kamiya, T. (eds) Pattern Recognition. ACPR 2023. Lecture Notes in Computer Science, vol 14406. Springer, Cham. https://doi.org/10.1007/978-3-031-47634-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47633-4
Online ISBN: 978-3-031-47634-1
eBook Packages: Computer Science, Computer Science (R0)