
Active Semantic Localization with Graph Neural Embedding

  • Conference paper

Pattern Recognition (ACPR 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14406)

Abstract

Semantic localization, i.e., robot self-localization with the semantic image modality, is critical in recently emerging embodied AI applications such as point-goal navigation, object-goal navigation, and vision-language navigation. However, most existing work on semantic localization has focused on passive vision tasks without viewpoint planning, or relies on additional rich modalities such as depth measurements; thus, the problem remains largely unsolved. In this work, we explore a lightweight, entirely CPU-based, domain-adaptive semantic localization framework called Graph Neural Localizer. Our approach builds on two recently emerging technologies: (1) scene graphs, which combine the viewpoint- and appearance-invariance of local and global features, and (2) graph neural networks, which enable direct learning and recognition of graph data (i.e., non-vector data). Specifically, a graph convolutional neural network is first trained as a scene graph classifier for passive vision, and its knowledge is then transferred to a reinforcement-learning planner for active vision. Experiments in self-supervised learning and unsupervised domain adaptation scenarios with the photo-realistic Habitat simulator validate the effectiveness of the proposed method.
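To make the passive-vision stage concrete, the sketch below shows how a scene graph (objects as nodes carrying semantic-label embeddings, spatial relations as edges) can be classified by a small graph convolutional network. This is a minimal illustrative PyTorch sketch written for this summary, not the authors' implementation; the label vocabulary size, embedding width, place-class count, and the toy graph at the end are hypothetical placeholders.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphConv(nn.Module):
    # One GCN layer: H' = ReLU(A_hat @ H @ W), where A_hat is the
    # symmetrically normalized adjacency matrix with self-loops added.
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        a = adj + torch.eye(adj.size(0))              # add self-loops
        d_inv_sqrt = a.sum(dim=1).pow(-0.5)
        a_hat = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
        return F.relu(self.lin(a_hat @ h))

class SceneGraphClassifier(nn.Module):
    # Embeds a scene graph and predicts a place class, i.e. a
    # location hypothesis from a single semantic view.
    def __init__(self, num_labels=50, emb_dim=64, num_places=10):
        super().__init__()
        self.embed = nn.Embedding(num_labels, emb_dim)  # semantic object labels
        self.gc1 = GraphConv(emb_dim, emb_dim)
        self.gc2 = GraphConv(emb_dim, emb_dim)
        self.head = nn.Linear(emb_dim, num_places)

    def forward(self, node_labels, adj):
        h = self.embed(node_labels)
        h = self.gc2(self.gc1(h, adj), adj)
        return self.head(h.mean(dim=0))               # mean-pool nodes -> logits

# Toy usage: a 4-object scene graph with hypothetical label ids.
labels = torch.tensor([3, 17, 17, 42])                # e.g. chair, table, table, door
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 1.],
                    [0., 1., 0., 0.],
                    [0., 1., 0., 0.]])
logits = SceneGraphClassifier()(labels, adj)          # shape: (num_places,)

In the active stage described above, the class posterior produced by such a classifier would plausibly drive the reinforcement-learning viewpoint planner (e.g., using prediction confidence as a reward signal); that component is omitted from this sketch.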

Supported by JSPS KAKENHI Grant Numbers 23K11270, 20K12008.



Author information

Correspondence to Kanji Tanaka.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Yoshida, M., Tanaka, K., Yamamoto, R., Iwata, D. (2023). Active Semantic Localization with Graph Neural Embedding. In: Lu, H., Blumenstein, M., Cho, SB., Liu, CL., Yagi, Y., Kamiya, T. (eds) Pattern Recognition. ACPR 2023. Lecture Notes in Computer Science, vol 14406. Springer, Cham. https://doi.org/10.1007/978-3-031-47634-1_17


  • DOI: https://doi.org/10.1007/978-3-031-47634-1_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-47633-4

  • Online ISBN: 978-3-031-47634-1

  • eBook Packages: Computer Science, Computer Science (R0)
