
Investigating the Role of Image Retrieval for Visual Localization

An Exhaustive Benchmark

Published in: International Journal of Computer Vision

Abstract

Visual localization, i.e., camera pose estimation in a known scene, is a core component of technologies such as autonomous driving and augmented reality. State-of-the-art localization approaches often rely on image retrieval techniques for one of two purposes: (1) provide an approximate pose estimate or (2) determine which parts of the scene are potentially visible in a given query image. It is common practice to use state-of-the-art image retrieval algorithms for both purposes. These algorithms are often trained to retrieve the same landmark under a large range of viewpoint changes, a goal that often differs from the requirements of visual localization. In order to investigate the consequences for visual localization, this paper focuses on understanding the role of image retrieval for multiple visual localization paradigms. First, we introduce a novel benchmark setup and compare state-of-the-art retrieval representations on multiple datasets using localization performance as the metric. Second, we investigate several definitions of “ground truth” for image retrieval. Using these definitions as upper bounds for the visual localization paradigms, we show that there is still significant room for improvement. Third, using these tools and in-depth analysis, we show that retrieval performance on classical landmark retrieval or place recognition tasks correlates with localization performance for some but not all paradigms. Finally, we analyze the effects of blur and dynamic scenes in the images. We conclude that there is a need for retrieval approaches specifically designed for localization paradigms. Our benchmark and evaluation protocols are available at https://github.com/naver/kapture-localization.




Notes

  1. \(\mathbf{q}_q = \sum_i w_i \mathbf{q}_i\) is re-normalized to be a unit quaternion.

  2. Note that compared to the query pose, the 3D points are very seldom collinear with the reference poses and can thus be accurately triangulated.

  3. Note that only datasets with publicly available ground truth are used to generate these results. For Aachen Day-Night and InLoc, no GT poses were available (see Sect. 4.1).

  4. Code available at http://www.ok.ctrl.titech.ac.jp/~torii/project/247/.

  5. Matlab code and pretrained models are available at https://github.com/Relja/netvlad. We used the VGG-16-based NetVLAD model trained on Pitts30k (Arandjelović et al. 2016).

  6. Pytorch implementation and models are available at https://europe.naverlabs.com/Research/Computer-Vision/Learning-Visual-Representations/Deep-Image-Retrieval/.

  7. We used the TensorFlow code publicly available at https://github.com/tensorflow/models/tree/master/research/delf/delf/python/delg.

  8. Note that this is different from the model used in our 3DV paper (Pion et al. 2020), where we used a model with ResNet50 backbone trained on GLD v1.

  9. For InLoc, the viewpoint difference between the reference images is too large to allow robust feature matching and point triangulation. Using the available depth maps to obtain the 3D points for all features for local SFM is identical to the way we perform global SFM on InLoc. That is why for InLoc, we do not show results for local SFM.

  10. For the other datasets, GT camera poses are not available for the test images, which makes it hard to generate retrieval GT.

  11. https://www.pyimagesearch.com/2020/06/15/opencv-fast-fourier-transform-fft-for-blur-detection-in-images-and-video-streams/.
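Footnote 1's weighted quaternion blend is simple to implement. Below is a minimal NumPy sketch; the function name and the hemisphere-alignment step are our own additions (the footnote only specifies the weighted sum followed by re-normalization):

```python
import numpy as np

def weighted_quaternion_average(quats, weights):
    """Blend unit quaternions by a weighted sum, then re-normalize.

    Quaternions are (w, x, y, z) arrays. Since q and -q encode the same
    rotation, each quaternion is sign-flipped into the hemisphere of the
    first one before summing, so the weighted sum is well behaved.
    """
    quats = np.asarray(quats, dtype=float)
    weights = np.asarray(weights, dtype=float)
    ref = quats[0]
    # Align hemispheres: negate quaternions with a negative dot product.
    signs = np.where(quats @ ref < 0, -1.0, 1.0)
    q = (weights[:, None] * signs[:, None] * quats).sum(axis=0)
    return q / np.linalg.norm(q)  # re-normalize to a unit quaternion

# Example: equal-weight average of the identity and a 90-degree
# z-rotation yields a 45-degree z-rotation.
qs = [[1.0, 0.0, 0.0, 0.0],
      [np.sqrt(0.5), 0.0, 0.0, np.sqrt(0.5)]]
q_avg = weighted_quaternion_average(qs, [0.5, 0.5])
```

This linear blend is only a good rotation average for nearby poses, which matches its use here for interpolating between retrieved reference poses.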
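Footnote 2's point about triangulation from reference poses can be illustrated with the standard linear (DLT) two-view method. This is a generic sketch, not the paper's pipeline; the projection matrices and the toy scene below are invented for illustration:

```python
import numpy as np

def triangulate_dlt(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 projection matrices; x1, x2: 2D observations.
    With sufficient baseline (the point not collinear with the two
    camera centers), the linear system has a well-conditioned solution.
    """
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Two cameras with a 1-unit baseline along x, both looking down +z.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.3, -0.2, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
X_hat = triangulate_dlt(P1, P2, x1, x2)
```

In the noiseless case the DLT recovers the point exactly; as the baseline shrinks toward zero the system becomes degenerate, which is exactly the collinearity caveat in the footnote.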
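Footnote 11 links to an FFT-based blur detector. The idea can be sketched in NumPy as follows (the function name and the `size` cutoff are our assumptions; the linked tutorial uses OpenCV for image I/O and a data-dependent threshold):

```python
import numpy as np

def fft_blur_score(gray, size=60):
    """FFT-based blur measure: zero out a low-frequency block at the
    center of the shifted spectrum, reconstruct, and return the mean
    log-magnitude of what remains. Blurred images carry little
    high-frequency energy, so low scores indicate blur."""
    h, w = gray.shape
    cy, cx = h // 2, w // 2
    fft = np.fft.fftshift(np.fft.fft2(gray))
    fft[cy - size:cy + size, cx - size:cx + size] = 0  # drop low freqs
    recon = np.fft.ifft2(np.fft.ifftshift(fft))
    mag = 20 * np.log(np.abs(recon) + 1e-8)  # epsilon avoids log(0)
    return float(np.mean(mag))

# A sharp random texture scores higher than a heavily smoothed copy.
rng = np.random.default_rng(0)
sharp = rng.random((256, 256))
kernel = np.ones(31) / 31.0
blurred = np.apply_along_axis(lambda r: np.convolve(r, kernel, "same"), 1, sharp)
blurred = np.apply_along_axis(lambda c: np.convolve(c, kernel, "same"), 0, blurred)
```

The score threshold separating "blurry" from "sharp" must be tuned per dataset, which is why the paper treats blur as a per-dataset analysis rather than a fixed filter.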

References

  • Arandjelović, R., Gronát, P., Torii, A., Pajdla, T., & Sivic, J. (2016). NetVLAD: CNN architecture for weakly supervised place recognition. In CVPR.

  • Arandjelović, R., & Zisserman, A. (2012). Three things everyone should know to improve object retrieval. In CVPR.

  • Arandjelović, R., & Zisserman, A. (2013). All about VLAD. In CVPR.

  • Arandjelović, R., & Zisserman, A. (2014) DisLocation: Scalable descriptor distinctiveness for location recognition. In ACCV (pp. 188–204). Springer.

  • Arth, C., Wagner, D., Klopschitz, M., Irschara, A., & Schmalstieg, D. (2009) Wide area localization on mobile phones. In IEEE International Symposium on Mixed and Augmented Reality.

  • Avrithis, Y., Kalantidis, Y., Tolias, G., & Spyrou, E. (2010). Retrieving landmark and non-landmark images from community photo collections. In ACMMM.

  • Babenko, A., & Lempitsky, V. (2015). Aggregating deep convolutional features for image retrieval. In ICCV.

  • Babenko, A., Slesarev, A., Chigorin, A., & Lempitsky, V. (2014). Neural codes for image retrieval. In ECCV.

  • Balntas, V., Li, S., & Prisacariu, V. (2018). RelocNet: Continuous metric learning relocalisation using neural nets. In ECCV.

  • Brachmann, E., Humenberger, M., Rother, C., & Sattler, T. (2021). On the limits of pseudo ground truth in visual camera re-localisation. In ICCV.

  • Brachmann, E., & Rother, C. (2018). Learning less is more—6D camera localization via 3D surface regression. In CVPR.

  • Brachmann, E., & Rother, C. (2019). Expert sample consensus applied to camera re-localization. In ICCV.

  • Brahmbhatt, S., Gu, J., Kim, K., Hays, J., & Kautz, J. (2018). Geometry-aware learning of maps for camera localization. In CVPR.

  • Brejcha, J., & Čadík, M. (2017). State-of-the-art in visual geo-localization. Pattern Analysis and Applications (PAA), 20(3), 613–637.


  • Cao, B., Araujo, A., & Sim, J. (2020). Unifying deep local and global features for image search. In ECCV.

  • Cao, S., & Snavely, N. (2013). Graph-based discriminative learning for location recognition. In CVPR.

  • Castle, R., Klein, G., & Murray, D. (2008). Video-rate localization in multiple maps for wearable augmented reality. In IEEE international symposium on wearable computers.

  • Cavallari, T., Bertinetto, L., Mukhoti, J., Torr, P., & Golodetz, S. (2017). Let’s take this online: Adapting scene coordinate regression network predictions for online RGB-D camera relocalisation. In 3DV.

  • Cavallari, T., Golodetz, S., Lord, N., Valentin, J., Prisacariu, V., Di Stefano, L., & Torr, P. (2019). Real-time RGB-D camera pose estimation in novel scenes using a relocalisation cascade. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 42(10), 2465–2477.


  • Chen, D., Baatz, G., Köser, K., Tsai, S., Vedantham, R., Pylvänäinen, T., Roimela, K., Chen, X., Bach, J., Pollefeys, M., Girod, B., & Grzeszczuk, R. (2011). City-scale landmark identification on mobile devices. In CVPR.

  • Chum, O., & Matas, J. (2008). Optimal randomized RANSAC. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 30(8), 1472–1482.


  • Crandall, D., Backstrom, L., Huttenlocher, D., & Kleinberg, J. (2009). Mapping the world’s photos. In WWW.

  • Csurka, G., Dance, C., Fan, L., Willamowski, J., & Bray, C. (2004). Visual categorization with bags of keypoints. In ECCV Workshops.

  • Csurka, G., Dance, C., & Humenberger, M. (2018). From handcrafted to deep local invariant features. arXiv:1807.10254

  • Cui, Q., Fragoso, V., Sweeney, C., & Sen, P. (2017). GraphMatch: Efficient large-scale graph construction for structure from motion. In 3DV.

  • Deng, J., Guo, J., & Zafeiriou, S. (2019). ArcFace: Additive angular margin loss for deep face recognition. In CVPR.

  • Ding, M., Wang, Z., Sun, J., Shi, J., & Luo, P. (2019). CamNet: Coarse-to-fine retrieval for camera re-localization. In ICCV.

  • Dusmanu, M., Rocco, I., Pajdla, T., Pollefeys, M., Sivic, J., Torii, A., & Sattler, T. (2019). D2-Net: A trainable CNN for joint description and detection of local features. In CVPR

  • Fischler, M., & Bolles, R. (1981). Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, 24(6), 381–395.


  • Garcia-Fidalgo, E., & Ortiz, A. (2015). Vision-based topological mapping and localization methods: A survey. Robotics and Autonomous Systems (RAS), 64(2), 1–20.


  • Germain, H., Bourmaud, G., & Lepetit, V. (2019). Sparse-to-dense hypercolumn matching for long-term visual localization. In 3DV.

  • Gordo, A., Almazán, J., Revaud, J., & Larlus, D. (2017). End-to-end learning of deep visual representations for image retrieval. International Journal of Computer Vision (IJCV), 124, 237–254.


  • Hausler, S., Garg, S., Xu, M., Milford, M., & Fischer, T. (2021). Patch-NetVLAD: Multi-scale fusion of locally-global descriptors for place recognition. In CVPR.

  • Hays, J., & Efros, A. (2008). IM2GPS: Estimating geographic information from a single image. In CVPR.

  • He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In ICCV.

  • Heinly, J., Schönberger, J., Dunn, E., & Frahm, J. M. (2015). Reconstructing the world in six days as captured by the Yahoo 100 million image dataset. In CVPR.

  • Heng, L., Choi, B., Cui, Z., Geppert, M., Hu, S., Kuan, B., Liu, P., Nguyen, R., Yeo, Y., Geiger, A., Lee, G., Pollefeys, M., & Sattler, T. (2019). Project AutoVision: Localization and 3D scene perception for an autonomous vehicle with a multi-camera system. In ICRA.

  • Humenberger, M., Cabon, Y., Guerin, N., Morat, J., Revaud, J., Rerole, P., Pion, N., de Souza, C., Leroy, V., & Csurka, G. (2020). Robust image retrieval-based visual localization using Kapture. arXiv:2007.13867

  • Irschara, A., Zach, C., Frahm, J. M., & Bischof, H. (2009). From structure-from-motion point clouds to fast location recognition. In CVPR.

  • Jégou, H., & Chum, O. (2012). Negative evidences and co-occurrences in image retrieval: The benefit of PCA and whitening. In ECCV.

  • Jégou, H., Douze, M., Schmid, C., & Pérez, P. (2010). Aggregating local descriptors into a compact image representation. In CVPR.

  • Kalantidis, Y., Mellina, C., & Osindero, S. (2016). Cross-dimensional Weighting for aggregated deep convolutional features. In ECCV Workshops.

  • Kalantidis, Y., Tolias, G., Avrithis, Y., Phinikettos, M., Spyrou, E., Mylonas, P., & Kollias, S. (2011). VIRaL: Visual image retrieval and localization. Multimedia Tools and Applications (MTA), 74(9), 3121–3135.


  • Kendall, A., & Cipolla, R. (2017). Geometric loss functions for camera pose regression with deep learning. In CVPR.

  • Kendall, A., Grimes, M., & Cipolla, R. (2015). PoseNet: A convolutional network for real-time 6-DOF camera relocalization. In ICCV.

  • Kim, H., Dunn, E., & Frahm, J. M. (2017). Learned contextual feature reweighting for image geo-localization. In CVPR.

  • Kneip, L., Scaramuzza, D., & Siegwart, R. (2011). A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation. In CVPR.

  • Knopp, J., Sivic, J., & Pajdla, T. (2010). Avoiding confusing features in place recognition. In ECCV.

  • Kukelova, Z., Bujnak, M., & Pajdla, T. (2013). Real-time solution to the absolute pose problem with unknown radial distortion and focal length. In ICCV.

  • Larsson, V., Kukelova, Z., & Zheng, Y. (2017). Making minimal solvers for absolute pose estimation compact and robust. In ICCV.

  • Laskar, Z., Melekhov, I., Kalia, S., & Kannala, J. (2017). Camera relocalization by computing pairwise relative poses using convolutional neural network. In ICCV Workshops.

  • Lebeda, K., Matas, J., & Chum, O. (2012). Fixing the locally optimized RANSAC. In BMVC.

  • Lee, D., Ryu, S., Yeon, S., Lee, Y., Kim, D., Han, C., Cabon, Y., Weinzaepfel, P., Guerin, N., Csurka, G., & Humenberger, M. (2021). Large-scale localization datasets in crowded indoor spaces. In CVPR.

  • Li, X., Wang, S., Zhao, Y., Verbeek, J., & Kannala, J. (2020). Hierarchical scene coordinate classification and regression for visual localization. In CVPR.

  • Li, Y., Crandall, D., & Huttenlocher, D. (2009). Landmark classification in large-scale image collections. In ICCV.

  • Li, Y., Snavely, N., & Huttenlocher, D. (2010). Location recognition using prioritized feature matching. In ECCV.

  • Li, Y., Snavely, N., Huttenlocher, D., & Fua, P. (2012) Worldwide pose estimation using 3D point clouds. In ECCV.

  • Lim, H., Sinha, S., Cohen, M., Uyttendaele, M., & Kim, H. (2015). Real-time monocular image-based 6-DoF localization. International Journal of Robotics Research, 34(4–5), 476–492.


  • Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. (2014). Microsoft COCO: Common objects in context. In ECCV.

  • Liu, L., Li, H., & Dai, Y. (2019). Stochastic attraction-repulsion embedding for large scale image localization. In ICCV.

  • Liu, R., Li, Z., & Jia, J. (2008). Image partial blur detection and classification. In CVPR.

  • Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision (IJCV), 60(2), 91–110.


  • Lowry, S., Sünderhauf, N., Newman, P., Leonard, J., Cox, D., Corke, P., & Milford, M. (2016). Visual place recognition: A survey. IEEE Transactions on Robotics, 32(1), 1–19.


  • Lu, F., & Milios, E. (1997). Globally consistent range scan alignment for environment mapping. Autonomous Robots, 4, 333–349.


  • Lynen, S., Sattler, T., Bosse, M., Hesch, J., Pollefeys, M., & Siegwart, R. (2015). Get out of my Lab: Large-scale, real-time visual-inertial localization. In RSS.

  • Maddern, W., Pascoe, G., Linegar, C., & Newman, P. (2017). 1 Year, 1000 km: The Oxford RobotCar dataset. International Journal of Robotics Research, 36(1), 3–15.


  • Massiceti, D., Krull, A., Brachmann, E., Rother, C., & Torr, P. (2017). Random forests versus neural networks—What’s best for camera localization? In ICRA.

  • Middelberg, S., Sattler, T., Untzelmann, O., & Kobbelt, L. (2014). Scalable 6-DoF localization on mobile devices. In ECCV.

  • Myers, J., & Well, A. (2003). Research design and statistical analysis. Lawrence Erlbaum Associates.

  • Noh, H., Araujo, A., Sim, J., Weyand, T., & Han, B. (2017) Large-scale image retrieval with attentive deep local features. In ICCV.

  • Pearson, K. (1895). Notes on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240–242.


  • Perronnin, F., & Dance, C. (2007). Fisher kernels on visual vocabularies for image categorization. In CVPR.

  • Philbin, J., Chum, O., Isard, M., Sivic, J., & Zisserman, A. (2007). Object retrieval with large vocabularies and fast spatial matching. In CVPR.

  • Philbin, J., Chum, O., Isard, M., Sivic, J., & Zisserman, A. (2008). Lost in quantization: Improving particular object retrieval in large scale image databases. In CVPR.

  • Piasco, N., Sidibé, D., Demonceaux, C., & Gouet-Brunet, V. (2018). A survey on visual-based localization: On the benefit of heterogeneous data. Pattern Recognition, 74(2), 90–109.


  • Pion, N., Humenberger, M., Csurka, G., Cabon, Y., & Sattler, T. (2020). Benchmarking image retrieval for visual localization. In 3DV.

  • Radenović, F., Iscen, A., Tolias, G., Avrithis, Y., & Chum, O. (2018). Revisiting Oxford and Paris: Large-scale image retrieval benchmarking. In CVPR.

  • Radenović, F., Tolias, G., & Chum, O. (2019). Fine-tuning CNN image retrieval with no human annotation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 41(7), 1655–1668.


  • Razavian, A., Sullivan, J., Carlsson, S., & Maki, A. (2015). Visual instance retrieval with deep convolutional networks. ITE Transactions on Media Technology and Applications, 4(3), 251–258.


  • Revaud, J., Almazan, J., de Rezende, R. S., & de Souza, C. R. (2019a). Learning with average precision: Training image retrieval with a listwise loss. In ICCV.

  • Revaud, J., Weinzaepfel, P., De Souza, C., & Humenberger, M. (2019b). R2D2: Reliable and repeatable detectors and descriptors. In NeurIPS.

  • Revaud, J., Weinzaepfel, P., De Souza, C., Pion, N., Csurka, G., Cabon, Y., & Humenberger, M. (2019c). R2D2: Reliable and repeatable detectors and descriptors for joint sparse keypoint detection and local feature extraction. arXiv:1906.06195

  • Sarlin, P. E., Cadena, C., Siegwart, R., & Dymczyk, M. (2019). From coarse to fine: Robust hierarchical localization at large scale. In CVPR.

  • Sarlin, P. E., Unagar, A., Larsson, M., Germain, H., Toft, C., Larsson, V., Pollefeys, M., Lepetit, V., Hammarstrand, L., Kahl, F., & Sattler, T. (2021). Back to the feature: Learning robust camera localization from pixels to pose. In CVPR.

  • Sattler, T., Havlena, M., Radenović, F., Schindler, K., & Pollefeys, M. (2015). Hyperpoints and fine vocabularies for large-scale location recognition. In ICCV.

  • Sattler, T., Havlena, M., Schindler, K., & Pollefey, M. (2016). Large-scale location recognition and the geometric burstiness problem. In CVPR.

  • Sattler, T., Leibe, B., & Kobbelt, L. (2017). Efficient & effective prioritized matching for large-scale image-based localization. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 39(9), 1744–1756.


  • Sattler, T., Maddern, W., Toft, C., Torii, A., Hammarstrand, L., Stenborg, E., Safari, D., Okutomi, M., Pollefeys, M., Sivic, J., Kahl, F., & Pajdla, T. (2018). Benchmarking 6DoF outdoor visual localization in changing conditions. In CVPR.

  • Sattler, T., Weyand, T., Leibe, B., & Kobbelt, L. (2012). Image retrieval for image-based localization revisited. In BMVC.

  • Sattler, T., Zhou, Q., Pollefeys, M., & Leal-Taixé, L. (2019). Understanding the limitations of CNN-based absolute camera pose regression. In CVPR.

  • Schindler, G., Brown, M., & Szeliski, R. (2007). City-scale location recognition. In CVPR.

  • Schönberger, J., & Frahm, J. M. (2016). Structure-from-motion revisited. In CVPR.

  • Schönberger, J., Hardmeier, H., Sattler, T., & Pollefeys, M. (2017). Comparative evaluation of hand-crafted and learned local features. In CVPR.

  • Se, S., Lowe, D., & Little, J. (2002). Global localization using distinctive visual features. In IROS.

  • Shotton, J., Glocker, B., Zach, C., Izadi, S., Criminisi, A., & Fitzgibbon, A. (2013). Scene coordinate regression forests for camera relocalization in RGB-D images. In CVPR.

  • Sivic, J., & Zisserman, A. (2003). Video Google: A text retrieval approach to object matching in videos. In ICCV.

  • Snavely, N., Seitz, S., & Szeliski, R. (2008). Modeling the world from internet photo collections. International Journal of Computer Vision (IJCV), 80(2), 189–210.


  • Sun, X., Xie, Y., Luo, P., & Wang, L. (2017). A dataset for benchmarking image-based localization. In CVPR.

  • Taira, H., Okutomi, M., Sattler, T., Cimpoi, M., Pollefeys, M., Sivic, J., Pajdla, T., & Torii, A. (2018). InLoc: Indoor visual localization with dense matching and view synthesis. In CVPR.

  • Taira, H., Okutomi, M., Sattler, T., Cimpoi, M., Pollefeys, M., Sivic, J., Pajdla, T., & Torii, A. (2019a). InLoc: Indoor visual localization with dense matching and view synthesis. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). (Early Access).

  • Taira, H., Rocco, I., Sedlar, J., Okutomi, M., Sivic, J., Pajdla, T., Sattler, T., & Torii, A. (2019b). Is This the Right Place? Geometric-semantic pose verification for indoor visual localization. In ICCV.

  • Tang, S., Tang, C., Huang, R., Zhu, S., & Tan, P. (2021). Learning camera localization via dense scene matching. In CVPR.

  • Tolias, G., & Jégou, H. (2014). Visual query expansion with or without geometry: Refining local descriptors by feature aggregation. Pattern Recognition, 47(10), 3466–3476.


  • Tolias, G., Sicre, R., & Jégou, H. (2016). Particular object retrieval with integral maxpooling of CNN activations. In ICLR.

  • Torii, A., Arandjelović, R., Sivic, J., Okutomi, M., & Pajdla, T. (2015a). 24/7 Place recognition by view synthesis. In CVPR.

  • Torii, A., Arandjelović, R., Sivic, J., Okutomi, M., & Pajdla, T. (2018). 24/7 Place recognition by view synthesis. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 40(2), 257–271.


  • Torii, A., Sivic, J., Okutomi, M., & Pajdla, T. (2015b). Visual place recognition with repetitive structures. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 37(11), 2346–2359.

  • Torii, A., Sivic, J., & Pajdla, T. (2011). Visual localization by linear combination of image descriptors. In ICCV Workshops.

  • Torii, A., Taira, H., Sivic, J., Pollefeys, M., Okutomi, M., Pajdla, T., & Sattler, T. (2021). Are large-scale 3D models really necessary for accurate visual localization? IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 43(3), 814–829.


  • Ventura, J., Arth, C., Reitmayr, G., & Schmalstieg, D. (2014). Global localization from monocular SLAM on a mobile phone. IEEE Transactions on Visualization and Computer Graphics, 20(4), 531–539.


  • Vo, N., Jacobs, N., & Hays, J. (2017). Revisiting IM2GPS in the deep learning era. In ICCV.

  • Walch, F., Hazirbas, C., Leal-Taixé, L., Sattler, T., Hilsenbeck, S., & Cremers, D. (2017). Image-based localization using LSTMs for structured feature correlation. In ICCV.

  • Weinzaepfel, P., Csurka, G., Cabon, Y., & Humenberger, M. (2019). Visual localization by learning objects-of-interest dense match regression. In CVPR.

  • Weyand, T., Araujo, A., Cao, B., & Sim, J. (2020). Google Landmarks dataset v2: A large-scale benchmark for instance-level recognition and retrieval. In CVPR.

  • Wijmans, E., & Furukawa, Y. (2017). Exploiting 2D floorplan for building-scale panorama RGB-D alignment. In CVPR.

  • Yang, L., Bai, Z., Tang, C., Li, H., Furukawa, Y., & Tan, P. (2019). SANet: Scene agnostic network for camera localization. In ICCV.

  • Zamir, A., Hakeem, A., Van Gool, L., Shah, M., & Szeliski, R. (2016). Large-scale visual geo-localization. In Advances in computer vision and pattern recognition. Springer.

  • Zamir, A. R., & Shah, M. (2010). Accurate image localization based on google maps street view. In ECCV.

  • Zhang, W., & Kosecka, J. (2006). Image based localization in urban environments. In International symposium on 3D data processing, visualization, and transmission.

  • Zhang, Z., Sattler, T., & Scaramuzza, D. (2021). Reference pose generation for long-term visual localization via learned features and view synthesis. International Journal of Computer Vision (IJCV), 129, 821–844.


  • Zheng, E., & Wu, C. (2015). Structure from motion using structure-less resection. In ICCV.

  • Zheng, L., Zhao, Y., Wang, S., Wang, J., & Tian, Q. (2016). Good practice in CNN feature transfer. arXiv:1604.00133

  • Zhou, Q., Sattler, T., Pollefeys, M., & Leal-Taixé, L. (2020). To learn or not to learn: Visual localization from essential matrices. In ICRA.


Acknowledgements

This work received funding through the EU Horizon 2020 research and innovation programme under Grant agreement No. 857306 (RICAIP) and the European Regional Development Fund under IMPACT No. CZ.02.1.01/0.0/0.0/15_003/0000468.

Author information

Corresponding author

Correspondence to Martin Humenberger.

Additional information

Communicated by Jun Sato.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Noé Pion's work was done during an appointment at NAVER LABS Europe, Meylan, France.


About this article


Cite this article

Humenberger, M., Cabon, Y., Pion, N. et al. Investigating the Role of Image Retrieval for Visual Localization. Int J Comput Vis 130, 1811–1836 (2022). https://doi.org/10.1007/s11263-022-01615-7
