GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model

Li, Ling; Ye, Yu; Jiang, Bingchuan; Zeng, Wei

arXiv:2406.18572 (cs)

[Submitted on 3 Jun 2024]

Abstract: This work tackles the problem of geo-localization with a new paradigm using a large vision-language model (LVLM) augmented with human inference knowledge. A primary challenge is the scarcity of data for training the LVLM: existing street-view datasets often contain numerous low-quality images that lack visual clues, and they include no reasoning inference. To address the data-quality issue, we devise a CLIP-based network to quantify how locatable a street-view image is, leading to the creation of a new dataset comprising highly locatable street views. To enhance reasoning inference, we integrate external knowledge obtained from real geo-localization games, tapping into valuable human inference capabilities. The data are used to train GeoReasoner, which is fine-tuned through dedicated reasoning and location-tuning stages. Qualitative and quantitative evaluations show that GeoReasoner outperforms counterpart LVLMs by more than 25% on country-level and 38% on city-level geo-localization tasks, and surpasses StreetCLIP while requiring fewer training resources. The data and code are available at this https URL.

Comments:	ICML 2024
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2406.18572 [cs.CV]
	(or arXiv:2406.18572v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2406.18572

Submission history

From: Ling Li
[v1] Mon, 3 Jun 2024 18:08:56 UTC (32,435 KB)
