Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data

Ho, Cherie; Zou, Jiaye; Alama, Omar; Kumar, Sai Mitheran Jagadesh; Chiang, Benjamin; Gupta, Taneesh; Wang, Chen; Keetha, Nikhil; Sycara, Katia; Scherer, Sebastian

arXiv:2407.08726 (cs)

[Submitted on 11 Jul 2024 (v1), last revised 5 Dec 2024 (this version, v2)]

Abstract: Top-down Bird's Eye View (BEV) maps are a popular representation for ground robot navigation due to their richness and flexibility for downstream tasks. While recent methods have shown promise for predicting BEV maps from First-Person View (FPV) images, their generalizability is limited to small regions captured by current autonomous vehicle-based datasets. In this context, we show that a more scalable approach towards generalizable map prediction can be enabled by using two large-scale crowd-sourced mapping platforms: Mapillary for FPV images and OpenStreetMap for BEV semantic maps. We introduce Map It Anywhere (MIA), a data engine that enables seamless curation and modeling of labeled map prediction data from existing open-source map platforms. Using our MIA data engine, we demonstrate the ease of automatically collecting a dataset of 1.2 million pairs of FPV images and BEV maps encompassing diverse geographies, landscapes, environmental factors, camera models, and capture scenarios. We further train a simple camera-model-agnostic model on this data for BEV map prediction. Extensive evaluations using established benchmarks and our dataset show that the data curated by MIA enables effective pretraining for generalizable BEV map prediction, with zero-shot performance exceeding baselines trained on existing datasets by 35%. Our analysis highlights the promise of using large-scale public maps for developing and testing generalizable BEV perception, paving the way for more robust autonomous navigation.
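To make the idea of an FPV/BEV training pair concrete, the following is a minimal sketch of how map polygons given in latitude/longitude might be rasterized into an ego-centric BEV label grid around a camera pose. It is not the MIA pipeline: the abstract does not describe the implementation, and every name here (`latlon_to_local_m`, `to_ego`, `rasterize_bev`, the grid size, the cell size, and the equirectangular projection) is an assumption chosen for illustration.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS84 equatorial radius


def latlon_to_local_m(lat, lon, lat0, lon0):
    """Equirectangular approximation: metres east/north of (lat0, lon0)."""
    east = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    north = math.radians(lat - lat0) * EARTH_RADIUS_M
    return east, north


def to_ego(east, north, heading_deg):
    """Rotate into the camera frame (x = right, y = forward).

    heading_deg is measured clockwise from north (assumed convention).
    """
    h = math.radians(heading_deg)
    right = east * math.cos(h) - north * math.sin(h)
    forward = east * math.sin(h) + north * math.cos(h)
    return right, forward


def point_in_polygon(x, y, poly):
    """Even-odd ray casting test for a simple polygon given as (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_bev(camera, features, size_m=20.0, cell_m=1.0):
    """Return grid[row][col] of class labels (None where nothing is mapped).

    camera:   (lat, lon, heading_deg) of the FPV image.
    features: list of (label, [(lat, lon), ...]) polygons (hypothetical format).
    The camera sits at the bottom centre; row 0 is the farthest row ahead.
    Later features overwrite earlier ones where they overlap.
    """
    lat0, lon0, heading = camera
    n = int(size_m / cell_m)
    ego_features = [
        (label, [to_ego(*latlon_to_local_m(lat, lon, lat0, lon0), heading)
                 for lat, lon in ring])
        for label, ring in features
    ]
    grid = [[None] * n for _ in range(n)]
    for row in range(n):
        for col in range(n):
            x = (col + 0.5) * cell_m - size_m / 2  # cell centre, metres right
            y = size_m - (row + 0.5) * cell_m      # cell centre, metres ahead
            for label, ego in ego_features:
                if point_in_polygon(x, y, ego):
                    grid[row][col] = label
    return grid
```

In this sketch, calling `rasterize_bev((lat, lon, heading), features)` yields the label grid that would be paired with the FPV image taken at that pose. The equirectangular projection is only reasonable over the tens of metres a BEV patch covers; a production pipeline would more likely use a proper map projection and a geometry library.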

Comments:	Accepted at the 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks.
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.08726 [cs.CV]
	(or arXiv:2407.08726v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.08726

Submission history

From: Cherie Ho
[v1] Thu, 11 Jul 2024 17:57:22 UTC (8,816 KB)
[v2] Thu, 5 Dec 2024 18:54:08 UTC (13,560 KB)
