A Billion-scale Foundation Model for Remote Sensing Images

Cha, Keumgang; Seo, Junghoon; Lee, Taekyung

doi:10.1109/JSTARS.2024.3401772

Computer Science > Computer Vision and Pattern Recognition

arXiv:2304.05215 (cs)

[Submitted on 11 Apr 2023 (v1), last revised 12 Aug 2024 (this version, v4)]

Abstract: As the potential of foundation models in visual tasks has garnered significant attention, pretraining these models before applying them to downstream tasks has become a crucial step. The three key factors in pretraining foundation models are the pretraining method, the size of the pretraining dataset, and the number of model parameters. Recently, research in the remote sensing field has focused primarily on the pretraining method and the size of the dataset, with limited emphasis on the number of model parameters. This paper addresses this gap by examining the effect of increasing the number of model parameters on the performance of foundation models in downstream tasks such as rotated object detection and semantic segmentation. We pretrained foundation models with varying numbers of parameters, including 86M, 605.26M, 1.3B, and 2.4B, to determine whether performance in downstream tasks improved as the number of parameters increased. To the best of our knowledge, this is the first billion-scale foundation model in the remote sensing field. Furthermore, we propose an effective method for scaling up and fine-tuning a vision transformer in the remote sensing field. To evaluate performance across downstream tasks, we employed the DOTA v2.0 and DIOR-R benchmark datasets for rotated object detection, and the Potsdam and LoveDA datasets for semantic segmentation. Experimental results demonstrated that, across all benchmark datasets and downstream tasks, both the performance and the data efficiency of the foundation models improved as the number of parameters increased. Moreover, our models achieve state-of-the-art performance on several datasets, including DIOR-R, Potsdam, and LoveDA.
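The parameter counts quoted above are dominated by the transformer blocks of the vision transformer. With an MLP ratio of 4, each block holds roughly 12·d² weights for embedding width d, so the total grows linearly with depth and quadratically with width. The sketch below estimates the size of a plain ViT encoder (no classification head) under those assumptions. The `ViTConfig` fields and the helper names are hypothetical, and the abstract does not state the width or depth of any of the four models. The only check made here is that a standard ViT-Base/16 setting lands near the 86M figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    """Hypothetical description of a plain ViT encoder (no classification head)."""
    width: int            # embedding dimension d
    depth: int            # number of transformer blocks
    mlp_ratio: int = 4    # MLP hidden size = mlp_ratio * width
    patch: int = 16       # patch size in pixels
    image: int = 224      # input resolution in pixels
    channels: int = 3     # input channels (RGB assumed)


def block_params(d: int, mlp_ratio: int) -> int:
    """Parameters in one pre-norm transformer block, biases included."""
    h = mlp_ratio * d
    attn = 3 * d * d + 3 * d + d * d + d   # fused QKV projection + output projection
    mlp = d * h + h + h * d + d            # two linear layers of the MLP
    norms = 2 * 2 * d                      # two LayerNorms, each with scale and shift
    return attn + mlp + norms


def vit_params(cfg: ViTConfig) -> int:
    """Total encoder parameters: patch embedding, tokens, blocks, final norm."""
    d = cfg.width
    tokens = (cfg.image // cfg.patch) ** 2 + 1                    # patch tokens + [CLS]
    patch_embed = cfg.patch * cfg.patch * cfg.channels * d + d    # patch projection + bias
    pos_embed = tokens * d                                        # learned position embeddings
    cls_token = d
    final_norm = 2 * d
    blocks = cfg.depth * block_params(d, cfg.mlp_ratio)
    return patch_embed + pos_embed + cls_token + blocks + final_norm


if __name__ == "__main__":
    base = ViTConfig(width=768, depth=12)
    print(f"ViT-Base/16: {vit_params(base) / 1e6:.2f}M parameters")
```

Running it prints `ViT-Base/16: 85.80M parameters`, close to the 86M model in the abstract. Whether the authors' 86M model uses exactly this configuration, and which widths and depths give the 605.26M, 1.3B, and 2.4B models, are not stated here and remain assumptions.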

Comments:	This manuscript is the accepted version for IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (IEEE J-STARS)
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2304.05215 [cs.CV]
	(or arXiv:2304.05215v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2304.05215
Related DOI:	https://doi.org/10.1109/JSTARS.2024.3401772

Submission history

From: Junghoon Seo
[v1] Tue, 11 Apr 2023 13:33:45 UTC (25,734 KB)
[v2] Mon, 13 May 2024 05:00:58 UTC (33,114 KB)
[v3] Tue, 14 May 2024 06:33:02 UTC (33,114 KB)
[v4] Mon, 12 Aug 2024 03:33:12 UTC (33,114 KB)