
macdonaldezra/VistaFormer


VistaFormer


This is the official code repository for the paper "VistaFormer: Scalable Vision Transformers for Satellite Image Time Series Segmentation".

Installation

Before running code from this repository, install the project dependencies using either the requirements.txt file or the Poetry configuration. To install with requirements.txt, run the following commands:

pip install -r requirements.txt
# To train the Neighbourhood Attention-based VistaFormer model, run the following
pip3 install natten==0.14.6 -f https://shi-labs.com/natten/wheels/cu117/torch1.13/index.html

Train Models

To run inference with pre-trained weights or to train a model on one of the datasets, first see the documentation in the datasets directory for instructions on how to download and create these datasets.

Once you have created a dataset, you can train a model using the following commands:

export MODEL_CONFIG="very-real-model-config-path"
python -m vistaformer.train_and_evaluate.train
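
Because the training entry point takes its configuration from the MODEL_CONFIG environment variable, a missing or mistyped path is an easy mistake to make. The following is a minimal sketch of a pre-flight check, not part of this repository: the helper name resolve_model_config is hypothetical, and it assumes MODEL_CONFIG should name an existing file. How the training module itself reads the variable is not shown here.

```python
import os
from pathlib import Path


def resolve_model_config(env=None):
    """Return the path named by MODEL_CONFIG, or raise a descriptive error.

    Hypothetical helper: assumes MODEL_CONFIG should point to an existing file.
    """
    env = os.environ if env is None else env
    raw = env.get("MODEL_CONFIG", "").strip()
    if not raw:
        raise RuntimeError("MODEL_CONFIG is not set; export it before training")
    path = Path(raw).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"MODEL_CONFIG points to a missing file: {path}")
    return path


if __name__ == "__main__":
    try:
        print(f"Using model config: {resolve_model_config()}")
    except (RuntimeError, FileNotFoundError) as err:
        print(f"Configuration problem: {err}")
```

Running this script in the same shell as the export command reports the problem before a training job is started.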

To evaluate a pre-trained model on a given dataset, use the notebooks/inference.ipynb notebook to compute the complete set of metrics.

Please note that pre-trained model weights and training logs for each trial that was reported in the results section of the accompanying paper will be released once an unanonymized name can accompany this repository.

Results on PASTIS (Optical only) Semantic Segmentation Benchmark

Model Name                    mIoU   oA     #Params (M)   GFLOPs
U-TAE                         63.1   83.2   1.1           23.06
TSViT †                       65.4   83.4   1.6           91.88
VistaFormer (Neighbourhood)   65.3   83.7   1.1           9.82
VistaFormer                   65.5   84.0   1.3           7.7

Results on MTLCC Semantic Segmentation Benchmark

Model Name                    mIoU   oA     #Params (M)   GFLOPs
U-TAE                         77.1   93.1   1.1           23.06
TSViT                         84.8   95.0   1.6           91.88
VistaFormer (Neighbourhood)   88.5   96.1   1.1           9.82
VistaFormer                   87.8   95.9   1.3           7.7

Note that the GFLOPs and parameter counts were measured on inputs with dimensions (B, C, T, H, W) = (4, 10, 60, 32, 32).
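
For readers unfamiliar with the mIoU and oA columns, the sketch below shows how mean intersection-over-union and overall accuracy are conventionally derived from a class confusion matrix. It is an illustration under stated assumptions, not the evaluation code used for the results above. The function name segmentation_metrics and the toy matrix are hypothetical, and the repository's own metric computation (for example, its handling of ignored classes) may differ.

```python
def segmentation_metrics(conf):
    """Compute (mIoU, oA) from a square confusion matrix.

    conf[i][j] is the number of pixels whose true class is i and whose
    predicted class is j.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Overall accuracy: correctly classified pixels over all pixels.
    correct = sum(conf[i][i] for i in range(n))
    oa = correct / total

    ious = []
    for i in range(n):
        tp = conf[i][i]
        fn = sum(conf[i]) - tp                       # true class i, predicted otherwise
        fp = sum(conf[r][i] for r in range(n)) - tp  # predicted i, true class otherwise
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)

    miou = sum(ious) / len(ious)
    return miou, oa


if __name__ == "__main__":
    # Toy two-class example (made-up counts, not benchmark data).
    toy = [[8, 2],
           [1, 9]]
    miou, oa = segmentation_metrics(toy)
    print(f"mIoU = {100 * miou:.1f}, oA = {100 * oa:.1f}")  # mIoU = 73.9, oA = 85.0
```

In the toy example the per-class IoU values are 8/11 and 9/12, so the mean is about 0.739, while 17 of 20 pixels are correct, giving an overall accuracy of 0.85.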

Citation

If you find this work or code useful in your research, please consider citing it:

@article{macdonald_2024_vistaformer,
  title={VistaFormer: Scalable Vision Transformers for Satellite Image Time Series Segmentation},
  author={MacDonald, Ezra and Jacoby, Derek and Coady, Yvonne},
  journal={arXiv preprint arXiv:2409.08461},
  year={2024}
}
