We design a new architecture called TESR (Two-stage approach for Enhancement and Super Resolution) that leverages Vision Transformers (ViT) and a Diffusion Model (DM) to artificially increase the resolution of remote sensing (RS) images. The first stage is a ViT-based model that increases the resolution. The second stage is an iterative DM, pre-trained on a larger dataset, that enhances image quality. Each stage is trained separately on its own task using a separate dataset.
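
The two-stage data flow described above can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: the names `upscale_stage`, `enhance_stage`, and `tesr_pipeline` are hypothetical, nearest-neighbour upscaling stands in for the ViT-based model, and repeated 3x3 mean smoothing stands in for the iterative diffusion model. It shows only how the stages chain together, not the actual networks.

```python
from typing import List

# A grayscale image as rows of pixel values in [0, 1] (illustrative type only).
Image = List[List[float]]


def upscale_stage(image: Image, scale: int = 2) -> Image:
    """Stage 1 placeholder: nearest-neighbour upscaling.

    Stands in for the ViT-based super-resolution model; NOT the real network.
    """
    out: Image = []
    for row in image:
        # Repeat every pixel `scale` times horizontally...
        wide = [px for px in row for _ in range(scale)]
        # ...and every widened row `scale` times vertically.
        out.extend(list(wide) for _ in range(scale))
    return out


def enhance_stage(image: Image, steps: int = 3) -> Image:
    """Stage 2 placeholder: iterative refinement that keeps the image size.

    Stands in for the iterative diffusion model; each step here is just a
    3x3 mean filter with clamped borders, NOT a real denoising step.
    """
    h, w = len(image), len(image[0])
    current = [list(row) for row in image]
    for _ in range(steps):
        nxt: Image = []
        for y in range(h):
            new_row = []
            for x in range(w):
                neighbours = [
                    current[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in (-1, 0, 1)
                    for dx in (-1, 0, 1)
                ]
                new_row.append(sum(neighbours) / len(neighbours))
            nxt.append(new_row)
        current = nxt
    return current


def tesr_pipeline(image: Image, scale: int = 2, steps: int = 3) -> Image:
    """Chain the stages: first increase resolution, then enhance quality."""
    return enhance_stage(upscale_stage(image, scale), steps)


if __name__ == "__main__":
    low_res = [[0.0, 1.0], [1.0, 0.0]]
    high_res = tesr_pipeline(low_res, scale=2, steps=2)
    print(len(high_res), len(high_res[0]))
```

Because the two stages only share an image-in, image-out interface, either placeholder could be swapped for a trained model without touching the other, which mirrors the separate training of each stage.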
Dataset: UC Merced Land Use Dataset
Pre-trained weights: SwinIR (swinIR_weights) and diffusion model (DM_weights)
# Result