A deformable attention network for high-resolution remote sensing images semantic segmentation

R Zuo, G Zhang, R Zhang, X Jia - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
R Zuo, G Zhang, R Zhang, X Jia
IEEE Transactions on Geoscience and Remote Sensing, 2021 - ieeexplore.ieee.org
Deformable convolutional networks (DCNs) can mitigate the inherently limited ability of standard convolutions to model geometric transformations. In this article, we reformulate the spatialwise attention mechanism using DCNs for semantic segmentation of high-resolution remote sensing (HRRS) images. The resulting deformable attention module (DAM) combines a sparse spatial sampling strategy with long-range relationship modeling. Its locality awareness, which adapts better to HRRS image structures, captures each pixel’s neighboring structural information. Building on the proposed DAM, we design a multiscale deformable attention net (MDANet) for HRRS image semantic segmentation at only a slightly increased computational cost. Specifically, the standard convolutional layers in the original ResNet50 are equipped with a DAM to control sampling over a broader range of feature levels and to aggregate multiscale context information. Experimental results on the Vaihingen and DeepGlobe Land Cover Classification datasets show that MDANet improves mean intersection over union (mIoU) by 7.77% and 8.45%, respectively, compared with the backbone network (ResNet50). Furthermore, a DAM performs better than a global spatial attention mechanism while requiring less computation on the feature map. In addition, ablation studies demonstrate the effectiveness and efficiency of the DAM and of the multiscale strategy. Finally, the sensitivity of critical hyperparameters is analyzed.
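The abstract does not give the DAM equations, so the following is only a minimal NumPy sketch of the general idea it describes: each pixel attends to a small set of sampling points whose positions are shifted by offsets predicted from the features, instead of attending to every position in the feature map. All names here (`bilinear_sample`, `deformable_attention`, `w_q`, `w_k`, `w_v`, `w_off`, `radius`) are hypothetical, and the offset predictor, the 3x3 base grid, and the scaled dot-product scoring are assumptions for illustration, not the authors' formulation.

```python
import numpy as np


def bilinear_sample(feat, ys, xs):
    """Bilinearly sample feat (H, W, C) at fractional coordinates ys, xs."""
    H, W, _ = feat.shape
    # Clamp coordinates so every sample falls inside the feature map.
    ys = np.clip(ys, 0, H - 1)
    xs = np.clip(xs, 0, W - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[..., None]
    wx = (xs - x0)[..., None]
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bottom = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def deformable_attention(feat, w_q, w_k, w_v, w_off, radius=1):
    """Sparse attention over K = (2*radius + 1)**2 deformed sampling points.

    feat:  (H, W, C) feature map
    w_q, w_k, w_v: (C, D) projection matrices (assumed, for illustration)
    w_off: (C, 2*K) linear offset predictor (assumed, for illustration)
    """
    H, W, _ = feat.shape
    # Regular sampling grid around each pixel, as in a standard convolution.
    r = np.arange(-radius, radius + 1)
    dy, dx = np.meshgrid(r, r, indexing="ij")
    base = np.stack([dy.ravel(), dx.ravel()], axis=-1).astype(float)  # (K, 2)
    K = base.shape[0]

    # Per-pixel offsets predicted from the features deform the grid.
    offsets = (feat @ w_off).reshape(H, W, K, 2)
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    ys = yy[..., None] + base[:, 0] + offsets[..., 0]  # (H, W, K)
    xs = xx[..., None] + base[:, 1] + offsets[..., 1]  # (H, W, K)

    sampled = bilinear_sample(feat, ys, xs)  # (H, W, K, C)
    q = feat @ w_q                           # (H, W, D)
    k = sampled @ w_k                        # (H, W, K, D)
    v = sampled @ w_v                        # (H, W, K, D)

    # Softmax over the K sampled points only, not over all H*W positions.
    scores = np.einsum("hwd,hwkd->hwk", q, k) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)

    out = np.einsum("hwk,hwkd->hwd", attn, v)
    return out, attn


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, K = 4, 9
    feat = rng.normal(size=(8, 8, C))
    w_q, w_k, w_v = (rng.normal(size=(C, C)) for _ in range(3))
    w_off = 0.1 * rng.normal(size=(C, 2 * K))
    out, attn = deformable_attention(feat, w_q, w_k, w_v, w_off)
    print(out.shape, attn.shape)
```

In this sketch each pixel scores only K sampled locations, so the attention map has H*W*K entries, whereas a global spatial attention map over the same feature map has (H*W)^2 entries. That difference is one plausible reading of the abstract's claim that the DAM needs less computation than global spatial attention.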