MCHA-Net: A multi-end composite higher-order attention network guided with hierarchical supervised signal for high-resolution remote sensing image change …

H Zhang, G Ma, Y Zhang, B Wang, H Li, L Fan
ISPRS Journal of Photogrammetry and Remote Sensing, 2023, Elsevier
Abstract
Change detection (CD) is the main way to detect changes in the Earth’s surface features in a timely and accurate manner and to understand the interactions between humans and nature; CD is also an important scientific tool for supporting decision-making. Many convolution-based and Transformer-based methods have achieved remarkable success in CD with high-resolution remote sensing images (HRSIs), owing to their powerful feature extraction capability and global information modeling capability, respectively. However, the diversity and complexity of HRSIs, e.g., off-nadir imaging angles, seasonal turnover, and simultaneous changes in multiple feature types, pose persistent challenges for CD methods. Common convolution-based or Transformer-based encoder–decoder networks rely on a single data modeling approach and have a low degree of feature fusion, resulting in poor applicability to different remote sensing data, poor recognition of different semantic targets, and limited accuracy. To improve the generalizability and detection accuracy of the network, we developed a composite higher-order attention network with multiple encoding paths, named MCHA-Net. MCHA-Net has three encoding backbones, namely, the Siamese-learning path, the Residual-learning path, and the Transformer-learning path. Each encoding path acts as a different "thinking end", and integrating them gives the network a stronger feature representation capability, forming a local–global–cross-domain data modeling approach with powerful data sensing and mining capability. The decoding end aggregates the semantic information of each layer and integrates it into a unified linearized feature mapping module to fully model the separability of information in the target domain.
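The multi-path idea can be illustrated with a deliberately tiny, dependency-free Python sketch. It is not the MCHA-Net implementation: the abstract does not describe the internals of the Siamese-, Residual-, and Transformer-learning paths, so every function below is a hypothetical stand-in operating on a 1-D toy signal. `local_encoder` (a zero-padded 3-tap mean) mimics the local receptive field of a convolution, `global_encoder` (adding the global mean to every position) mimics global context modeling, each path applies the same weights to both acquisition dates before comparing them, and `fuse` simply averages the per-path change maps where a real decoder would concatenate channels and learn how to mix them.

```python
from typing import Callable, List

Signal = List[float]  # toy 1-D stand-in for a feature map


def local_encoder(x: Signal) -> Signal:
    """Hypothetical 'local' path: zero-padded 3-tap mean filter (local receptive field)."""
    padded = [0.0] + list(x) + [0.0]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0 for i in range(len(x))]


def global_encoder(x: Signal) -> Signal:
    """Hypothetical 'global' path: every position also sees the global mean (global context)."""
    mean = sum(x) / len(x)
    return [v + mean for v in x]


def siamese_change(t1: Signal, t2: Signal, encoder: Callable[[Signal], Signal]) -> Signal:
    """Apply the SAME encoder (shared weights) to both dates and return |f(t1) - f(t2)|."""
    f1, f2 = encoder(t1), encoder(t2)
    return [abs(a - b) for a, b in zip(f1, f2)]


def multi_path_features(t1: Signal, t2: Signal) -> List[Signal]:
    """Run every toy encoding path on the image pair; one change map per path."""
    return [siamese_change(t1, t2, enc) for enc in (local_encoder, global_encoder)]


def fuse(maps: List[Signal]) -> Signal:
    """Toy fusion: position-wise average of all path outputs."""
    n = len(maps)
    return [sum(values) / n for values in zip(*maps)]
```

With `t1 = [0, 0, 0, 0, 0]` and `t2 = [0, 0, 3, 0, 0]`, the local path yields `[0, 1, 1, 1, 0]`, the global path yields approximately `[0.6, 0.6, 3.6, 0.6, 0.6]`, and the fused map is approximately `[0.3, 0.8, 2.3, 0.8, 0.3]`, peaking at the changed position; identical inputs give an all-zero map.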
In addition, we propose a new higher-order attention mechanism that performs adaptive feature refinement for each layer of each encoding end, guiding the network to focus on the target regions and ensuring the boundary integrity and internal compactness of the detection results. Moreover, we design a hierarchical network supervision schema that adds supervision signals at different feature abstraction levels to impose differentiated soft constraints on each layer of the network, ensuring high semantic consistency of features across layers and facilitating fast network fitting. Experimental results on three benchmark datasets (S2Looking, SVCD, and SYSU-CD) and one transfer application dataset (Google Dataset) show that MCHA-Net outperforms state-of-the-art methods in both visual interpretation and quantitative evaluation, and exhibits strong generalization and robustness in few-shot learning settings.
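The hierarchical supervision schema resembles deep supervision: each supervised decoder level produces its own change map, and the training loss sums one term per level. The minimal sketch below assumes per-level binary cross-entropy and hand-chosen per-level weights as a stand-in for the "differentiated soft constraints"; the abstract specifies neither the loss function nor the weights, and `bce` and `hierarchical_loss` are hypothetical names. It also assumes every side output has already been resized to the ground-truth resolution and flattened into a list of probabilities.

```python
import math
from typing import Sequence


def bce(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted change probabilities and a 0/1 change mask."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    total = 0.0
    for p, y in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(pred)


def hierarchical_loss(
    side_outputs: Sequence[Sequence[float]],
    target: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Hypothetical hierarchical supervision: weighted sum of one BCE term per supervised level.

    Assumes each side output is already resized to the mask resolution and flattened.
    """
    if len(side_outputs) != len(weights):
        raise ValueError("need exactly one weight per supervised level")
    return sum(w * bce(level_pred, target) for level_pred, w in zip(side_outputs, weights))
```

For example, two levels that both predict 0.5 everywhere against the mask `[0, 1]` each cost ln 2 ≈ 0.693, so with weights `[1.0, 0.5]` the total is 1.5 ln 2 ≈ 1.040. Giving some levels smaller weights is one plausible way to make the per-layer constraints "soft"; whether MCHA-Net weights its levels this way is not stated in the abstract.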