1. Introduction
With the availability of increasingly high-resolution (HR) satellite images, remote sensing is extensively utilized in many fields, such as urban planning, forest fire monitoring, and vegetation phenology analysis [1,2,3,4]. As an important application of remote sensing technology, change detection, together with its role in revealing changes in land cover, has become a critical research hotspot due to the close relationship between residents and their environment [5,6,7,8].
Change detection (CD) refers to the process of detecting changes in the land surface from bi-temporal, multi-temporal, and time-series images acquired by different types of sensors [9,10,11,12,13]. As one of the most important applications of satellite images, CD plays a key role not only in locating changed objects but also in providing further insight into the evolution of the land surface. Recently, owing to the rapid global urbanization process, CD has become even more important because it provides accurate information on land cover changes, for example, damage caused by earthquakes and flooding, the extent of urban expansion, and the areas affected by forest fires [14,15,16,17,18,19]. Owing to the rich spatial characteristics of high-resolution remote sensing images, there are obvious differences between different surface objects and a certain spatial heterogeneity within the same object. This hinders the application of conventional change detection or semantic detection methods, such as image differencing (DI), the log-ratio method, and change vector analysis (CVA), to high-resolution remote sensing images. Therefore, accurate and robust change detection or semantic detection approaches are still needed to meet these application requirements and to obtain a better and deeper understanding of land cover change.
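For reference, the conventional pixel-level operators mentioned above can be sketched in a few lines. The following illustrative NumPy implementations (the function names and the band-last layout of the bi-temporal inputs are our assumptions, not details from the cited works) show image differencing, the log-ratio operator, and the CVA change magnitude:

```python
import numpy as np

def difference_image(t1, t2):
    """Pixel-wise absolute difference between two co-registered bands."""
    return np.abs(t2.astype(float) - t1.astype(float))

def log_ratio(t1, t2, eps=1e-6):
    """Log-ratio operator; eps guards against division by zero."""
    return np.abs(np.log((t2.astype(float) + eps) / (t1.astype(float) + eps)))

def cva_magnitude(t1, t2):
    """Change vector analysis: Euclidean norm of the per-pixel spectral
    difference vector (spectral bands on the last axis)."""
    diff = t2.astype(float) - t1.astype(float)
    return np.sqrt((diff ** 2).sum(axis=-1))
```

A binary change map is then typically obtained by thresholding the resulting magnitude image, which is exactly where the heterogeneity of HR scenes causes the spurious detections discussed above.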
When only the gray value is used as statistical information, the change detection results obtained are often incomplete and contain several spurious changed areas. A variety of transform-based change detection algorithms have therefore been proposed, such as iterative slow feature analysis (ISFA) [20], iteratively reweighted multivariate alteration detection (IRMAD) [21], and principal component analysis (PCA) [22]. The errors caused by illumination conditions and radiation differences demonstrate the limitation of utilizing spectral information alone. In contrast, texture and structural features are more stable and are less affected by spectral differences. Therefore, the idea of merging multiple features for change detection is widely adopted.
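As a minimal illustration of the transform-based family, a PCA-style change score can be sketched as follows. The thresholding rule (mean plus k standard deviations of the projected score) is a simplifying assumption for demonstration, not the configuration used in [22]:

```python
import numpy as np

def pca_change_map(t1, t2, k=1.0):
    """Illustrative PCA-based change detection: project the per-pixel
    spectral difference vectors onto their first principal component
    and threshold the score. `k` is an illustrative parameter."""
    diff = (t2.astype(float) - t1.astype(float)).reshape(-1, t1.shape[-1])
    diff -= diff.mean(axis=0)                      # center the difference vectors
    cov = np.cov(diff, rowvar=False)               # band-by-band covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    pc1 = diff @ eigvecs[:, -1]                    # first principal component
    score = np.abs(pc1)
    mask = score > score.mean() + k * score.std()  # simple global threshold
    return mask.reshape(t1.shape[:-1])
```

The same skeleton applies to IRMAD and ISFA, with the eigen-decomposition replaced by their respective iteratively weighted transformations.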
Spectral, texture, and structural features, together with other change information, are widely used in existing studies. Based on spectral characteristics, methods such as the spectral correlation mapper (SCM) [23], the spectral gradient difference (SGD) [23], the Kullback–Leibler divergence [24], and the neighborhood correlation image (NCI) are used for the change detection of remote sensing images. Based on texture features, for instance, the Markov random field (MRF) texture [25], the grey level co-occurrence matrix (GLCM) [26,27], and wavelet-based textural features [28] are used for change detection and object extraction in remote sensing images with high spatial resolution. Based on structural features, extended morphological profiles (EMPs) [29], the rolling guidance filter (RGF), the histogram of oriented gradients (HOG) [30], channel features of orientated gradients (CFOG) [30], and morphological attribute profiles (APs) [31] are used to detect land use and land cover changes. In addition, other change information, such as the morphological building index (MBI) [32,33], the normalized difference vegetation index (NDVI) [34], and the modified normalized difference water index (MNDWI) [35], can supplement the shortcomings of the change results derived from image features and optimize the final detection results. Moreover, deep learning is receiving much attention in different computer vision research areas, including change detection in remote sensing images [36,37,38,39]. Deep features of pixels or objects are extracted through deep learning methods, such as spatial–temporal attention neural networks [14], transformer-based models [40], and fully convolutional two-stream architectures [41]. Meanwhile, to better aggregate contextual and detailed information from remote sensing images, some researchers introduced feature fusion networks for change detection [41,42,43]. It should be noted that deep learning-based methods require a certain number of labelled training samples. Unfortunately, there are often not enough training data that represent the real change information of land cover objects [39]. In summary, the ability to fuse multi-feature information for change detection in order to obtain reliable land cover change results is very necessary for high-resolution remote sensing images.
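The spectral indices listed above are straightforward to compute from the corresponding bands; a minimal sketch (the band-argument names describe the assumed input layout):

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized difference vegetation index: (NIR - R) / (NIR + R)."""
    nir, red = nir.astype(float), red.astype(float)
    return (nir - red) / (nir + red + eps)

def mndwi(green, swir, eps=1e-6):
    """Modified normalized difference water index: (G - SWIR) / (G + SWIR)."""
    green, swir = green.astype(float), swir.astype(float)
    return (green - swir) / (green + swir + eps)
```

Differencing such index maps between the two dates yields the auxiliary change information (e.g., vegetation or water change) that supplements the image-feature-based results.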
It should be noted that some of these algorithms were developed for medium or coarse spatial resolution multispectral images. The abundance of spectral features in multispectral images makes it much easier to implement these methods to detect land cover changes. Meanwhile, with the availability of higher resolution or very high-resolution images, the requirement of developing CD algorithms for HR images becomes much more pressing. As noted, the pixel-based CD algorithms developed for medium spatial resolution multispectral images are not fully appropriate for HR images, since pixel-based analysis does not account for the spatial context of an image and suffers from the spatial heterogeneity of HR scenes [22]. Two kinds of strategies are used to detect changes in HR images. The first strategy is to extract as many features as possible from multi-scale images in order to compensate for the scarcity of spectral features, so that the CD algorithms developed on multispectral features can be used for pixel-level CD in HR images [44,45]. The second strategy is to develop object-based CD algorithms [46,47] by segmenting an HR image into many non-overlapping objects. More robust change detection results are obtained by generating and processing superpixels for optical and SAR images [48,49,50,51,52]. In addition, the organic combination of the above two strategies provides a new idea for high-resolution image change detection.
Compared with pixel-level methods, object-level change detection approaches can effectively integrate change information from remote sensing images and avoid the influence of salt-and-pepper noise. However, the detection accuracy depends on the quality of the segmentation results [53], so how to choose the optimal segmentation scale is worth pondering. Moreover, compared to the cumbersome process of the direct object comparison method and the post-classification object comparison method, the idea of directly combining the segmentation result with the initial detection results, for example, via Dempster–Shafer fusion theory [23,54,55], weighted Dempster–Shafer fusion theory [22], and majority voting fusion [29,56,57], can greatly save time and improve efficiency. As reported in many references, the effectiveness of providing accurate results differs across different types of CD approaches, and the ensemble idea is considered a key solution for reaching a high CD accuracy. Du et al. [5] discussed the change detection effects of different fusion strategies, i.e., feature-level fusion and decision-level fusion, and showed that improved CD results could be achieved compared to the results of a single approach. Much more effort was made in this direction to find an improved fusion strategy [57,58,59,60]. In high-resolution remote sensing images, the spectral characteristics of ground objects can reflect the rich information on the categories and attributes of objects, while the spatial features can help identify buildings and roads. They complement each other and jointly reveal the rich information on land cover contained in HR remote sensing images [61]. Furthermore, using other change information to optimize and supplement the detection results based on image features is also a new idea for improving the accuracy of change detection. The use of multi-information and decision fusion strategies is verified to be helpful in obtaining accurate change detection results. The object-oriented method can overcome the uncertainty of the ground targets and further improve the accuracy of change detection [38,62].
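Of the fusion rules listed above, majority voting is the simplest to illustrate: a segment is declared changed when the fraction of its pixels flagged by the pixel-level detector exceeds a vote ratio. A minimal sketch (the 0.5 ratio and function name are illustrative choices, not values from the cited works):

```python
import numpy as np

def majority_vote_fusion(change_mask, segments, ratio=0.5):
    """Fuse a pixel-level binary change mask with a segmentation label
    map: a whole segment becomes changed when the proportion of its
    pixels flagged as changed exceeds `ratio`."""
    fused = np.zeros_like(change_mask, dtype=bool)
    for seg_id in np.unique(segments):
        member = segments == seg_id          # pixels belonging to this segment
        if change_mask[member].mean() > ratio:
            fused[member] = True
    return fused
```

Dempster–Shafer and fuzzy integral fusion follow the same object-wise pattern but replace the simple vote with evidence combination over several change maps.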
Inspired by such research, in this paper we propose an object-oriented change detection algorithm to make a comprehensive application of various forms of information, to convert from a single detection method to a multi-method fusion, and to convert from the pixel level to the object level by decision fusion. Three main characteristics can be found in the proposed algorithm. First, the co-saliency change map of bi-temporal remote sensing images not only considers the contrast cues, spatial cues, and correlation cues, but also supplements the insufficiency of image features. Second, unlike other traditional methods that apply only a single feature, spectral–spatial–saliency change information is utilized comprehensively to overcome the shortcomings of a single factor. Third, in the proposed approach, the combination of feature-level and decision-level fusion is used. The most important contribution of the suggested framework lies in constructing a new object-based configuration based on spectral–spatial–saliency change information and fusion using the fuzzy integral decision theory, which plays a key role in the transition from pixel-level detection to object-level recognition. It should be noted that the initial pixel-based change results and object-based segmentations can be organically fused according to the fuzzy integral strategy, which can determine the change probability of land objects regardless of interference factors to achieve reliable detection results.
The remainder of this paper is organized as follows: Section 2 presents the proposed change detection approach. Section 3 shows the experimental datasets and configuration. The experimental results are described in Section 4. A detailed discussion is addressed in Section 5, and the conclusion is drawn in Section 6.
5. Discussion
The results of the accuracy evaluation show that the overall accuracy of the proposed method was above 95%, and the kappa coefficient and the F1 score were the highest in the three datasets. Furthermore, the accuracy evaluation results were also consistent with the visual interpretation analysis. Through the change detection experiments on three datasets from different sensors and with different resolutions, it is observed that the proposed algorithm can effectively integrate the advantages of multiple features and retain more accurate land cover change information. To facilitate the application of the proposed framework to practical problems and assess its robustness, this section discusses the major achievements of this research.
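For reference, the evaluation measures used throughout this section (overall accuracy, kappa coefficient, and F1 score) follow directly from the binary confusion matrix of a change map against the reference map; a minimal sketch:

```python
import numpy as np

def accuracy_metrics(pred, ref):
    """Compute OA, kappa, and F1 for a binary change map `pred`
    against a boolean reference map `ref`."""
    pred, ref = pred.ravel(), ref.ravel()
    tp = np.sum(pred & ref)        # changed pixels correctly detected
    tn = np.sum(~pred & ~ref)      # unchanged pixels correctly kept
    fp = np.sum(pred & ~ref)       # false alarms
    fn = np.sum(~pred & ref)       # missed detections
    n = tp + tn + fp + fn
    oa = (tp + tn) / n
    # expected agreement by chance, used by the kappa coefficient
    pe = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (oa - pe) / (1 - pe)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return oa, kappa, f1
```

The missed detection rate and false alarm rate cited below are likewise fn/(tp + fn) and fp/(fp + tn), respectively.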
First, over the past ten years, scholars have noticed and applied the idea of employing multiple pieces of information to improve the accuracy of change detection. Three fusion levels, i.e., data-level, feature-level, and decision-level, have been discussed in the literature. However, there is no deterministic strategy for finding the most appropriate method to implement the change detection process. Du et al. [5] found that both feature-level and decision-level fusion led to an increase in overall accuracy, which is consistent with our research. In addition, feature-level fusion can effectively reduce omission errors, and decision-level fusion is good at restraining commission errors [65]. Different fusion strategies are still necessary to find the appropriate algorithm for the detection of heterogeneous situations, such as building extraction during the urban expansion process, and to improve the accuracy of the change detection results.
In the first experiment, there were differences in the change detection results among different types of regions, such as wild, rural, and urban areas. It is observed that the change detection results in rural areas are closest to the reference change map, with the highest overall accuracy and the lowest false detection rate; in particular, rural areas showed a 1.163% to 3.211% increase in overall accuracy and a 0.006 to 0.011 decrease in false alarm rate. False detections occurred on country trails in wild areas due to differences in illumination conditions, resulting in an increased false alarm rate. However, other changes in wild regions could be detected and the boundaries of ground objects were relatively complete, so the missed detection rate was at its lowest, and the kappa coefficient, as well as the F1 score, was higher than 0.88. In urban areas, false detections and missed detections existed at the same time, and changes in several small buildings were ignored. Specifically, lower accuracy was reached in the urban area detection results, with a 2.048% to 3.211% decrease in overall accuracy and a 0.005 to 0.109 increase in missed detection rate and false alarm rate. In terms of high-resolution remote sensing images, the spectral characteristics of ground objects can reflect the rich information of land cover, the texture features can reflect the relationship between neighboring pixels, and the structural features help identify buildings and roads. Therefore, the spatial and spectral features complement each other and jointly reveal the information on land cover in remote sensing images. In the third experiment, compared to the raw spectral feature, the addition of spatial information, such as texture and structure features, eliminated the salt-and-pepper noise phenomenon and yielded accurately changed land objects, resulting in an improvement in OA, the kappa coefficient, and the F1 score.
Furthermore, to overcome the insufficiency of spectral–spatial feature extraction methods, the co-saliency detection algorithm, which considers contrast cues, spatial cues, and correlation cues, plays an important role in optimizing the feature extraction results and improving the accuracy of the final detection results. The comprehensive use of multiple features and multiple pieces of change information showed extraordinary advantages in the application to three remote sensing images with different resolutions and different changes.
As proven in the previous sections, the proposed method achieved the best change detection accuracies for high-resolution remote sensing images. However, the detection performance on remote sensing images with different resolutions should also be considered. The experimental results of the SPOT images (DS1) show that while relatively complete changed areas were detected, some false detection areas were also generated, which affected the accuracy of the change detection. The aerial images with a spatial resolution of 1.5 m (DS2) presented less noise than the others and had the best visual interpretation effect of change detection. However, when the resolution of the images was further improved, several omissions and errors appeared in the detection results of DS3 due to the influence of shadows and spectral differences. Compared to the reference change map, unchanged buildings were incorrectly detected and the internal compactness of the detected buildings was not high. In contrast, the change information in DS1 and DS2 could be obtained correctly. It can be seen from the quantitative evaluation that the overall accuracy of DS2 was the highest, and its false detection rate was the lowest.
The scale of multi-scale segmentation has an important influence on the result of object-level recognition. Taking a small area in experimental dataset 1 as an example, three segmentation scales (58, 87, and 124) were used to perform a comparative analysis of the influence of different scale parameters on the recognition of changed objects, as shown in Figure 7. False alarms were detected at small segmentation scales; that is, unchanged ground objects were wrongly identified as changed objects, as shown in the green box in Figure 7. The reason is that a segmentation scale that is too small leads to over-segmentation and fragmentation of the ground objects; combined with the high false alarm rate of pixel-level change detection in high-resolution remote sensing images, this raises the proportion of changed pixels within the segmented objects, making the use of segmented objects to screen changed land cover objects ineffective. Conversely, a segmentation scale that is too large can easily lead to missed detections; that is, changed ground objects are not correctly identified, as shown in the red box in Figure 7. The main reason is that a large segmentation scale leads to under-segmentation of the surface objects, so objects with a relatively small area and spectral characteristics similar to their neighbors are merged into adjacent objects, reducing the pixel proportion of sub-object-level changed objects. The proposed approach takes advantage of the optimal scale estimation strategy to select the appropriate segmentation parameter, which provides a good basis for the final object-level change detection results.
Regarding the change detection post-processing, the proposed fusion procedure provides a new idea; that is, multi-scale segmentation of the first-principal-component superimposed image is selected to combine the initial pixel-level change information and generate the final object-oriented change detection map. To verify the advantage of the proposed fusion procedure, morphological post-processing was carried out on the images fused with multiple pieces of change information in the second experiment and on the images using only spectral features in the third experiment. In the morphological processing, the structuring elements for opening and closing were set to 7 × 7 pixels and 5 × 5 pixels, respectively.
In general, post-processing of change detection eliminates or reduces the interference of “noise detection” by utilizing relevant knowledge of mathematical morphology. Starting from the basic erosion and dilation tools of mathematical morphology, Figure 8 illustrates that this type of post-processing method destroys the boundary of the actual ground object while removing the noise. To optimize the pixel-based change detection results, the fuzzy integral strategy, under the restriction of multi-scale segmentation, was then applied. As shown in Figure 8, based on the advantages of multiple methods of extracting initial information, the proposed decision fusion procedure can remove the interference of “salt-and-pepper noise” and maintain the internal and boundary integrity of the actual ground objects. Furthermore, under the condition of morphological post-processing, the effect of multi-change information fusion was better than that of spectral features alone, which also proves the advantages of spectral–spatial–saliency change information fusion.
The quantitative evaluation results in Table 5 were consistent with the visual interpretation in Figure 8. Specifically, the proposed method achieved the best accuracies in terms of MR, FAR, OA, kappa coefficient, and F1 score. It can be seen from Table 5 that, compared to the morphological operation methods, there is a 1.933% to 9.095% increase in overall accuracy and a 0.008 to 0.185 decrease in the false alarm rate and missed detection rate, further supporting the effectiveness and feasibility of the proposed post-processing framework. In summary, in actual change detection applications, pixel-based and object-based change detection processes can be organically combined according to different detection purposes. Therefore, the final change detection results not only correspond to meaningful geographic entities but also effectively integrate the advantages of both strategies to obtain the best detection accuracy.
The time complexity of the proposed method was also investigated. Table 6 shows the processing time of the different components of the proposed framework. It can be observed that the acquisition of spatial change information demanded more time, as the spatial feature sets had to be constructed and the optimal features selected. Furthermore, the multi-scale segmentation processing time increased due to the determination of the optimal segmentation scale. DS3 required the longest processing time (Table 6), which may also be related to its larger image size (1024 × 1024 pixels).
The main contributions of the proposed framework are as follows. First, it should be noted that the comprehensive use of multiple pieces of change information can overcome the uncertainty of any single method. Unlike other traditional methods that use the raw spectral feature alone, spectral–spatial–saliency change information is employed comprehensively in this paper. The co-saliency detection can supplement the insufficiency of image features, and the advantages of different change maps are integrated to enhance the accuracy of change detection. Second, in the process of extracting spectral feature changes, the idea of integrating multiple spectral change detection methods (IRMAD, ISFA, and PCA) is adopted to overcome the limitation of a single operator and the influence of false alarms, such as salt-and-pepper noise, and to obtain the optimal spectral difference information. Third, the combined strategies of both feature-level and decision-level fusion are utilized and verified in this article. It is worth noting that the fuzzy integral decision theory, which can determine the change probability of land objects by integrating the advantages of the initial change results, improves the change detection accuracy.
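As an illustration of the decision-level fusion step, a minimal Sugeno fuzzy integral can be sketched as follows. The source confidences, the fuzzy densities, the additive measure, and the 0.5 decision threshold are simplifying assumptions for demonstration, not the paper's exact configuration (which uses a more general fuzzy measure):

```python
import numpy as np

def sugeno_integral(h, g_density):
    """Sugeno fuzzy integral for one object: `h` holds the change
    confidences reported by each information source, `g_density` the
    importance (fuzzy density) of each source. For simplicity this
    sketch uses an additive fuzzy measure (densities summing to 1)."""
    order = np.argsort(h)[::-1]            # sort sources by confidence, descending
    h_sorted = np.asarray(h)[order]
    g_sorted = np.asarray(g_density)[order]
    g_cum = np.cumsum(g_sorted)            # measure of the top-i source subset
    return float(np.max(np.minimum(h_sorted, g_cum)))

# Fuse three hypothetical per-object confidences (e.g., spectral,
# spatial, saliency) and declare the object changed above 0.5.
confidence = sugeno_integral([0.9, 0.6, 0.2], [0.4, 0.4, 0.2])  # 0.6
changed = confidence > 0.5
```

Applied per segmented object to the pixel-level change proportions of the three initial maps, this kind of rule yields the object-level change probability discussed above.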
6. Conclusions
Generally, methods based on a single spectral difference frequently suffer from a large amount of salt-and-pepper noise and low accuracy in detecting artificial objects. In this paper, an object-level change detection approach was proposed that combines spectral–spatial–saliency change information with a fuzzy integral decision fusion algorithm. By combining three independent change results with the decision analysis strategy, real land cover change information was obtained. The proposed approach not only overcame the salt-and-pepper noise caused by illumination conditions or radiation differences, but also acquired whole change objects with distinguishable boundaries. The results of the three experiments showed that the proposed method could effectively obtain the changed objects. The overall accuracy of the proposed method was greater than 95%, the false alarm rate was lower than 0.016, and the kappa coefficient, as well as the F1 score, was higher than 0.78 in the three datasets. In addition, the detection accuracy of the proposed method improved significantly compared to other state-of-the-art methods.
The proposed method has the following findings: (1) The fusion of three spectral change detection results can overcome the influence of speckle noise and obtain optimal spectral difference information. (2) Spectral characteristics can reflect the rich land cover information, spatial features can describe the neighborhood and spatial relationships between pixels, while co-saliency detection considers contrast, spatial, and correlation information. The joint application of multiple pieces of change information can exploit these complementary features, which is useful for more robust change results and improves detection accuracy. (3) The fuzzy integral decision fusion strategy integrates the initial pixel-level results and determines the change probability of objects under the restriction of the multi-scale segmentation, which plays a key role in generating the final results.
However, regarding the change detection effect on buildings, the proposed framework was not satisfactory compared to other land cover objects. A limitation of the proposed method is the selection of optimized parameters, which has some influence on the final change detection results. Moreover, the time complexity of the proposed method should be taken into account, since the method is composed of different components. Therefore, how to improve the applicability of the proposed algorithm for building change detection and effectively select optimal spatial features within an appropriate running time remains a research topic that should be the focus of future work.