1. Introduction
The Qinghai–Tibet Plateau (QTP), referred to as the ‘water tower of Asia’, is the birthplace of many major rivers, such as the Yangtze River and the Yellow River. It covers about 2.6 million km², most of which lies at an elevation of more than 4 km above sea level [1]. The unique alpine terrain of the QTP blocks and uplifts the warm, humid South Asian monsoon [2], forming rich water resource reserves in the region. The QTP contains approximately 1400 lakes larger than 1 km², with a total area of about 50,000 km². The area of lakes on the QTP, governed by runoff and precipitation, reaches its maximum during the water-rich period around September; it then stabilizes after October and declines through the dry season to a minimum around April of the following year [3,4,5,6]. Studies show that the QTP is one of the regions most sensitive to global climate change. Its lakes, located in endorheic basins, are little affected by human activities and serve as outposts of cryosphere and climate change [7,8]. Accurately mapping the lake area of the whole QTP is one of the concerns of the Intergovernmental Panel on Climate Change (IPCC, https://www.ipcc.ch/srocc/, accessed on 20 February 2022): the temporal and spatial changes of plateau lake waterbodies help deepen the understanding of climate change under global warming, and such mapping provides scientific support for protecting and developing the ecological resources of the QTP lake region and for formulating climate change countermeasures. However, the traditional method of extracting lake area by directly measuring water levels and shorelines suffers from high cost, poor timeliness, and limited spatial accessibility. Moreover, because the way monitoring data are acquired and stored varies from site to site, it is difficult to meet the data-completeness requirements that current global change research places on hydrological monitoring [9,10]. Owing to its comprehensive spatial coverage, regular revisit cycle, and low cost, remote sensing has become an important means of lake waterbody area extraction and has greatly advanced the dynamic monitoring of lake areas on the QTP.
From the data perspective, satellite remote sensing provides abundant multi-source data with long time series and high spatial resolution. The data sources commonly used in this research field include MODIS [11,12], Landsat [13,14,15], SPOT [16,17], ALOS [18], ASTER [19,20], HJ-1A/1B [21,22], WorldView [23,24], QuickBird [25], IKONOS [26], GaoFen [27,28], ZY-3 [29,30], SAR [31], and hyperspectral data [32,33]. To better monitor global climate change, studies usually require imagery covering the entire QTP at least once a year, with a resolution preferably no coarser than 32 m (capable of identifying lakes with an area of 1 km²); the image acquisition time is best in October, when the lakes are most stable. The Landsat series provided by the USGS (http://glovis.usgs.gov/, accessed on 5 April 2022), with its relatively high spatial resolution, long time series, free access, and rich archive, has become the first choice for lake area extraction on the QTP [34]. Landsat-9 data, made accessible by the USGS on 10 February 2022, are identical in band settings to the Landsat-8 sensors but offer higher radiometric resolution (14-bit quantization, increased from 12 bits for Landsat-8). Since the Landsat-9 orbit design is phased with that of Landsat-8, combining the two datasets improves the temporal resolution from 16 days to 8 days [35]. For large-scale research, acquiring images with good consistency and target recognition ability within the time window is particularly important. Landsat-9 and Landsat-8 data, which have been shown to effectively identify waterbodies, complement each other in revisit cycle and are consistent in parameter settings; Landsat-9 therefore has great application potential for large-scale lake mapping. However, at present there is no comprehensive research on water area extraction in the QTP based on Landsat-9 data. Such research is of great significance for the protection and development of plateau ecological resources and for global climate change studies.
Many scholars have proposed lake water feature extraction models for different application scenarios and data types, mainly divided into the single-band threshold method, the spectral relation method, the water index method, machine learning, and the multi-model assisted synthesis method [36]. The single-band threshold method mainly exploits the difference in reflectance between waterbodies and other ground objects in the infrared bands. It selects water features with thresholds and was widely used when early remote sensing data were scarce. For example, Frazier et al. [37] used the threshold method to extract the waterbody of Wagga Lake with the TM4, TM5, and TM7 bands, and the results show that extraction with the TM5 band performed best. The spectral relation method extracts waterbodies by analyzing the spectral curves of characteristic bands over multiple ground objects and constructing logical classification rules. For example, Du et al. [38] and Yang et al. [39] extracted waterbody information by building logical relations based on, respectively, the characteristics of waterbody gray values and the difference in reflectivity decline between waterbodies and shadows in the blue-green bands. The spectral relation method distinguishes waterbodies from mountain shadows better, but like the single-band threshold method, its resistance to noise is poor, and it is susceptible to non-water noise such as vegetation and buildings [4,40].
The water index method uses bands with contrasting water reflectance to construct ratio operations, which suppress vegetation information, weaken the influence of soil, buildings, and shadows, and highlight water information. Since McFeeters [41] proposed the normalized difference water index (NDWI), many scholars have proposed improved water indexes for different background features and water characteristics, such as the modified normalized difference water index (MNDWI) [42], enhanced water index (EWI) [43], automated water extraction index (AWEI) [14], shadow water index (SWI) [44], and land surface water index (LSWI) [45]. These indexes drove the rapid development of water information extraction research [46,47,48]. Although the water index methods are simple to operate and can eliminate shadows cast by mountains, buildings, and vegetation, their recognition errors for small waterbodies and for water–land boundaries are large. Machine learning methods treat waterbodies as a category and apply specific classification rules to obtain water features. Methods such as minimum distance (MD) [49], decision tree (DT) [50], support vector machine (SVM) [51], neural network (NN) [52], random forest (RF) [53], and deep learning (DL) [54] are widely used. Machine learning algorithms make effective use of the spatial and textural information of high-spatial-resolution images and extract small waterbodies well, but the process is relatively complex. For example, Sui et al. [55] integrated three modules (initial water extraction, iterative segmentation, and change detection) with the help of GIS technology to automate the water extraction process; and Qiao et al. [56] proposed an adaptive “whole-local” progressive spatial scale transformation method based on NDWI, combined with spectral feature fitting and an iterative algorithm, to accurately extract lake extents. The multi-model assisted synthesis method combines a variety of models to solve water extraction in large-scale, complex background scenes, but the process is complex and the generalization ability of the composite model is poor. Reviewing the principles, advantages, and disadvantages of the different water extraction methods, it is clear that no algorithm is absolutely superior; the match between algorithm, data source, and application also affects accuracy. Because the computational cost of large-area water extraction is substantial, most previous studies focused on low-complexity threshold algorithms, while relatively few examined higher-complexity machine learning algorithms. A previous study showed that the traditional NDWI method performed best among the threshold extraction algorithms for water extraction on the QTP [6], but that conclusion has not been verified for Landsat-9 imagery, and the study did not include machine learning algorithms. Therefore, a comparative study of different classes of algorithms for plateau lake area extraction based on Landsat-9 images has practical significance and can provide a useful reference for follow-up research.
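For reference, the classic NDWI of McFeeters contrasts the green and near-infrared bands and flags water where the index is positive. The following minimal sketch illustrates the computation; the toy reflectance arrays are stand-ins for real band rasters.

```python
# Hedged sketch: NDWI = (Green - NIR) / (Green + NIR), thresholded at 0.
import numpy as np

green = np.array([[0.10, 0.30], [0.05, 0.25]])  # toy green-band reflectance
nir = np.array([[0.30, 0.05], [0.28, 0.04]])    # toy NIR-band reflectance

ndwi = (green - nir) / (green + nir)
water_mask = ndwi > 0   # pixels above the threshold are flagged as water
```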
To sum up, remote sensing extraction of the spatial distribution of lakes on the QTP has crucial scientific significance for coping with global climate change. Although water extraction algorithms based on remote sensing data of different resolutions and types, combined with different principles, are widely used in water information extraction, no comparative study of plateau lake area extraction algorithms based on Landsat-9 images exists. Based on cloud technology, Google Earth Engine (GEE, https://developers.google.cn/earth-engine, accessed on 1 April 2022) stores more than 5 PB of various datasets in the cloud and allows users, through a web client, to call a computing platform composed of tens of millions of computers to visually retrieve, process, analyze, and download data online. Compared with traditional processing on local computers, the emergence of GEE has dramatically improved the efficiency of big data research and lowered the threshold of large-scale analysis [57,58,59]. Therefore, this paper uses GEE to conduct a comparative study of Landsat-9 lake area extraction with ten widely used water extraction algorithms (one single-band threshold algorithm, two spectral relation algorithms, five water index algorithms, and two machine learning algorithms). Through this comparison, the study quantitatively evaluates the consistency between the different algorithms and Landsat-9 data in the plateau lake extraction scenario, determines which algorithms suit large-scale plateau lake area extraction with Landsat-9, and provides references and suggestions for subsequent research.
This paper is structured as follows: Section 2 describes the study area and the data; Section 3 provides a detailed and complete description of the experimental procedures and methods; Section 4 presents and analyzes the waterbody extraction results; and Section 5 highlights the main findings and implications of this study, followed by our conclusions.
3. Methods
The step-wise progression of waterbody area extraction is illustrated in Figure 2 and encompasses four steps: data collection, preprocessing, classification, and evaluation.
Step 1: Landsat-9 images, lake data, river data, and landcover data were collected as input data.
Step 2: The Landsat-9 images were processed by removing clouds and cloud shadows and mosaicked into a cloud-free map. The remaining data were used to generate lake samples, landcover samples, and non-lake water regions, which together formed the preprocessed data.
Step 3: The model parameters and thresholds were adjusted according to the lake and landcover samples. With these parameters and thresholds, waterbody extraction over the QTP study area was performed with the various models. Non-lake waterbodies were then removed from the extraction results using the non-lake waterbody data, yielding the lake waterbody extraction results for the study area.
Step 4: Finally, we evaluated the performance of the algorithms with accuracy metrics based on the extraction results derived from Step 3.
3.1. Data Preprocessing
Data preprocessing includes cloud-free image mosaicking, classification sample construction, and non-lake water area extraction. Obtaining high-quality cloud-free images covering the study area makes cloud removal a significant preprocessing step that strongly impacts the waterbody extraction results. Landsat-9 images provided by GEE are organized in a data structure called an “ImageCollection”. Unlike traditional remote sensing images, which emphasize scene-based data organization, the ImageCollection emphasizes the real spatial location corresponding to each pixel. L9C2 (Landsat-9 Collection 2) contains all downloadable Landsat-9 images and forms a “stack”-like structure at the spatial position corresponding to each pixel. Users can apply the functions provided by GEE to filter and sort the stack representing a specific spatial location by time, pixel value, and other attributes. Clouds and sun glint change position over time and appear as abnormally high pixel values on the image. Therefore, we sorted the values of all revisit images in the time range pixel by pixel and selected a representative statistic as the pixel value to build an image free of cloud and sun glint. This study used the “CLOUD_COVER” attribute and the “QA_PIXEL” band included in L9C2_SR to detect and mask clouds and cloud shadows as thoroughly as possible [65], and then used functions such as median() and min() to mosaic the masked images into a cloud-free image. Because L9C2_SR had limited images in the interval from 1 January 2022 to 30 April 2022, cloud coverage in January and February was high. Consequently, simply setting the “CLOUD_COVER”, “QA_PIXEL”, or “IMAGE_QUALITY_OLI” attributes and mosaicking with functions such as median() and mean() either removed cloud shadows poorly or produced a mosaic unable to cover the study area. Through analysis of the dataset images and many comparative experiments, we found that using “QA_PIXEL” to mask the pixels flagged as cloud or cloud shadow by the CFMask algorithm [66], without setting the “CLOUD_COVER” attribute, and then applying the following piecewise mosaic yields a well de-clouded mosaic fully covering the study area: taking the minimum value where the number of revisits is less than 12; taking the first quartile where the number of revisits is greater than or equal to 12 and less than 30; and taking the median where the number of revisits is greater than or equal to 30. The results are shown in Figure 3.
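A minimal GEE Python sketch of this masking and piecewise mosaicking step is given below, under stated assumptions: the collection ID, band names, and QA_PIXEL bit positions follow the public Landsat Collection 2 Level-2 catalog, the revisit-count breakpoints mirror the text, and the study-area rectangle (qtp_region) is a placeholder rather than the actual QTP boundary.

```python
# Hedged sketch: CFMask-based masking via QA_PIXEL, then a piecewise
# composite keyed on the per-pixel number of valid revisits.
import ee

ee.Initialize()

qtp_region = ee.Geometry.Rectangle([73, 26, 105, 40])  # placeholder QTP extent

def mask_clouds(img):
    qa = img.select('QA_PIXEL')
    cloud = qa.bitwiseAnd(1 << 3).neq(0)    # bit 3: cloud (CFMask)
    shadow = qa.bitwiseAnd(1 << 4).neq(0)   # bit 4: cloud shadow (CFMask)
    return img.updateMask(cloud.Or(shadow).Not())

bands = ['SR_B1', 'SR_B2', 'SR_B3', 'SR_B4', 'SR_B5', 'SR_B6', 'SR_B7']
l9 = (ee.ImageCollection('LANDSAT/LC09/C02/T1_L2')
      .filterBounds(qtp_region)
      .filterDate('2022-01-01', '2022-04-30')
      .map(mask_clouds)
      .select(bands))

count = l9.count().select('SR_B5')                     # valid revisits per pixel
q25 = l9.reduce(ee.Reducer.percentile([25])).rename(bands)

# Piecewise mosaic: min where count < 12, first quartile where
# 12 <= count < 30, median where count >= 30.
mosaic = (l9.min()
          .where(count.gte(12), q25)
          .where(count.gte(30), l9.median()))
```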
Referring to the algorithm proposed by Deng et al. [57], the ESA_WorldCover global land cover dataset and the TPLA_V3 QTP lake dataset were used for hierarchical automatic random sampling [58]. The details of sample construction are shown in Table 1. ESA_WorldCover divides land cover into 11 categories; after removing mangroves, which do not exist on the Qinghai–Tibet Plateau, stratified random samples were generated in combination with the TPLA_V3 lake data. Using the cloud-removed Landsat-9 images in GEE and contemporaneous Sentinel-2 images with higher spatial resolution, the generated random samples were manually revised to eliminate erroneous samples. The samples were then randomly divided into a test set, training set, and validation set at a ratio of 3:4:3. For the machine learning models, the test set was used for the search and optimization of model hyper-parameters, the training set was used for model training, and the validation set was used for model accuracy verification. For the non-machine learning models, the test set and training set were used to determine the thresholds, and the validation set was used for accuracy evaluation.
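The stratified sampling and 3:4:3 split can be sketched in GEE as follows; the per-class point count, the random seed, and the study-area rectangle are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: stratified random samples over ESA WorldCover classes,
# then a random 3:4:3 test/training/validation split.
import ee

ee.Initialize()

qtp_region = ee.Geometry.Rectangle([73, 26, 105, 40])  # placeholder QTP extent
worldcover = ee.ImageCollection('ESA/WorldCover/v100').first()

samples = worldcover.stratifiedSample(
    numPoints=500,        # illustrative per-class count
    classBand='Map',      # WorldCover class band
    region=qtp_region,
    scale=30,
    geometries=True)

samples = samples.randomColumn('rand', 42)
test = samples.filter(ee.Filter.lt('rand', 0.3))
train = samples.filter(ee.Filter.And(ee.Filter.gte('rand', 0.3),
                                     ee.Filter.lt('rand', 0.7)))
validation = samples.filter(ee.Filter.gte('rand', 0.7))
```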
A previous study combined reservoir and dam databases, river databases, and other non-lake waterbody data to isolate the lake waterbody area from general waterbody extraction results [67]. Following this approach, this study combined ESA_WorldCover, TPLA_V3, and HydroRIVERS data to establish a non-lake waterbody dataset for post-processing the waterbody extraction results. Non-lake permanent waterbodies were obtained from the geometric union of the waterbodies in TPLA_V3 and ESA_WorldCover after erasing the waterbodies in their intersection. Buffer regions were built around the lakes according to their inter-annual variations [6], and these buffers were used to erase the HydroRIVERS data to obtain the river dataset. Together these layers formed the non-lake waterbody dataset used for post-processing.
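A hedged geopandas sketch of this post-processing mask is shown below; the file names, the 1 km buffer width, and the pre-vectorized WorldCover water layer are all assumptions made for illustration.

```python
# Hedged sketch: build the non-lake waterbody layers from vector data.
import geopandas as gpd

lakes = gpd.read_file('tpla_v3_lakes.gpkg')        # assumed local export
wc_water = gpd.read_file('worldcover_water.gpkg')  # assumed vectorized water class
rivers = gpd.read_file('hydrorivers.gpkg')

# Non-lake permanent water: WorldCover water with the TPLA_V3 lakes erased.
non_lake_water = gpd.overlay(wc_water, lakes, how='difference')

# Buffer the lakes to absorb inter-annual shoreline variation (1 km assumed),
# then erase the buffered zones from the river network.
buffers = gpd.GeoDataFrame(geometry=lakes.buffer(1000), crs=lakes.crs)
rivers_outside = gpd.overlay(rivers, buffers, how='difference')

# non_lake_water and rivers_outside together form the non-lake waterbody
# dataset used to mask the waterbody extraction results.
```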
3.2. Waterbody Extraction Algorithms
Waterbody extraction algorithms such as the single-band threshold, spectral relation, water index, and machine learning methods are widely used, especially the water index and machine learning methods [50]. Based on previous research [9,68], we selected the five most commonly used water index models, two machine learning models, two spectral relation models, and a single-band threshold model as the target algorithms, to explore waterbody and lake waterbody extraction over the QTP from Landsat-9 data and to determine the best-performing waterbody area extraction algorithms for a large-scale study region. The detailed information of each model is shown in Table 2, where B_i represents band i (see Section 2.2); T_1 and T_2 are the empirical thresholds; and F, C, γ, N_tree, and M_try represent the feature set (including the 8 bands of the Landsat-9 image and various water indexes), the regularization parameter (also known as the penalty factor), the kernel width, the number of decision trees, and the number of input features used to split the nodes, respectively.
3.3. Model Parameters and Thresholds
Previous studies show that the classification accuracy of machine learning models depends mainly on their hyper-parameters [69]. To adjust the hyper-parameters effectively and optimize classification accuracy, drawing on the ideas proposed by Porwal et al. [70], this study used sklearn, the open-source machine learning toolkit. First, RandomizedSearchCV searched for a near-optimal solution over a large range, and then GridSearchCV fine-tuned the hyper-parameters within a small range around it. Five-fold cross-validation was used to verify the classification performance on the test set, and the average accuracy was taken as the estimate guiding the fine-tuning.
In Table 2 of Section 3.2, the SVM model uses the radial basis function (RBF) kernel, and the parameters C and γ need to be searched and optimized. The random forest (RF) model uses classification and regression trees (CART) as the base algorithm, and the N_tree and M_try parameters must be adjusted and optimized.
Appropriate thresholds are critical for water extraction models based on threshold segmentation. Threshold selection has a certain randomness, varying with the type of index, the identification scenario, and the subpixel water/non-water components [71]. The OTSU method [72] and its improved algorithms [73,74,75] are commonly used for automatic water index threshold extraction. They exploit the bimodal distribution of waterbodies and near-water land in the histogram of the index to set an optimal threshold that splits the image into “foreground” and “background”, thereby separating waterbodies from non-waterbodies [76]. However, for extracting waterbodies and lake waterbodies in the QTP region, this class of threshold algorithm has two problems. First, the proportions of waterbodies and non-waterbodies in the region differ enormously, so the water signal is swamped by the other landcover classes in the histogram and cannot be classified effectively [73]. Second, the study region exceeds 3 million km², and such threshold segmentation methods demand substantial computing power at this scale. Therefore, this study used the lake samples to adjust the thresholds manually.
3.4. Evaluation Metrics
In this study, the extraction accuracy of water and lakes is verified with a confusion matrix computed on the validation set, and the performance of the above water extraction models is quantitatively evaluated through four metrics: overall accuracy, kappa, producer’s accuracy, and user’s accuracy. The overall accuracy reflects the overall effect of the algorithm. Kappa indicates the consistency between the ground truth data and the predicted values. Producer’s accuracy represents the probability that a validation sample is correctly classified. Finally, user’s accuracy is the ratio of the inspection points falling on category i in the classification map that are correctly classified as category i [74]. In addition, an error analysis formula is introduced to evaluate the accuracy of area extraction:

E = |A_e − A_t| / A_t × 100%,

where E is the error result, A_t represents the ground truth area, and A_e is the area extracted by the algorithm.
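The four metrics and the area error can be computed from a confusion matrix as in the following sketch; the toy label arrays are illustrative stand-ins for the validation set.

```python
# Hedged sketch: confusion-matrix metrics plus the relative area error.
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # toy validation labels (1 = water)
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])  # toy model predictions

cm = confusion_matrix(y_true, y_pred)               # rows: truth, cols: prediction
overall_accuracy = accuracy_score(y_true, y_pred)
kappa = cohen_kappa_score(y_true, y_pred)
producers_accuracy = np.diag(cm) / cm.sum(axis=1)   # per class, 1 - omission error
users_accuracy = np.diag(cm) / cm.sum(axis=0)       # per class, 1 - commission error

# Relative area error E = |A_e - A_t| / A_t * 100%, using pixel counts as areas.
E = abs(y_pred.sum() - y_true.sum()) / y_true.sum() * 100
print(overall_accuracy, kappa, producers_accuracy, users_accuracy, E)
```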
4. Results and Discussion
4.1. Parameter and Threshold Selection Results
4.1.1. Optimization Parameters in Machine Learning Methods
For the machine learning methods, we evaluated the importance of the sample features involved in classification and reduced the redundancy between features to improve the efficiency and accuracy of the algorithms. The Gini index, also known as Gini impurity, measures the probability that a randomly chosen sample from a set is misclassified. The smaller the Gini index, the smaller this probability; when all samples in the set belong to one class, the Gini index is 0. Each tree in the random forest algorithm is a CART decision tree. When a tree uses a feature to split a node, it calculates the Gini index to measure the purity of the sample set before and after the split; the smaller the Gini indexes of the left and right branches after splitting, the better that feature separates the classes. Suppose that node m on a tree is split using a given feature: the Gini index before splitting is GI_m, and the Gini indexes of the left and right branches after splitting are GI_l and GI_r, respectively, so the importance gain of this split is GI_m − GI_l − GI_r. If this feature is used in k splits on the tree, and n trees in the whole forest use this feature, then the importance of the feature in the entire forest is the sum of (GI_m − GI_l − GI_r) over all k splits in each of the n trees. The importance score of the feature is obtained by dividing this result by the sum of the importances of all features. In the sample construction stage, we used the test dataset constructed in Section 3.1 to construct the waterbody and non-waterbody samples. Referring to [77], we evaluated the importance of the eight bands and seven water indexes participating in the calculation based on the Gini index. In order of importance score, we successively added the features to the classifier for cross-validated accuracy verification to determine the optimal feature set for water extraction.
The evaluation and selection results of the feature parameters are shown in Figure 4. Along the horizontal axis of Figure 4, the importance score of the features gradually decreases from left to right; the features with the highest and lowest importance scores are SR_B5 and AEWIns, respectively. The low importance scores of AEWIs, AEWIns, and SR_B6 may arise because these features entered the classifier late and correlate strongly with earlier features, so their information is judged redundant [77]. Further, the overall cross-validation accuracy shows that once the number of features reaches six, the overall accuracy reaches 93.80%, and the remaining features contribute little additional accuracy. Based on the importance scores and overall accuracy results, the feature set for the water extraction methods (RF and SVM classifiers) is defined as {SR_B5, NDWI, SR_B7, mNDWI, mAEWI, SR_B6}.
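A hedged sketch of this Gini-based ranking and incremental accuracy check follows; the feature names echo Figure 4 (abridged), and synthetic data stand in for the waterbody/non-waterbody samples.

```python
# Hedged sketch: rank features by RF Gini importance, then add them to the
# classifier one at a time and track cross-validated overall accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

feature_names = ['SR_B5', 'NDWI', 'SR_B7', 'mNDWI', 'mAEWI', 'SR_B6']  # abridged list
X, y = make_classification(n_samples=1000, n_features=len(feature_names),
                           random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1]   # Gini importance ranking

for k in range(1, len(feature_names) + 1):
    acc = cross_val_score(RandomForestClassifier(random_state=0),
                          X[:, order[:k]], y, cv=5).mean()
    print(k, [feature_names[i] for i in order[:k]], round(acc, 4))
```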
To improve the accuracy of the machine learning models, we optimized the critical parameters of the SVM and RF algorithms based on the feature set constructed above. The key parameters of the SVM algorithm include the regularization parameter C, the kernel function type, and the related kernel parameters. Standard kernels for SVM classifiers include the polynomial kernel, the radial basis function (RBF) kernel, the linear kernel, and others. Previous studies show that the RBF kernel performs better for image recognition tasks with prominent non-linear characteristics [78]. Therefore, we selected the RBF kernel and optimized the parameters C and γ. C is the relaxation (penalty) parameter of the SVM classifier: when C is small, the decision interface is smooth; when it grows large, the complexity of the model increases. The kernel parameter γ defines the magnitude of the effect of a single training sample and is the reciprocal of the RBF kernel width: a larger γ means a smaller radius of influence and a tendency to overfit, whereas a smaller γ easily leads to underfitting [79]. The effective ranges of C and γ are 10^−8 to 10^8, but in practical application the optimal values generally lie within (0.1, 100) and (0.0001, 10), respectively [80]. Therefore, we used the RandomizedSearchCV function to search for the optimal C and γ in the intervals (0.1, 100) and (0.0001, 10) with a multiplicative step of 10, and determined that the optimal values appear around 10~100 and 0.01~0.1, respectively. We then used the GridSearchCV function to perform a grid search in steps of 10 and 0.01 over (10, 100) and (0.01, 0.1). Figure 5a shows that the overall accuracy reaches its maximum value of 95.79% at the optimal C and γ combination; the experiment therefore trains the SVM classifier with these parameters.
The random forest (RF) classifier is an ensemble learning classifier widely used in remote sensing information extraction. The randomness of the RF model lies mainly in the random selection of the datasets and features used by each tree. The corresponding parameters are the number of decision trees (N_tree) and the maximum number of features (M_try) considered at each node split when growing the trees [81]. A larger N_tree involves more decision trees and a more complex algorithm. The M_try parameter lets each tree train on only a random subset of features, reducing the overall computation and allowing each tree to focus on its chosen features. Research shows that overly large N_tree and M_try reduce the randomness and operational efficiency of the RF model while contributing little to accuracy; generally, N_tree and M_try do not exceed 1000 and 50, respectively [81,82]. As mentioned above, these two parameters must be set to grow the forest in this study. Following the same optimization procedure as for the SVM, we determined that the optimal values of N_tree and M_try lie between 80 and 100 and between 5 and 15, respectively, and then performed a grid search on the two parameters in steps of 2 and 1, respectively. Figure 5b shows that the overall accuracy reaches its maximum value of 94.97% at two parameter combinations. Considering that larger N_tree and M_try increase the complexity of the RF model and reduce its generalization ability, the smaller combination was selected as the RF parameters.
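The RF grid search can be sketched analogously to the SVM case; the ranges and step sizes follow the text, with synthetic stand-in data.

```python
# Hedged sketch: grid search over N_tree (80-100, step 2) and M_try (5-15, step 1).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=15, random_state=0)

grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={'n_estimators': list(range(80, 101, 2)),   # N_tree
                'max_features': list(range(5, 16))},       # M_try
    cv=5, scoring='accuracy')
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 4))
```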
4.1.2. Selection of Waterbody Extraction Thresholds
We extracted the corresponding water index values for the samples of the test and training sets constructed in Section 3, combined the sample labels and index values to generate confusion matrices, quantitatively verified the accuracy of waterbody extraction under different thresholds, and explored the best segmentation threshold for each water index, so as to improve extraction accuracy and reduce the subjectivity and contingency of manually chosen thresholds. To apply thresholding, we rewrote the formulas of the two spectral relation algorithms (SR and mSR) in equation form. Among the eight waterbody extraction models based on threshold segmentation, the single-band threshold (SBT) algorithm defines areas below the threshold as waterbodies, while the other models define areas above the threshold as waterbodies. The variation of each model’s waterbody extraction accuracy with the threshold is shown in Figure 6.
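A minimal sketch of this accuracy-driven threshold sweep is given below; the toy NDWI distributions are illustrative stand-ins for the labeled test- and training-set samples.

```python
# Hedged sketch: sweep candidate thresholds and keep the one that maximizes
# overall accuracy on the labeled samples.
import numpy as np

rng = np.random.default_rng(0)
ndwi = np.concatenate([rng.normal(0.4, 0.2, 300),     # toy water samples
                       rng.normal(-0.3, 0.2, 700)])   # toy non-water samples
labels = np.concatenate([np.ones(300), np.zeros(700)])

thresholds = np.arange(-1.0, 1.0, 0.01)
acc = [((ndwi > t).astype(int) == labels).mean() for t in thresholds]
best = thresholds[int(np.argmax(acc))]
print(f'best threshold {best:.3f}, overall accuracy {max(acc):.4f}')
```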
Among the overall accuracies of the waterbody extraction results based on the threshold segmentation models, all models except the SBT model achieved a best overall accuracy above 70%. The NDWI threshold segmentation model has the highest overall accuracy at 83.62%, followed by mAEWI (82.02%), AEWIs (76.89%), AEWIns (75.21%), mNDWI (74.90%), SR (74.51%), and mSR (71.32%). This ranking is highly consistent with the conclusions of the previous study [6]. The SBT model relies on the low reflectance of water in the 1.5~1.6 μm spectrum to extract waterbodies, but this spectral region also lies at an absorption peak of vegetation [38]. As shown in Figure 6a, the threshold of −0.0137 cannot effectively distinguish water from landcover class 1 (tree cover) and class 3 (grassland), so its overall accuracy is only 58.09%. Moreover, at a threshold of −0.06, the overall accuracy of the SBT extraction reaches its highest value of 63.77%, but the result contains many true negative (TN) samples, with all samples classified as non-waterbodies, so this threshold fails to meet the requirements of waterbody extraction.
Additionally, the optimal segmentation thresholds selected by overall accuracy can effectively distinguish waterbodies from the other landcover classes except snow and ice, which corroborates some conclusions of previous research [6,83]. It also shows that the threshold selection method based on overall accuracy is reliable. Analyzing the selected thresholds of each model, the thresholds of the mAEWI, SR, mSR, AEWIs, and AEWIns models are close to zero (−0.0085, 0.0086, 0.0189, 0.0397, and 0.0233, respectively), indicating strong anti-interference ability, consistent with the design principle of each algorithm [14,38,39,67].
Finally, comparing the blue boxes in Figure 6e,f, the overall accuracy of the NDWI and mNDWI models is significantly distorted when their thresholds approach −1 and 1. This is because the waterbody samples contain many abnormal values above 1 and some below −1 when the surface reflectance data are used in the band-ratio operation. During outlier processing, values greater than 1 and less than −1 are reclassified as 1 and −1, so many waterbody samples accumulate at the thresholds of 1 and −1. The large number of outliers may stem from lake ice and from the use of the min() function over major areas during cloud-removal compositing, which amplifies the shadow effect and increases abnormally low values.
4.2. Analysis of Waterbody Extraction Results
Based on the above optimization results of the machine learning model parameters, we trained the SVM and RF classifiers with the training samples, reclassified the results into waterbodies and non-waterbodies, and calculated the confusion matrix on the validation set to evaluate the accuracy of waterbody extraction. Based on the optimal thresholds determined with the test set in Section 4.1.2, threshold-segmentation water extraction was applied to the whole study area, and the extraction results of each algorithm were validated with the validation set.
The accuracy validation results and running times of the various algorithms are shown in Table 3. The overall accuracies of the different algorithms are highly consistent with the parameter optimization and threshold selection results. On the one hand, the RF and SVM algorithms have the highest classification accuracy, both exceeding 95% overall accuracy, significantly better than the threshold segmentation algorithms, which are essentially linear in the spectral domain. By using ensemble learning and non-linear kernels, RF and SVM with the RBF kernel achieve non-linear segmentation in the spectral domain and adapt and generalize more robustly to the non-linear spectral features of different landcover classes. On the other hand, the ranking of the overall accuracies of the threshold segmentation algorithms is consistent with the threshold selection results, generally with an improvement of about 5%; the overall accuracy of the SBT model increased by about 10%. This is because the landcover classes most confused with water at the segmentation thresholds (ice and snow, moss and lichen, and tree cover, as shown in Figure 6) account for a large proportion of the samples. We used 70% of the samples to compute the overall accuracy during threshold selection, where the sample imbalance is large and the overall accuracy correspondingly low; the validation set accounts for 30% of the samples, where the imbalance is reduced and the overall accuracy therefore improves.
Second, analyzing the user’s and producer’s accuracies of the different models shows that almost all models have relatively high producer’s accuracy and low user’s accuracy for waterbody extraction, while for non-waterbodies the user’s accuracy is high and the producer’s accuracy low. This indicates that, over the large-scale QTP region, the omission error of waterbody extraction is small, and the commission error is concentrated in misclassifying large numbers of non-water pixels (ice, snow, vegetation, and shadows) as water. Comparing the kappa coefficients, the predictions of the machine learning models agree closely with the reference data, indicating that these models are relatively stable, whereas for the threshold segmentation algorithms (except NDWI and mAEWI) the agreement is lower, indicating more random and less stable models.
Additionally, compared with other threshold segmentation waterbody extraction studies [6,14,37,38,39,41], the accuracy of the threshold segmentation models in this paper is relatively low. Comparing the experimental designs and processes, we see two main reasons. The first is scale: the extraction range in this study exceeds 3 million km², far larger than in other studies, and the background structure is more complex, which lowers extraction accuracy. The second is the data pre-processing: to reduce cloud cover and improve Landsat-9 coverage of the study region, the piecewise compositing method using the min() function produced some low-value noise and reduced data quality, affecting the accuracy of threshold segmentation. It should be emphasized that this low-value noise affects all models equally. Furthermore, during extraction the lake ice was melting but not completely melted, which complicates the radiative transfer and has a certain impact on lake extraction. Although these factors reduce the absolute accuracy of waterbody extraction, they do not affect the reliability of the algorithm comparison results.
Finally, to evaluate each algorithm’s efficiency, we ran each algorithm four times and recorded its running time. The threshold extraction algorithms are the fastest, all taking 22 ± 3 min; RF is also fast, finishing within one and a half hours; SVM is the least efficient, with the longest run exceeding three hours. The threshold extraction workflow comprises three steps: water index calculation, threshold determination, and water extraction. Index calculation and extraction are completed on the GEE platform within 2 to 5 min, depending on network speed and GEE’s computing power allocation; threshold determination requires exporting the samples to Excel and calculating with a formula, generally taking about 19 min. The machine learning workflow likewise comprises three steps: classification feature calculation, parameter adjustment, and water extraction. Parameter adjustment requires importing the samples into Python, taking about 25 min for RF and 42 min for SVM; classification feature calculation and water extraction take 36–66 min and 72–152 min, respectively, depending on network speed and computing power allocation. For large-scale research such as lake extraction on the QTP, where running time requirements are less strict, a machine learning algorithm with higher accuracy is the better choice.
To show the effect and performance of each model more intuitively, this study selected Qinghai Lake (the largest lake in the QTP), Yamzho Yumco Lake (complex lake-bay morphology), the Yellow River (complex river morphology and background), and the Brahmaputra River (complex river morphology and background) to compare the lake waterbody extraction results. From left to right, the white oval areas on the imagery in Figure 7 are A1, A2, A3, A4, A5, A6, A7, A8, and A9. Dark areas are evident in large lakes with broader water surfaces, and three reasons most likely explain their occurrence: first, shadowing is enhanced by the min() function used in cloud-free compositing; second, the reflectance of the deep-water zones of large lakes is intrinsically low; and third, unmelted lake ice remains within the lakes. These dark areas are also an important reason for the low extraction accuracy in waterbodies. Comparing the open-water areas A1 to A5, which are prone to omission error, the SVM and RF algorithms identify and extract these waterbody areas well.
In contrast, the threshold segmentation algorithms show more omission errors, though NDWI and AEWIs perform relatively better among them. Area A6 contains small waterbodies and prominent shadows; except for the NDWI and AEWIs methods, all algorithms handle the shadows in this area well, but the mSR and mNDWI models perform poorly in identifying the small waterbodies. Areas A7 and A8 and their upstream reaches contain obvious floodplain wetlands formed by the lateral movement of rivers; except for the SVM, RF, NDWI, and AEWIns algorithms, the other algorithms commit commission errors by recognizing wetlands as waterbodies. Finally, area A9 is an anomalous area produced by the image compositing algorithm; most algorithms resist these abnormal values well, except the mAEWI, mNDWI, mSR, and SBT models. In conclusion, the SVM, RF, and NDWI models recognize waterbodies best across the different lake waterbody regions.
4.3. Analysis of Lake Waterbody Area Extraction Results
The lake waterbody dataset was produced using the non-lake waterbody dataset and the SRTM DEM. The former, constructed in Section 3.1, was used to eliminate non-lake waterbodies from the extraction results [68], and the latter was used to correct shadow effects [44]. Referring to the lake classification, the area distribution of lakes in the QTP region, and the resolution of lake waterbody extraction in the previous study [3], the lake waterbody areas in this study are divided into 30–100 km² (type 1), 100–500 km² (type 2), and over 500 km² (type 3). As shown in Table 4, we randomly selected ten lakes in each category to build a lake waterbody area validation set, so as to quantitatively analyze the extraction results and measure the accuracy with which the different algorithms extract the three types of lake waterbody areas.
Figure 8 shows the error analysis results of lake waterbody area extraction. Among all algorithms, the RF model extracts the lake waterbody area most accurately (average error 1.505%), followed by the SVM model (1.624%). Among the threshold segmentation algorithms, NDWI (3.501%) is the most accurate, followed by AEWIs (6.789%). Comparing the area extraction accuracy for different lake types, we found that in almost all models the error is largest for the large lakes and smallest for the medium lakes. The reasons for the large-lake errors were analyzed in combination with the identification results for Qinghai Lake (the largest lake in the QTP) in Figure 7: the errors mostly appear in the dark areas within the lakes. During the experiment, we also found that the dark regions mainly appear in large lakes with larger surfaces and greater depths, because the deep-water areas of large lakes are more prone to low reflectance and dark spots, while larger surfaces are also more vulnerable to lake ice and shadowing. Therefore, the identification error is larger for large lakes. In addition, the spatial resolution of L9C2_SR data is 30 m; when identifying small lakes, commission errors arise easily in the mixed pixels at the land–water junction, so the error analysis of small lakes is strongly influenced by misclassification, resulting in large error fluctuations.
4.4. Analysis of Comparison with Landsat-8 Data
Landsat-9 imagery has attracted much attention in the field of plateau water and lake extraction because its band settings are almost the same as those of the widely recognized Landsat-8 imagery, so the two datasets should in theory be highly consistent, complement each other, and improve the temporal resolution of the data. To verify the consistency and joint availability of Landsat-9 and Landsat-8 data, we selected an area of more than 24,000 km² within the study area for a comparative study.
The imaging time was restricted to May and June to reduce the impact of environmental change on the data. Cloud removal and mosaicking were carried out according to the process described above, and the resulting Landsat-8 and Landsat-9 images are shown in Figure 9a,b. Owing to differences in orbit settings and acquisition times, the Landsat-8 data have higher cloud coverage in this area (see the circled region in Figure 9a), which the Landsat-9 data can complement. After eliminating missing and abnormal pixels, the reflectance and NDVI of the Landsat-9 and Landsat-8 data were linearly regressed, with the results shown in Figure 9c,d. The linear regression coefficients for both reflectance and NDVI approach 1, and R exceeds 0.91, showing that the two datasets are highly consistent and can be used jointly for the extraction of water and lake areas.
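The consistency check can be sketched as a simple band-wise regression; the toy arrays below are stand-ins for the co-located Landsat-8/9 pixel samples.

```python
# Hedged sketch: linear regression between co-located Landsat-8 and
# Landsat-9 reflectance samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
l8 = rng.uniform(0.0, 0.5, 5000)                 # toy Landsat-8 reflectance
l9 = 0.99 * l8 + rng.normal(0.0, 0.01, 5000)     # toy co-located Landsat-9

slope, intercept, r, p, stderr = stats.linregress(l8, l9)
print(f'slope={slope:.3f}, intercept={intercept:.4f}, R={r:.3f}')
```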
Next, the RF and NDWI algorithms, the best performers among the machine learning and threshold extraction algorithms, respectively, were selected for water extraction and accuracy verification. Following the technical process above, the accuracies for the two datasets are shown in Table 5. First, because of the smaller classification extent and reduced environmental complexity, each algorithm’s recognition accuracy and kappa coefficient improve compared with extraction over the whole plateau. In addition, the maximum differences between the two images in OA and kappa coefficient are 1.1% and 0.025, respectively, showing that Landsat-9, like Landsat-8, recognizes plateau waterbodies well. At the same time, this also indicates that although the radiometric resolution of Landsat-9 is improved to 14 bits, it offers no apparent advantage in relatively simple tasks such as waterbody recognition.
5. Conclusions
Quickly and accurately obtaining information on lake changes in the QTP from remote sensing is of great significance for global climate change research, but the performance of different lake waterbody extraction algorithms depends on the application scenario and the remote sensing data. To explore the interaction between Landsat-9 data and the accuracy of different algorithm models for QTP lake waterbody extraction, and to determine which algorithms suit large-scale QTP lake waterbody area extraction with Landsat-9 imagery, this study selected 10 models widely used in waterbody extraction and carried out a comparative study on the GEE platform, reaching the following conclusions:
Despite clouds and shadows, the limited quality of the available Landsat-9 data, and the 30 m resolution, the best waterbody extraction model still achieved 96.59% overall accuracy and a 1.505% average area error in extracting QTP waterbodies and lake waterbody areas, proving that these algorithms can effectively identify QTP waterbodies from Landsat-9 data. With the release of 15 m panchromatic data and the acquisition of higher-quality images, there is still considerable room to improve the accuracy of waterbody and lake waterbody area extraction. Moreover, compared with the threshold segmentation algorithms, the machine learning algorithms have significant advantages in extracting large-scale QTP waterbody and lake waterbody areas. Comparing the two machine learning algorithms, at comparable overall accuracy and average error, RF runs significantly more efficiently on GEE than the SVM classifier; the RF algorithm is therefore recommended for similar studies. As the GEE platform develops, computing power will constrain model selection far less, and more computationally complex but more accurate models can be considered in future research. Finally, among the traditional threshold segmentation waterbody extraction algorithms, the NDWI method gives the best extraction results, with 89.89% overall accuracy for waterbody extraction and a 3.501% average lake waterbody area error; it is recommended in scenarios with limited samples or strict efficiency requirements.
This study also has some aspects needing further research and improvement. First, because Landsat-9 data had only recently been released, the data used here are concentrated in January–April, when part of the lake ice on the QTP had not completely melted; this complicates the radiative transfer of the waterbodies and affects the accuracy of waterbody identification. Owing to limited data sources and the complexity of the experiments, this study did not isolate or remove the impact of lake ice. Second, to ensure cloud-free coverage of the whole study area with limited imagery, we used the min() function for image compositing, which amplified the influence of shadows. Finally, for large-scale water extraction the optimal thresholds differ across underlying surfaces and scales, so dividing the study area into zones at different scales for threshold selection could further improve the accuracy of waterbody recognition. We will pursue these problems and deficiencies in follow-up work as the data and computing power improve.