1. Introduction
Inland surface water includes streams, canals, ponds, lakes and reservoirs [
1]. The amount and location of surface water change over time and space as a result of both natural processes and human practices and are, therefore, strong indicators of agricultural, environmental and ecological problems, as well as human socioeconomic development. Accurate mapping of surface water to describe its spatial and temporal distributions is essential for both academic research and policy-making [
2].
Satellite-based remote sensing can provide continuous snapshots of Earth’s surface over long periods. Landsat TM (Thematic Mapper) and ETM+ (Enhanced Thematic Mapper Plus) imagery has a moderate spatial resolution (30 m), provides multispectral images (seven or eight bands), with a short revisit interval (16 days) and includes decades of records (~30 years). Recently, the Landsat archive has been made freely available [
3]; thus, comprehensively studying the dynamic changes of surface water over large areas is no longer cost-prohibitive [
4,
5].
Numerous methods have been developed to delineate water bodies in remotely sensed imagery. The most commonly used methods fall into three categories:
- (1)
Spectral bands: These methods, which identify water bodies by applying thresholds to one or more spectral bands, are easy to implement, but often misclassify mountain shadows, urban areas or other background noise as water bodies [
6].
- (2)
Classification: These methods apply supervised or unsupervised machine-learning algorithms to extract water bodies from multispectral imagery. For supervised classification, the most notable methods are maximum-likelihood classifiers, decision trees, artificial neural networks and support vector machines. For unsupervised classification, the most common methods include the K-means and iterative self-organizing data analysis (ISODATA) [
7,
8]. These approaches may achieve higher accuracy than spectral band methods under some circumstances; however, expert experience or existing reference data are required to select appropriate training samples, which prevents these methods from being applied over large areas [
9].
- (3)
Water indices (WIs): These methods combine two or more spectral bands using various algebraic operations to enhance the discrepancy between water bodies and land. The principle underlies most WIs is similar to that of the normalized-difference vegetation index (NDVI) [
10].
The WIs have been widely used because of their relatively high accuracy in water body detection and their low-cost implementation. The design of these indices has been continually improving. For example, the normalized-difference water index (NDWI) was proposed by McFeeters [
11] to identify lakes and ponds associated with wetlands. Xu [
12] proposed the modified NDWI (MNDWI), which replaced Band 4 with Band 5. Ji
et al. [
13] suggested that the MNDWI has a more stable threshold than other WIs. Feyisa
et al. [
14] proposed two versions of an automated water extraction index (AWEI), namely AWEI
nsh (
i.e., with no shadow) and AWEI
sh (
i.e., with shadow).
To address the variation in the optimal thresholds and noise contamination issues, some studies have combined WIs with other methods, including color-space transformation [
15], principal components analysis [
16], image segmentation [
16], topographic masking [
17], linear discriminant analysis [
18] and a combination of digital image processing and geographical information system techniques [
5].
Although these efforts have made progress in eliminating noise and improving the accuracies, differences among water bodies have rarely been considered, and the accuracies are inconsistent among water body types [
13]. Pixels in deep, clean water bodies exhibit stable spectral profiles and are tolerant of threshold variations. In contrast, pixels in shallow and narrow water bodies may generate unstable spectral profiles or characteristics, due to mixed reflectance, which is caused by sediment and/or land [
19]. Therefore, these water pixels are very sensitive to threshold changes and cannot be separated from land easily. However, these pixels typically appear either at the edge of large water bodies or in narrow rivers; thus, topological relationships can be important for identifying mixed water pixels.
An automated method for extracting rivers and lakes (AMERL) is developed by combining WI with digital image processing techniques to identify water pixels, especially mixed water pixels in shallow or narrow water bodies. In this study, a narrow river is defined as having a width less than or equal to three pixels, because wide rivers (i.e., wider than three pixels) are more likely to contain pure water pixels that are easy to identify by the thresholding method.
2. Study Areas and Data Preparation
2.1. Study Areas
To evaluate the robustness of the new method, three study areas were selected in China with different water body types, climate conditions and land cover (
Table 1).
Figure 1 shows the locations and visual characteristics of the three study areas.
Study Area 1 is located in Hebei Province in northern China and encompasses the Panjiakou Reservoir and five rivers (i.e., Luanhe, Liuhe, Baohe, Heihe and Sahe). Primarily deciduous coniferous forest and shrubs cover the land surface. The area is rugged and mountainous; the elevation ranges from 107 to 1376 m. Therefore, the assessment of the water body extraction method in this area focused on distinguishing between water bodies and mountain shadows.
Study Area 2 is located at the southeast Poyang Lake, which is China’s largest freshwater lake located in Jiangxi Province in southern China. This area has abundant water resources that include eight reservoirs, three rivers (i.e., Leanjiang, Xinjiang and Wannianhe) and several tributaries. The area is primarily flat and is covered with irrigated farmland. Hilly land is located at the southeastern corner of the study area; this portion is covered by evergreen broad-leaved forest. The largest sources of background noise are Yugan County and several villages.
Study Area 3 is located in the Ningxia Hui Autonomous Region of northwestern China. The area is generally flat and is covered primarily by farmland and bare land. The Yellow River, which is the second longest river in China and well known for its turbid water, flows through the study area. This river was selected to test the robustness of the proposed method for turbid water bodies.
2.2. Image Preprocessing
Level 1T Landsat TM/ETM+ images were acquired for this study [
3]. These images have been geometrically corrected and in the form of raw DN (digital-number) values. An atmospheric correction was needed to adjust for atmospheric scattering and absorption effects that are caused by ozone, water vapor, aerosols and other particles and for Rayleigh scattering [
20]. The Landsat ecosystem disturbance adaptive processing system (LEDAPS, version 1.2.0) algorithm was used to retrieve the surface reflectance data from the DN value [
21].
2.3. Reference Data
To provide a basis for comparison, the “true” water bodies were manually digitized from the Landsat images. High-resolution Google Earth™ images were used as a complementary reference to help distinguish confusing water pixels from background noise (e.g., mountain shadows or urban areas).
3. Methods
3.1. Representative Spectral Profiles
To describe the spectral differences between water bodies and land, 1000 samples were selected to represent the five land cover types (i.e., mixed water, pure water, vegetation, urban and mountain shadow) from the entire scene of the Hebei image, which was used as the source for study Area 1; each category contained 200 samples. The mixed water pixels were sampled from narrow rivers and the edges of lakes or wide rivers. The pure water pixels were selected from the center of reservoirs and lakes, where the water bodies were deep and clear, to guarantee that the pixels contained only water. The other three sample sets were collected based on similar principles to ensure the representativeness of each pixel in the sample sets.
Figure 2 summarizes the reflectance characteristics (mean ± standard deviation) of the samples. The overall spectral reflectance of pure water was lower than that of the vegetation, urban, mountain shadow and mixed water categories [
22]. The mixed water reflectance increased with the presence of mud, hydrophytes, suspended sediments, the lake or river bed and/or land (
i.e., the riverbank or lake shore). The reflectance pattern for mountain shadows was similar to the pattern for the two water categories, which may lead to confusion between these categories during automated image processing [
14].
3.2. Water Index
The WIs are designed to enhance the discrimination between water bodies and land; however, their formulation makes them inherently sensitive to certain types of noise [
13].
Here, ρ represents the reflectance from the Landsat TM/ETM+ data for Bands 1 (blue), 2 (green), 4 (near-infrared), 5 (SWIR1) and 7 (SWIR2).
Figure 3 shows the distribution of WI values for the five land cover categories. The pure water pixels exhibit large differences relative to the non-water types for all WIs. The AWEI
nsh is most likely to confuse mixed water pixels with mountain shadow pixels, followed by the MNDWI, AWEI
sh and NDWI. For the NDWI, the mixed water pixels were more often confused with urban pixels than with shadow pixels. In addition, the NDWI and the two AWEIs had smaller ranges of values compared to the MNDWI for water pixels, whereas the AWEI
nsh exhibited the largest ranges of values for non-water pixels.
The characteristics of these distributions are important for developing the processing scheme (Section 3.3) and tuning its parameters (Section 3.6).
3.3. Automated Method for Extracting Rivers and Lakes (AMERL)
As discussed above, pure water pixels can be identified with strict WI thresholds; however, these thresholds may fail to identify mixed water pixels. In contrast, low WI thresholds may capture mixed water pixels; however, these thresholds include more noise.
In this paper, an automated method for extracting rivers and lakes (AMERL) was developed to address this issue.
Figure 4 illustrates the logic and sequence of operations for this processing scheme. Pure water pixels are extracted using a deliberately selected strict threshold (Thd
pure). This threshold (Thd
pure) is tested for each WI using several training images that have stable characteristics. A watershed segmentation method is then adopted to detect mixed water pixels at the edges of lakes or wide rivers. Moreover, a method based on linear feature enhancement is adopted to detect water pixels from narrow rivers. Lastly, the extracted lakes, wide rivers and narrow rivers are combined to produce the final water body results.
3.4. Lake and Wide River Extraction
The gradient of an image measures its changes in intensity in two aspects: the magnitude of the gradient (G
m) measures the rate at which the intensity changes; the direction of gradient depicts the direction of the maximum increase in intensity [
23]. Because the intensity of remote sensing imagery is determined by the land cover spectral reflectance [
24], pixels with low G
m represent areas of relatively homogeneous land cover types, whereas pixels with high G
m are associated with areas in which the surrounding land cover may be different.
Although there are differences in the WI values between pure and mixed water pixels, it is reasonable to assume that these differences are smaller than the differences between water and land pixels. Water bodies can be separated from land based on the G
m of WI using the watershed algorithm [
23,
25].
The original watershed algorithm is sensitive to regional minima and noise, which may lead to the over-segmentation of an image. One solution to this problem is to use markers to specify the allowed regional minima using prior knowledge about the image. The marker is composed of connected pixels in an image. Internal markers focus on objects of interest, while external markers are associated with the background [
26]. This marker-controlled watershed segmentation proceeds in three steps.
Step 1: Delineation of typical water bodies and land. Pixels with values greater than Thdpure are classified as pure water pixels (Section 3.3) and are labeled as internal markers. Pixels with values less than Thdland are classified as land pixels and are labeled as external markers. Pixels with values between Thdland and Thdpure may be land or mixed water pixels; these unmarked pixels are classified in Step 3.
For images of mountainous or hilly regions, shadow pixels must be identified and excluded from internal markers. A simple method for doing this involves thresholding (Thd
shadow;
Equation (5) based on the value of Band 2 (green), because Band 2 is more stable than the other bands for differentiating between water bodies and shadows [
14]. Shadow reduction methods based on elevation data were not used in this study, because the water areas are highly uncertain in the current existing 30-m resolution digital elevation model (DEM) products, e.g., The Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) [
27].
Step 2: Calculation of the G
m of WI. The Sobel operator is used to calculate the G
m of WI. The Sobel operator is a discrete differentiation operator that is used to estimate a digital image gradient. This operator performs better than other gradient operators, such as the Prewitt operator [
28].
Step 3: Segmentation by watersheds. The watershed algorithm iteratively expands the internal and external markers [
26], and each unmarked pixel is classified as either water body or land. Ultimately, all lakes and wide rivers are extracted in this step.
3.6. Parameter Tuning
Because the AMERL considers comprehensive situations in water body extraction, eight parameters are utilized for different purposes. These parameters were trained on the three study areas and four additional test areas (the test areas are not included in this paper). The values used in study Area 1 are listed in
Table 3. The eight parameters can be categorized into two groups by considering their purpose:
Group 1, used for water body extraction: The values of Thdpure, Thdland, LFEhigh and LFElow are dependent on which WI the AMERL is using, because the range of values for a particular land cover varies with each WI (Section 3.2). However, for a given WI, the four parameters are very stable throughout the three study areas and the four test areas.
Group 2, used for noise reduction: The values of Thdriver, NP, Thdshadow and Thdroads are independent of the WI. However, this group of parameters is not as stable as those of the first group. The noise reduction parameters can be divided into two sub-groups based on their stability:
Relatively stable: Thdriver and NP are the same in the three study areas; however, these parameters require additional training in two of the test areas. For example, images of rugged hills contain many non-water linear features; thus, increasing the values of the two parameters may eliminate more noise.
Unstable: Thdshadow and Thdroads typically require manual decisions regarding whether the target image contains the corresponding noise; if the noise is present, a parameter training process should be performed. Otherwise, these parameters should be ignored, because noise reduction methods have the potential to misidentify water bodies as noise. In this study, Thdshadow is not used in study Areas 2 and 3, whereas Thdroads is not used in study Area 1.
The accuracy of AMERL in water body extraction is compared with that of an optimal thresholding method. To determine the optimal threshold, a series of thresholds are tested. Each possible threshold generates a water body extraction result; errors are calculated by comparing this result with the reference data. A higher threshold may identify only a few water bodies (i.e., higher omission error); a lower threshold may not only identify many water bodies, but also mistakenly identify land as water bodies (i.e., higher commission error). The optimal threshold corresponds to the minimum total error, which is the sum of the commission and omission errors. In this study, the thresholds for all WIs in this paper are tested using values ranging from −0.1 to 0.1 in increments of 0.01.
4. Results and Discussion
The overall water body extraction performance was first tested via visual inspection and a pixel-by-pixel assessment of the images. Subsequently, detailed evaluations of the results were conducted using several metrics to assess the performance of the proposed method in identifying mixed water pixels at the edges of lakes or wide rivers and in narrow rivers.
4.1. Overall Water Body Mapping Performance
The performance of the AMERL was compared with the optimal thresholding results for each of the four WIs in the three study areas (
Table 4); however, to display the results concisely, only two representative WIs were demonstrated for each study area (
Figure 7).
A visual inspection indicates that the AMERL successfully extracted most of the narrow rivers in all of the study areas, whereas the optimal thresholding method extracted only small portions of these narrow rivers. However, regarding most of the lakes or wide rivers, there was little difference between the optimal thresholding method and the AMERL or among the four WIs. This similarity arises because the optimal thresholding method is sufficient to extract pure water pixels in this water body type. However, the extraction of wide river pixels, primarily in the Yellow River within study Area 3, using the NDWI is an exception. Due to the very high sediment load, the reflectance of the Yellow River in the near-infrared band is larger than in the green band; thus, the NDWI values in the wide river were smaller than the value of several bare land pixels. Both the AMERL and the optimal thresholding method failed to extract most of the pixels in the Yellow River using the NDWI in study Area 3.
Table 4 provides a pixel-by-pixel analysis of the classification accuracies based on four metrics: the user’s and producer’s accuracy, the kappa coefficient, and the total error. In comparison with the optimal thresholding method, the total error of the AMERL decreased by 1% to 12.8% for eleven of the twelve tests (i.e., the tests of four WIs in three study areas). The one exception was the AWEI
sh in Jiangxi (i.e., study area 2), which had a slightly higher total error using the AMERL. The reason for this discrepancy is discussed in Section 4.3. The improvements of water body extraction accuracies are insignificant regarding these metrics, because AMERL’s main objective is to enhance the ability to identify the mixed water pixels; however, the mixed water pixels are less than pure water pixels. Thus, additional metrics are required for evaluating the extraction results that focus on the mixed water pixels, including three metrics for them on edges of lakes and wide rivers (Section 4.2) and three metrics for them on narrow rivers (Section 4.3).
Among the WIs, the optimal threshold values are varied with each study area (
Table 4); however, the AMERL uses the same water extraction parameters for all three study areas. It is only required to specify the noise reduction parameters, Thd
shadow and Thd
roads. As AMERL’s accuracies are not less than the optimal thresholding method, it is reasonable to believe that the parameter tuning is easier in the AMERL than the thresholding method.
4.2. Accuracy of the Edge Positions for the Extracted Lakes and Wide Rivers
The accuracy of the extraction of lakes and wide rivers was assessed using a subset of the images for study Area 1 (
Figure 8). The image size is 500 × 595 pixels and is focused on the Panjiakou Reservoir in an area without rivers. The results using the optimal thresholding method and the AMERL were compared for the four WIs; however,
Figure 8 only shows the results for two representative WIs.
To further evaluate the identification of mixed pixels at the edges of lakes and wide rivers, the boundary extraction method was used to identify the edges of the extracted water bodies and the manually interpreted reference images. Boundary extraction is a simple morphological image processing algorithm that extracts the outermost pixels of features from a binary image [
23]. Three metrics were applied to evaluate the results from different perspectives:
such that A + EC + EO = 100%. Here, A is the accuracy of the edge position, E represents the edge position error frequency (C for commission and O for omission), NR is the number of edge pixels in the reference image, Nt is the total number of extracted edge pixels in the reference image, NC is the number of extracted edge pixels that lie outside the reference water bodies (i.e., the number of commission errors) and NO is the number of extracted edge pixels that lie inside the reference water bodies and that are not directly adjacent to the actual edge (i.e., omission errors).
Table 5 summarizes the edge extraction accuracy for Panjiakou Reservoir using the default thresholding method (non-optimized), the optimal thresholding method (Thd
opti) and the AMERL. The default thresholding method exhibits either high commission or omission errors in the water body results; therefore, this approach had the lowest accuracy among the three methods. For the optimal thresholding method, the highest A was achieved with the AWEI
sh, followed by the NDWI, MNDWI and AWEI
nsh. However, using the AMERL, the highest A was achieved with the MNDWI, followed by the AWEI
sh, AWEI
nsh and NDWI. The largest improvements relative to the optimal thresholding method using the AMERL were achieved by the AWEI
nsh (20.2%) and MNDWI (17.2%); the AWEI
sh (9.6%) and NDWI (6.4%) exhibited smaller improvements (
Figure 9). The reason for this result is that the AWEI
nsh and MNDWI are more vulnerable to shadow pixels than the AWEI
sh and NDWI; thus, the extra shadow reduction procedure in the AMERL provides a larger improvement.
However, there is an area that is situated 260 m above the Panjiakou Reservoir (measured using Google Earth) and located in the central region of the reservoir (
Figure 8). This area casts a shadow on the water body and bank. None of the methods used in this study were able to accurately identify the actual edge of the reservoir in this area. Although this problem did not introduce large errors in the overall water body extraction, it could be eliminated in future research by introducing a digital elevation model (DEM) with a sufficient spatial resolution and high data quality to accurately identify shadow pixels.
5. Conclusions
There are several difficulties associated with water body extraction from satellite imagery using water indices (WIs) that are based on the thresholding method, including: (1) inefficient identification of mixed water pixels; (2) confusion of water bodies with background noise; and (3) variations in the optimal water extraction threshold that depend on the characteristics of an individual scene. Considering that mixed water pixels usually appear in narrow rivers or along the edge of lakes or wide rivers, the automated method for extracting rivers and lakes (AMERL) is proposed for extracting mixed water pixels by considering not only their spectral characteristics, but also their topological connections. A series of processes are introduced to filter out noise caused by shadows, roads and small segments by considering both their spatial and spectral characteristics. The threshold optimization procedure is also simplified using AMERL.
The ability of the new method to improve the extraction of water bodies was tested based on four WIs calculated for Landsat TM (Thematic Mapper) and ETM+ (Enhanced Thematic Mapper Plus) imagery over three study areas in China. The AMERL outperformed the optimal thresholding method for most WIs in all study areas. The AMERL substantially improved the completeness of narrow rivers extraction for all four WIs. However, the modified normalized-difference water index (MNDWI) [
12] and automated water extraction index (AWEI
sh, with shadow presence) [
14] exhibited a larger improvement than the normalized-difference water index (NDWI) [
11] and AWEI
nsh (without shadow presence) [
14] in each of the three study areas. The proposed method can be further improved to take advantage of the high-resolution digital elevation model (DEM) for shadow reduction. Further applications are required to assess the robustness of the method on imagery from other sensors and other areas of the world.