Abstract
Coal is one of the most important fossil energy sources and is ensuring global energy security. Annual maximum NDVI (Normalized Difference Vegetation Index) data is an important indicator for the research in balancing coal mining and vegetation conservation. However, the existing annual maximum NDVI data displayed lower values with temporally inconsistent and a noticeable mosaic line. Here we propose an algorithm for automatically generating the annual maximum NDVI of Chinaâs coal bases in Google Earth Engine called: Auto-NDVIcb. The accuracy of the Auto-NDVIcb algorithm has been verified with an average RMSE of 0.087 for the 14 coal bases from 2013 to 2022. Based on the proposed Auto-NDVIcb algorithm, an annual maximum NDVI dataset for all 14 coal bases in China from 2013 to 2022 was publicly released. This dataset can be fast and automatically updated online. Hence, the public dataset will continuously serve to monitor the vegetation change induced by coal mining, exploring the mechanism of vegetation degradation, and providing scientific data for developing vegetation protection policies in coal mines.
Similar content being viewed by others
Background & Summary
Energy and the environment are at the centre of the worldâs current concerns. As one of the most important fossil energy sources, coal has been playing a crucial role in ensuring global energy security. However, coal exploitation has led to a range of ecological and environmental challenges, in particular, the degradation of vegetation, which is getting more and more attention from the major coal-producing countries of the world, such as America, Australia, and China1,2,3,4.
Normalized Difference Vegetation Index (NDVI) is a popular and effective indicator for characterizing the growth and coverage of vegetation. Since it was proposed in the 1970s5, numerous researchers have successfully used NDVI to characterize vegetation status in terms of spatial and temporal change6,7,8,9,10,11. For example, Gao et al.8 used NDVI as an indicator for statistical regression to analyse vegetation changes at different elevations over the past 30 years. Yang et al.12,13 used the NDVI time series and the shape-based clustering method for monitoring vegetation change in a surface coal mining area. Li et al.14 used NDVI data to investigate the vegetation cover to reflect the disturbance and restoration of the opencast mining areas. Therefore, high-quality NDVI data are urgently needed to monitor vegetation changes in coal bases. It can not only improve the knowledge of ecological changes in the mining area but also guide and optimize production and ecological restoration activities in the mining area. Due to mining and restoration activities, time-series change of vegetation at the object-scale in mining areas is usually temporally abrupt. Moreover, the inter-annual variation in phenology also influences the NDVI in a similar season every year. The annual maximum NDVI data better reflects the real growth of the vegetation.
There are two main methods for acquiring NDVI data. The first method is to retrieve NDVI data directly from publicly available NDVI products. The second method is to perform the own band math to generate NDVI by the users.
-
(1)
Using publicly available NDVI products. Some NDVI products based on different satellite sensors have been produced by various institutions or experts over the decades. The primary NDVI products are listed in Table 1, involved with satellites/sensors such as the Advanced Very High-Resolution Radiometer (AVHRR)15,16,17, SPOT/VEGETATION, Moderate Resolution Imaging Spectroradiometer (MODIS)18,19, Visible Infrared Imaging Radiometer Suite (VIIRS)20,21, GF series22, Landsat series23, and Sentinel series24. The spatial resolution of existing NDVI products varies from 10âkm (e.g., GIMMS NDVI) to 1 km-250 m (e.g., MODIS NDVI), 30âm (Landsat NDVI), and even 10âm (Sentinel-2 NDVI). Products such as âGIMMS NDVI From AVHRR Sensorsâ, âNOAA CDR AVHRR NDVIâ, and âMODIS Terra Daily NDVIâ offer high temporal resolution but low spatial resolution, making them unsuitable for small-scale studies. Low-resolution NDVI images may not capture fine vegetation features within mining areas, such as small vegetation cover or unevenly distributed vegetation. Additionally, the presence of multiple surface cover types around the coal base, including bare soil, ore stockpiles, vegetation, etc., can lead to the mixing of these diverse surfaces in a single image element. The accuracy of vegetation analysis results may be compromised. Conversely, âChina 10âm Year-by-Year NDVI Maximum Dataset (2016â2021)â and âMuSyQ GF-series 16âm/10-day NDVI Vegetation Index Dataset (2018â2020)â25 have a higher spatial resolution, but only provide data for recent several years, which cannot satisfy the long-time series research. Landsat satellites combine the advantages of both temporal and spatial resolution.
Landsat data have great potential for exploitation as they provide a wealth of free time-series public historical imagery with a spatial resolution of 30 m26,27,28. Consequently, several NDVI data products have been developed based on Landsat29. For example, the âChina annual maximum NDVI dataset (30âm,1986â2021)â relies on Landsat top of atmosphere (TOA) data as its source, computing NDVI throughout the year and selecting the maximum value to mosaic30. The âChina annual maximum NDVI dataset (30âm, 2000â2020)â presented by Dong31 was generated by extracting NDVI values from valid Landsat surface reflectance (SR) observations throughout the year. Both linear interpolation and the Savitzky-Golay (SG) smoothing filter methods were used in this process32. However, there are some minor defects with these products in Chinaâs coal bases. First, when using TOA data to calculate NDVI, the results are impacted by the atmosphere33. Secondly, the cloud cover would cause images missing during the most vigorous period of vegetation growth34,35,36. It is not appropriate for existing products to directly use data from other times to fill in the gaps. Furthermore, the problem of inconsistent time-phases for different images of Landsat are often ignored. This may cause notable mosaic lines in the NDVI data, as well as the annual maximum NDVI that falls considerably below the real value.
-
(2)
User-computed NDVI: Traditionally, users can compute NDVI through band math utilizing local software. Nevertheless, this approach is often time-consuming, especially when handling a large number of NDVI images. In contrast, Google Earth Engine (GEE) has become a leading platform for producing extensive time series of NDVI data37,38,39,40. For instance, Chen et al.41 presented the âGap Filling and Savitzky-Golay filtering (GF-SG)â approach that effectively reconstructs 8-day NDVI time-series data using MODIS and Landsat data within GEE. Furthermore, GEE provides the qualityMosaic (QM) method for extracting the maximum value from image collections. However, it addresses missing pixels by incorporating data from other times.
In summary, due to satellite revisiting capability and cloud cover, there is missing data during the most vigorous vegetation growth period. The fusion of NDVI images with inconsistent imaging time causes some problems as follows. (1) The extracted annual maximum values of NDVI are significantly lower than the real values. (2) Visually, the NDVI image displayed a noticeable mosaic line.
In this paper, we proposed a GEE-based algorithm for automatically calculating the temporally-consistent annual maximum NDVI of Chinaâs coal bases (Auto-NDVIcb). Auto-NDVIcb is based on Landsat SR data and achieves online fast generation of seamless annual maximum NDVI data with long-time series, spatial continuity, and temporal consistency. High-quality vegetation monitoring data of coal bases provides solid data support for ecological environment monitoring in mining areas. The main contributions of this paper can be summarized below:
-
(1)
Auto-NDVIcb solved the problem of the mosaic line caused by the lack of images during the vigorous vegetation growth and inconsistency in the acquiring time of nearby images. And is capable of producing temporally-consistent annual maximum NDVI of Chinaâs coal bases.
-
(2)
The annual maximum NDVI dataset of 14 coal bases in China from 2013 to 202242 is publicly available, which is closer to the reality of surface vegetation rather than just visual spatial smoothness. Compared with the existing annual maximum NDVI products and the current method, it shows better results at the scale of coal bases.
-
(3)
Auto-NDVIcb can be fully automated in GEE, so the public dataset can be fast and automatically updated online in a continuous way based on the proposed Auto-NDVIcb algorithm.
Methods
Study area
The NDVI data generated in this paper covers all of the 14 coal bases in China, i.e., Shendong (SD), Jinbei (JB), Jinzhong (JZ), Jindong (JD), Mengdong (MD), Yungui (YG), Henan (HN), Luxi (LX), Lianghuai (LH), Huanglong (HL), Jizhong (JIZ), Ningdong (ND), Shanbei (SB), and Xinjiang (XJ) coal bases (Fig. 1). These coal bases contain Chinaâs major coal mines. At these coal bases, large-scale and high-intensity coal mining activities have resulted in significant vegetation changes in mining areas.
Remotely sensed images
The remotely sensed images utilized in this study were the âUSGS Landsat 8 Level 2, Collection 2, Tier 1â43. The remotely sensed images utilized in this study were the âUSGS Landsat 8 Level 2, Collection 2, Tier 1â43. This dataset is atmospherically corrected surface reflectance data produced by the Landsat 8 OLI sensors using the Land Surface Reflectance Code (LaSRC) by the USGS Earth Resources Observation and Science (EROS) Center. The accuracy, precision, and uncertainty of the OLI surface reflectance data published by the USGS has been described quantitatively in detail in the reference44. It also has been demonstrated that the quality of OLI surface reflectance data generated by LaSRC allow scientists to provide more useful data products including the NDVI data44. It is publicly released on GEE with a spatial resolution of 30âm and a temporal resolution of 16 days (https://developers.google.com/earth-engine/datasets/catalog/landsat). The Worldwide Reference System (WRS) is used to identify a nominal Landsat scene by specifying the Path and Row numbers. The Path represents the descending orbit of the satellite. Row represents the location of a scene on the Path. In this paper, we use the notation ppprrr (i.e., Pathâ=âppp and Rowâ=ârrr) according to the WRS system to designate each specific scene. For example, the scene with the number 126032 is identified by Pathâ=â126 and Rowâ=â032. Each scene approximately covers 170âkm x 183âkm area in China.
The overview of the Auto-NDVIcb algorithm
Figure 2 illustrates the procedure of the proposed Auto-NDVIcb algorithm, which can be conducted totally in GEE. First, images effectively observed throughout the whole year are filtered, and NDVI calculations are performed on these images. Second, an automated selecting function has been developed to choose the reference NDVI image for each coal base, which most closely characterizes the vegetation growth at its peak. Third, a calibration model is constructed, ensuring consistency in the NDVI cumulative distribution function across similar land cover. Finally, all the collected NDVI images covering the coal base are batch-calibrated based on the constructed model and fused to obtain the annual maximum NDVI data.
Calculation of time-series NDVI
The time-series NDVI images were computed in GEE. Here is the description of the steps involved. First, the Landsat 8 SR dataset was loaded into GEE. Images falling within a designated period and specific geographical region were selected for further processing. Then, date filtering was applied to select images during the most vigorous vegetation growth period in the study area. For most coal bases, the date filtering parameter was set from April to October of each year to avoid extreme phenology. Given the limitations of cloud cover and Landsat revisit cycles, focusing only on summer season images would result in vacant pixel values in the NDVI results. An exception was made for the YG coal base. Frequent cloud coverage at the YG coal base throughout the year results in less valid data, so the date filtering parameter was set from January to December, thus ensuring the completeness of data coverage. The boundary of each coal base was used as a parameter for spatial filtering. Third, the cloud and its shadow were removed for each image after filtering. The cloud removal process for Landsat SR images in GEE focuses on clouds and cloud shadows45. The Quality Assessment (QA) band, which stores quality information for each image, was utilized. Pixels with quality information indicating âCloudâ and âCloud Shadowâ were filtered out using bitwise operations, creating a cloud mask. The âupdateMaskâ function in GEE was then used to remove the cloud mask area and obtain cloud-free images46. There may be unscreened clouds (e.g., cirrus clouds) after the filtering. However, in our study, we did not find the impact of cirrus clouds on the results. Even if cirrus cloud pixels exist, the Auto-NDVIcb algorithm will exclude them during the calculation of the maximum NDVI value. Finally, NDVI was computed for each image using the Eq. (1).
where NIR represents the surface reflectance in the near-infrared band and RED represents the surface reflectance in the red band.
Automatic selection of the reference NDVI image
The reference NDVI image for each coal base needs to most closely characterize the most vigorous vegetation growth in a year. Hence, the reference NDVI image is defined as the NDVI image with the highest average value among all the NDVI images of the coal base in a year. We designed an algorithm to automatically select the reference NDVI image from the NDVI collection in GEE. First, each coal base is covered by several scenes, and each scene covers an area of varying size. To ensure the sufficient coverage area of the reference NDVI image, 2-3 scenes were designated as selectable scenes at each coal base. For example, the JB coal base is covered by scenes of numbers 126032, 125032, 126033, 125033, 126034, and 125034 (Fig. 3). According to the coverage size, scene 126033 (Fig. 3(b)) and scene 125033 (Fig. 3(e)) were designated as selectable scenes at the JB coal base. Second, the images in the selectable scenes with over 40% clouds were removed to avoid severe missing pixels in the reference NDVI image after cloud removal. Finally, we calculate the average value of each NDVI image in the selectable scenes. The NDVI image with the highest average value is selected as the reference NDVI image.
Construction of the calibration model
Toblerâs First Law of Geography: All things are related, but nearby things are more related than distant things42. Simultaneously acquired Images of nearby areas generally have similar color tones, with very few abrupt changes in overall cliffiness. In other words, the cumulative probability distributions of contemporary NDVI images in nearby areas are similar. In light of this, a calibration model was constructed to harmonize the different time-phase NDVI images (called to-be-calibrated NDVI images) at all the scenes covering a coal base to align with the cumulative probability distribution of the reference NDVI image. Each coal base requires at least 4â9 scenes of images to be fully covered. There are 90 scenes at the 14 coal bases. Each scene has about 20 images per year. Therefore, at a minimum, about 1800 images in a year need to be processed. The construction of the calibration model includes the following steps:
-
(1)
The first step is to create a function that calculates the cumulative probabilities of both the reference NDVI image and the to-be-calibrated NDVI images. This function begins by computing the histogram of the NDVI image using the âee.Reducer.histogram()â reducer. It then derives the Cumulative Distribution Function (CDF) of the NDVI image. Finally, the function outputs a feature set containing cumulative probability values corresponding to each NDVI image.
-
(2)
In the second step, a fitting model is constructed for the feature sets, mapping NDVI values from the to-be-calibrated NDVI images to the reference NDVI image. In this paper, the CART classifier function is used to build the fitting model on GEE. The parameter of the CART classifier function is set to âregressionâ to allow outputting continuous values from a standard regression. Any NDVI value âaiâ in the to-be-calibrated NDVI image can find a corresponding NDVI value âbjâ on the reference NDVI image so that the cumulative probabilities corresponding to âaiâ and âbjâ are equivalent (Fig. 4). The input data for this function includes the to-be-calibrated and the reference NDVI images.
Batch calibration and fusion
The to-be-calibrated NDVI images at all the scenes covering a coal base were looped into the algorithm of constructing calibration models to get a calibrated NDVI image collection. Each calibrated NDVI was or was close to the annual maximum NDVI. The values of the calibrated NDVI image collection at each pixel were retrieved and sorted, and the median value was selected to get the annual maximum NDVI image. The function of median fusion can effectively avoid anomalous pixels. Considering the characteristics of discontinuous spatial distribution and the large geographic extent of each coal base, the algorithm was conducted for all 14 coal bases, respectively.
Data Records
The annual maximum NDVI dataset47 of all 14 coal bases in China from 2013 to 2022 is available to the public at 10.6084/m9.figshare.c.6933322. It includes the following 14 compressed packages of 140 GeoTIFF images:
-
1)
The annual maximum NDVI dataset of Henan (HN) coal base in China (2013â2022).
-
2)
The annual maximum NDVI dataset of Jinbei (JB) coal base in China (2013â2022).
-
3)
The annual maximum NDVI dataset of Xinjiang (XJ) coal base in China (2013â2022).
-
4)
The annual maximum NDVI dataset of Jindong (JD) coal base in China (2013â2022).
-
5)
The annual maximum NDVI dataset of Jinzhong (JZ) coal base in China (2013â2022).
-
6)
The annual maximum NDVI dataset of Lianghuai (LH) coal base in China (2013â2022).
-
7)
The annual maximum NDVI dataset of Luxi (LX) coal base in China (2013â2022).
-
8)
The annual maximum NDVI dataset of Mengdong (MD) coal base in China (2013â2022).
-
9)
The annual maximum NDVI dataset of Yungui (YG) coal bases in China (2013â2022).
-
10)
The annual maximum NDVI dataset of Jizhong (JIZ) coal base in China (2013â2022).
-
11)
The annual maximum NDVI dataset of Ningdong (ND) coal base in China (2013â2022).
-
12)
The annual maximum NDVI dataset of Shanbei (SB) coal base in China (2013â2022).
-
13)
The annual maximum NDVI dataset of Shendong (SD) coal base in China (2013â2022).
-
14)
The annual maximum NDVI dataset of Huanglong (HL) coal base in China (2013â2022).
Each package represents the NDVI images in 10 years for each coal base, with a spatial resolution of 30âm. The geographic reference for all images is ESPG:4326 (WGS_1984). The values of the image range from â1 to 1.
Technical Validation
Accuracy of reconstructing the annual maximum NDVI images
Experiments were conducted to assess the accuracy of the Auto-NDVIcb algorithm by assuming the absence of the annual maximum NDVI image. Since a single sensor was used as a data source, the maximum NDVI true value acquired by that sensor should be used as validation data. The nearby image that has the same imaging time as the reference NDVI image also closely represents the annual maximum NDVI. This image is regarded as the âTrue imageâ, and assumed missing. Auto-NDVIcb algorithm is conducted to reconstruct the annual maximum NDVI (âPredicted imageâ). The images used for accuracy validation are independent and high-quality data. The Root Mean Square Error (RMSE) between the âPredicted imageâ and the âTrue imageâ is calculated to verify the accuracy as Eq. (2). Here we use the example of the JB coal base in 2020 to provide a detailed illustration of the experiments. As shown in Fig. 3, the image acquired on 24th August with PathRowâ=â126033 has been selected as the reference NDVI image, most closely characterizing the most vigorous vegetation growth in 2020. The image with PathRowâ=â126032 was also acquired on 24th August, and also closely represented the annual maximum NDVI. Hence, the image (PathRowâ=â126032, 24th August) was regarded as the âTrue imageâ and removed to simulate the absence of the annual maximum NDVI image. The Auto-NDVIcb algorithm was conducted to predict the annual maximum NDVI image based on the reference NDVI image (PathRowâ=â126033, 24th August). Finally, the RMSE was calculated. In this study, the same experiments were carried out for all 14 coal bases to validate the accuracy.
where n represents the number of pixels in the image; \({{\rm{value}}}_{Predicted}({{\rm{x}}}_{{\rm{i}},}{{\rm{y}}}_{{\rm{i}}})\) represents the NDVI value of the pixel at position x, y in the âPredicted imageâ; \({{\rm{value}}}_{True}({{\rm{x}}}_{{\rm{i}},}{{\rm{y}}}_{{\rm{i}}})\) represents the NDVI value of the pixel at position x, y in the âTrue imageâ.
Table 2 shows the RMSE of the Auto-NDVIcb algorithm year by year in each coal base. The bold values indicate the maximum and minimum values of RMSE from 2013â2022 for each coal base. The algorithm produced an average RMSE of 0.087 across the 14 coal bases from 2013 to 2022. The algorithm had varying accuracy in different coal bases. It suggests that the highest accuracy is present in the YG coal base with an average RMSE of 0.062, followed by the JZ and JB coal base at 0.071 and 0.079, respectively. The average RMSEs for the JD, LH, LX, HL, SD, HN, and SB coal bases all ranged between 0.080 and 0.088. By contrast, the ND coal base displayed the lowest accuracy, with the highest average RMSE of 0.113. In terms of accuracy in specific years, the YG coal base achieved the lowest RMSE of 0.054 in 2022, while the ND coal base had the highest RMSE of 0.132 in 2015. The YG coal base belongs to the broad-leaved evergreen forest area, where the land cover is dominated by vegetation. Moreover, the vegetation grew vigorously, and the land cover between the scenes was close to each other. On the contrary, the ND coal base belongs to the plateau-continental climate area, with sparse surface vegetation and large heterogeneity of land covers between different scenes, but the RMSE of 0.132 in the ND coal base is also acceptable. The maximum inter-annual differences of RMSE vary from 0.020 to 0.045 with different coal bases, with an average of 0.031. In terms of all the coal bases, the average RMSE varies from 0.081 to 0.093 with different years. Figure 5 displays the distribution of RMSE for 14 coal bases from 2013 to 2022. Outliers were only observed in the LH and JD coal bases, exhibiting RMSE values of 0.103 and 0.108 correspondingly. Taken together, Fig. 5 and Table 2 show that the accuracy of the Auto-NDVIcb algorithm fluctuates little from year to year and remains stable. Overall, the Auto-NDVIcb algorithm exhibits a satisfactory performance across the 14 coal bases.
Applicability of the Auto-NDVIcb algorithm to the 14 coal bases
The theoretical basement of Auto-NDVIcb is Toblerâs First Law of Geography. So, when the to-be-calibrated NDVI image is far away from the reference NDVI image, the accuracy of Auto-NDVIcb may be lower. Experiments were conducted to assess the applicability of the Auto-NDVIcb algorithm at the scale of coal bases and measure the accuracies of different scenes. The JB coal base is also taken as an example to illustrate the experiments, and the accuracies of scenes 125032, 125033, 125034, and 126034 were measured. First, the image that is closest in each scene to the time of the reference NDVI image was selected as the âTrue imageâ. Second, the True image of each scene was removed to simulate the absence of the annual maximum NDVI image, respectively. Third, the reconstructed annual maximum NDVI images (called âPredicted imageâ) were produced by the Auto-NDVIcb algorithm. Finally, the RMSE between the True images and the Predicted images were calculated, respectively. Table 3 shows the accuracy of each scene covering the 14 coal bases. We define a term called âimage distanceâ. The image distance is the sum of the number of Path and Row differences between the to-be-calibrated NDVI image and the reference NDVI image as Eq. (3).
where Pathcal represents the Path number of the to-be-calibrated NDVI image; Rowcal represents the Row number of the to-be-calibrated NDVI image; Pathref represents the Path number of the reference NDVI image; Rowref represents the Row number of the reference NDVI image.
Figure 6 shows the variation of RMSE with image distance. Among all the scenes, the lowest RMSE is scene 122035 of LH coal base, with an image distance of 1 and an RMSE of 0.063. The highest RMSE is scene 144030 of the XJ coal base, with an image distance of 4 and an RMSE of 0.140. In terms of specific coal bases, for example, in JD coal base, the image distance of scene number 125035 is 1, and the RMSE is 0.068; the image distance of scene numbers 125036 and 126035 is 2, and the RMSE is 0.082 and 0.086, respectively. All 14 coal bases show that the farther the distance between the to-be-calibrated image and the reference NDVI image, the higher the RMSE. The farthest image distance is 4, and the accuracy of the algorithm is still acceptable (0.140).
Impact of Imaging Time on Accuracy
This experiment aims to assess the accuracy of the Auto-NDVIcb algorithm for to-be-calibrated NDVI images from different months. The experimental process is illustrated in Fig. 7 (JB coal base as an example). The nearby image that has the same imaging time as the reference NDVI image is regarded as the âTrue imageâ and removed. The to-be-calibrated image in each month is calibrated by Auto-NDVIcb to produce the âPredicted imageâ (e.g., the April and May in Fig. 7). If there is not only one image (usually two images) in a month, the images in this month are calibrated respectively, and is fused by âmedianâ function in GEE to produce the âPredicted imageâ in this month (e.g., the June and July in Fig. 7). Finally, the RMSE between the True image and the Predicted image for each month is calculated. This experiment has been carried out for each coal base, respectively.
Figure 8 shows the variation of RMSE with months at each coal base. The RMSE showed an obvious trend of decreasing and then increasing. The months with the highest accuracy were mainly July and August. The RMSE of YG, JIZ, and HL coal bases was lowest in July, and the RMSE of SD, JZ, LH, HN, JB, MD, LX, SB, and ND coal bases was lowest in August. RMSE in XJ was lowest in June, and RMSE in JD was lowest in September.
The month with the lowest RMSE usually coincided with the month in which the maximum NDVI was recorded, i.e., the month of the reference NDVI image. For instance, in terms of the JZ coal base, the reference NDVI image was acquired on 24th August and the lowest RMSE of 0.063 also appeared in August. Similarly, the reference NDVI image for the YG coal base was acquired on 11th July, and the lowest RMSE was present in July with 0.061. This trend is also observed for the HL, SD, JZ, LH, HN, JB, MD, LX, SB, and JD coal bases. However, there are some exceptions. For example, the ND coal base does not display the lowest RMSE in September, the month of the reference NDVI image. This misalignment can be attributed to the date of reference NDVI image, i.e., 1st September, which is close to August for the lowest RMSE. Although the month with the lowest RMSE in the XJ and JIZ coal bases did not exactly align with the month of the reference NDVI image, the difference between the lowest RMSE and the RMSE in the month of the reference NDVI image is small (XJ: 0.090 vs. 0.093; JIZ: 0.083 vs.0.088). These results suggest that the accuracy of the Auto-NDVIcb is related to the difference in imaging time between the to-be-calibrated NDVI image and the reference NDVI image. However, even when the difference in imaging time is the largest, for example, the to-be-calibrated NDVI image in January (YG coal base) or April (JB coal base), the largest RMSE is 0.121 (YG coal base) and 0.126 (JB coal base). It indicates that the imaging time of the to-be-calibrated NDVI image has an effect on the accuracy, but the effect is limited and acceptable.
Comparison with other NDVI datasets and methods
In this experiment, the annual maximum NDVI computed by the Auto-NDVIcb algorithm was compared with NDVI images from the publicly available dataset and NDVI computed by the qualityMosaic method in GEE. The two publicly available datasets are the â32-Day NDVI Compositeâ48 produced by USGS (referred to as the âUSGSâ dataset) and the âChina annual maximum NDVI dataset (30âm, 2000â2020)â31 (referred to as the âCNMDâ dataset). The âUSGSâ dataset is made from Tier 1 orthorectified scenes, using the computed top-of-atmosphere (TOA) reflectance. These composites are created from all the scenes in each 32 days beginning from the first day of the year and continuing to the 352nd day of the year. The last composite of the year, beginning on day 353, will overlap the first composite of the following year by 20 days. All the images from every 32 days are included in the composite, with the most recent pixel as the composite value, which is publicly available on GEE. In this study, we extracted the annual maximum NDVI at the pixel scale. The âCNMDâ dataset utilizes all Landsat5/7/8/9 remote sensing data throughout the year to obtain the maximum value of NDVI in one year for each pixel from 2000 to 2020 using series data preprocessing and data smoothing. The dataset is produced based on the GEE cloud computing platform and publicly released in the National Ecosystem Science Data Center. The dataset has a spatial resolution of 30âm and a temporal resolution of years.
Figure 9 shows the annual maximum NDVI results for the YG, MD, SB, and LX coal base from the âUSGSâ dataset, the qualityMosaic method, the âCNMDâ dataset, and the Auto-NDVIcb algorithm. In Fig. 9(a,b), the mosaic lines were noticeable. For the âUSGSâ dataset, the TOA data was used for calculating NDVI, which lacked atmospheric correction. Hence, the NDVI values in Fig. 9(a) were significantly lower than others. The qualityMosaic method can calculate NDVI using Landsat SR data, but it usually suffers from the problem of missing data during the most vigorous vegetation growth period due to the satellite revisiting capability and cloud cover. The missing data were replaced by data at other times in the qualityMosaic method. Consequently, the annual maximum NDVI images exhibited distinct mosaic lines, and some areas had considerably lower NDVI values compared to the real annual maximum NDVI values. Figure 9(c) displays the annual maximum NDVI results from the âCNMDâ dataset. While this dataset shows slight traces of mosaic lines in the visualization, the overall NDVI values were lower. To address the problems of missing data during the peak of vegetation growth, time-series images were interpolated and subjected to SG filtering. However, this approach could not fully recover the annual maximum NDVI, leading to lower NDVI values overall. Figure 9(d) shows the annual maximum NDVI results generated by the Auto-NDVIcb algorithm. These results do not show mosaic lines in the visualization, and the values are closer to the real annual maximum NDVI values. In comparison, the Auto-NDVIcb algorithm is capable of producing annual maximum NDVI with better temporal consistency at the scale of coal bases.
We conducted a detailed comparison of annual maximum NDVI data in time series from different datasets and methods using a sample area within the YG coal base, as shown in Fig. 10. The USGS dataset is available until 2021. However, its NDVI values were significantly lower than those of the other three datasets and exhibited noticeable mosaic lines. The qualityMosaic method resulted in an NDVI that highlights a distinct mosaic line for the years 2018 and 2019. The CNMD dataset is available until 2020. In terms of visualization, it displays horizontal stripes in NDVI images. Moreover, the NDVI values for 2019 and 2020 were abnormally lower in comparison to the first six years.
To further compare different NDVI datasets, two sample points were randomly chosen and the NDVI time-series plots for the âUSGSâ dataset, qualityMosaic method, âCNMDâ dataset, and Auto-NDVIcb algorithm were plotted (Fig. 11). Combined with Fig. 10, it can be seen that Fig. 11(c) âQMâ data in 2018 NDVI values plummeted may not attribute to vegetation damage, but the absence of remote sensing image at the moment of the most vigorous growth of vegetation. The results suggest that the Auto-NDVIcb algorithm maintained consistently stable annual maximum NDVI values, even during years when abnormal NDVI values were produced by the âCNMDâ dataset and qualityMosaic method. Furthermore, the changing pattern of NDVI generated by the Auto-NDVIcb algorithm over years is consistent with that of the NDVI from the âCNMDâ dataset and the qualityMosaic method. Consequently, these results demonstrate that the Auto-NDVIcb algorithm reliably delivers the annual maximum NDVI and displays better consistency over distinct years.
The superiority of the Auto-NDVIcb algorithm
In this paper, a new algorithm for automatically producing the annual maximum NDVI of China coal bases for GEE (Auto-NDVIcb) was proposed. The problem of the mosaic line caused by the absence of images with maximum NDVI and inconsistency in the acquiring time of adjacent images was solved. The Auto-NDVIcb shows the following superiorities.
-
(1)
Auto-NDVIcb is only based on Landsat as the data source and does not require data from other satellite sensors (e.g., MODIS) to fuse and interpolate the missing NDVI data with different spatial resolutions. Sensors such as MODIS or AHI provide data that are suitable for spatially homogeneous vegetation over a range of several hundred meters. However, few scenarios meet this vegetation requirement, especially in mining areas. In this paper, we would like to provide an annual maximum NDVI with higher resolution and data quality to facilitate a more accurate analysis of the ecological changes in the mining area. Sensor fusion may bring some new problems. The NDVI calculated by different sensors is inconsistent due to spatial49, temporal50, radiometric, and spectral factors. The single data source reduces the systematic bias caused by heterogeneous sensors as well as the loss of data detail information caused by low resolution50,51. The Auto-NDVIcb algorithm proposed in this paper avoids these problems by using a single sensor and finishing the task of generating annual maximum NDVI with satisfying accuracy.
-
(2)
Auto-NDVIcb can be automatically and completely performed in GEE, including the image filtering, cloud removal, NDVI calculation, selection of the reference NDVI image, construction of the calibration model, batch calibration, and the last fusion to generate annual maximum NDVI. While the calibration of a single image can be achieved using software like The Environment for Visualizing Images (ENVI) or ERDAS IMAGINE, it becomes cumbersome when dealing with study areas covered by multiple satellite scenes and multiple time-phase images. For instance, the YG coal base required data from at least 7 scenes, totalling around 168 images per year for a single coal base. When the research was extended to 14 coal bases and carried out for more than ten years, it was impractical to download the images, then filter the reference NDVI images from many images, and calibrate a large number of NDVI images.
Future of the Auto-NDVIcb algorithm
While the Auto-NDVIcb algorithm has achieved satisfactory accuracy in producing the annual maximum NDVI of the 14 coal bases in China, it still has some limitations to address in the future.
First, the algorithm was based on Toblerâs First Law of Geography and the consistency of the cumulative distribution probability function of similar features. However, itâs important to recognize that features from different scenes are not entirely the same. Consequently, the accuracy exhibited variations with different coal bases. The algorithm performs well when there is a high level of land cover consistency or when the coal base is relatively small. As the distance between the reference NDVI image and the to-be-calibrated NDVI image increases, the accuracy of the algorithm decreases. The accuracy with image distance >4 cannot be confirmed. While this study demonstrated the effectiveness of Auto-NDVIcb at the coal-base scale, further research is needed at a larger spatial scale to explore the improvement of Auto-NDVIcb in producing the NDVI across the whole of China and even the world.
Second, the accuracy of the annual maximum NDVI was closely related to the quality of the reference NDVI image. In an extreme case, the accuracy of the annual maximum NDVI may not be guaranteed when Landsat images are missing so severely that there is not an available image for the entire study area during periods of vigorous vegetation. Although such a case did not occur within the 14 coal bases examined in the past ten years, it remains a theoretical possibility when applying the algorithm to other regions.
Code availability
The GEE codes used to produce the annual maximum NDVI dataset is available to the public at https://code.earthengine.google.com/124ae202e3241a9523c0951ae6e4353f?noload=true.
References
Mudd, G. M. A Comprehensive dataset for Australian mine production 1799 to 2021. Sci Data 10, 391 (2023).
Maus, V. et al. An update on global mining land use. Sci Data 9, 433 (2022).
Maus, V. et al. A global-scale data set of mining areas. Sci Data 7, 289 (2020).
Jasansky, S., Lieber, M., Giljum, S. & Maus, V. An open database on global coal and metal mine production. Sci Data 10, 52 (2023).
Rouse, J. W., Haas, R. W., Schnell, J. A., Deering, D. W. & Harlan, J. C. Monitoring the vernal advancement and retrogradation (green wave effect) of natural vegetation. Report No. 75N28492 (NASA/GSFCT Type Final Report, 1974).
Piao, S. et al. Evidence for a weakening relationship between interannual temperature variability and northern vegetation activity. Nat Commun 5, 5018 (2014).
Hashimoto, H. et al. New generation geostationary satellite observations support seasonality in greenness of the Amazon evergreen forests. Nat Commun 12, 684 (2021).
Gao, M. et al. Divergent changes in the elevational gradient of vegetation activities over the last 30 years. Nat Commun 10, 2970 (2019).
Heck, E. V., de Beurs, K. M., Owsley, B. C. & Henebry, G. M. Evaluation of the MODIS collections 5 and 6 for change analysis of vegetation and land surface temperature dynamics in North and South America. ISPRS J. Photogramm 156, 121â134 (2019).
Peng, S. et al. Asymmetric effects of daytime and night-time warming on Northern Hemisphere vegetation. Nature 501, 88â92 (2013).
Morton, D. C. et al. Dry-season greening of Amazon forests. Nature 531, 221â224 (2016).
Yang, Z., Shen, Y., Li, J., Jiang, H. & Zhao, L. Unsupervised monitoring of vegetation in a surface coal mining region based on NDVI time series. Environmental Science and Pollution Research 29, 26539â26548 (2021).
Yang, Z. et al. Identification of the disturbance and trajectory types in mining areas using multitemporal remote sensing images. Science of The Total Environment 644, 916â927 (2018).
Li, X., Lei, S., Cheng, W., Liu, F. & Wang, W. Spatio-temporal dynamics of vegetation in Jungar Banner of China during 2000â2017. Journal of Arid Land 11, 837â854 (2019).
Beck, H. E. et al. Global evaluation of four AVHRRâNDVI data sets: Intercomparison and assessment against Landsat imagery. Remote Sens Environ 115, 2547â2563 (2011).
Fensholt, R., Rasmussen, K., Nielsen, T. T. & Mbow, C. Evaluation of earth observation based long term vegetation trends â Intercomparing NDVI time series trend analysis consistency of Sahel from AVHRR GIMMS, Terra MODIS and SPOT VGT data. Remote Sens Environ 113, 1886â1898 (2009).
Gallo, K., Ji, L., Reed, B., Eidenshink, J. & Dwyer, J. Multi-platform comparisons of MODIS and AVHRR normalized difference vegetation index data. Remote Sens Environ 99, 221â231 (2005).
Liu, Y. et al. Evaluation of consistency among three NDVI products applied to High Mountain Asia in 2000â2015. Remote Sens Environ 269, 112821 (2022).
Toté, C. et al. Evaluation of the SPOT/VEGETATION Collection 3 reprocessed dataset: Surface reflectances and NDVI. Remote Sens Environ 201, 219â233 (2017).
Shabanov, N., Vargas, M., Miura, T., Sei, A. & Danial, A. Evaluation of the performance of Suomi NPP VIIRS top of canopy vegetation indices over AERONET sites. Remote Sens Environ 162, 29â44 (2015).
van Leeuwen, W. J. D., Orr, B. J., Marsh, S. E. & Herrmann, S. M. Multi-sensor NDVI data continuity: Uncertainties and implications for vegetation monitoring applications. Remote Sens Environ 100, 67â81 (2006).
Ke, Y. et al. Suaeda salsa spectral index for Suaeda salsa mapping and fractional cover estimation in intertidal wetlands. ISPRS J. Photogramm 207, 104â121 (2024).
Zhu, X. & Liu, D. Improving forest aboveground biomass estimation using seasonal Landsat NDVI time-series. ISPRS J. Photogramm 102, 222â231 (2015).
Zarco-Tejada, P. J., Hornero, A., Hernandez-Clemente, R. & Beck, P. S. A. Understanding the temporal dimension of the red-edge spectral region for forest decline detection using high-resolution hyperspectral and Sentinel-2a imagery. ISPRS J Photogramm Remote Sens 137, 134â148 (2018).
Li, S. et al. GF-series 16m/10days Normalized Difference Vegetation Index product (from 2018 to 2020 across China version 01). China scientific data, http://csdata.org/en/p/567/ (2022).
Karen, C. S., Robert, K. K. & Curtis, E. W. Landsat reveals Chinaâs farmland reserves, but theyâre vanishing fast. Nature 406, 121 (2000).
Feng, L. et al. Concerns about phytoplankton bloom trends in global lakes. Nature 590, 35â47 (2019).
Kong, J. et al. Super resolution of historic Landsat imagery using a dual generative adversarial network (GAN) model with CubeSat constellation imagery for spatially enhanced long-term vegetation monitoring. ISPRS J. Photogramm 200, 1â23 (2023).
Madonsela, S., Cho, M. A., Ramoelo, A. & Mutanga, O. Remote sensing of species diversity using Landsat 8 spectral variables. ISPRS J. Photogramm 133, 116â127 (2017).
Xu, X. L. A 10m year-by-year NDVI maximum dataset for China. Resource and environmental science data registration and publication system, https://www.resdc.cn/DOI/doi.aspx?DOIid=50 (2022).
Dong, J.W. et al. China 30-metre annual maximum NDVI data set, 2000-2022. National ecological science data centre, http://www.nesdc.org.cn/sdo/detail?id=60f68d757e28174f0e7d8d49 (2021).
Yang, J. L. et al. Divergent shifts in peak photosynthesis timing of temperate and alpine grasslands in China. Remote Sens Environ 233, 111395 (2019).
Ke, Y., Im, J., Lee, J., Gong, H. & Ryu, Y. Characteristics of Landsat 8 OLI-derived NDVI by comparison with multiple satellite sensors and in-situ observations. Remote Sens Environ 164, 298â313 (2015).
Ho, J. C., Michalak, A. M. & Pahlevan, N. Widespread global increase in intense lake phytoplankton blooms since the 1980s. Nature 574, 667â670 (2019).
Mateo-GarcÃa, G., Laparra, V., López-Puigdollers, D. & Gómez-Chova, L. Transferring deep learning models for cloud detection between Landsat-8 and Proba-V. ISPRS J. Photogramm 160, 1â17 (2020).
Sun, L. et al. A cloud shadow detection method combined with cloud height iteration and spectral analysis for Landsat 8 OLI data. ISPRS J. Photogramm 138, 193â207 (2018).
Adrian, J., Sagan, V. & Maimaitijiang, M. Sentinel SAR-optical fusion for crop type mapping using deep learning and Google Earth Engine. ISPRS J. Photogramm 175, 215â235 (2021).
Gorelick, N. et al. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens Environ 202, 18â27 (2017).
Dong, J. W. et al. Mapping paddy rice planting area in northeastern Asia with Landsat 8 images, phenology-based algorithm and Google Earth Engine. Remote Sens Environ 185, 142â154 (2016).
Hu, T. et al. Mapping fine-scale human disturbances in a working landscape with Landsat time series on Google Earth Engine. ISPRS J. Photogramm 176, 250â261 (2021).
Chen, Y., Cao, R., Chen, J., Liu, L. & Matsushita, B. A practical approach to reconstruct high-quality Landsat NDVI time-series data by gap filling and the SavitzkyâGolay filter. ISPRS J. Photogramm 180, 174â190 (2021).
Tobler, W. A Computer Movie Simulating Urban Growth in the Detroit Region. Econ Geogr 46, 234â240 (1970).
Landsat 8-9 OLI/TIRS Collection 2 Level-2 Science Products. United States Geological Survey: Earth Resources Observation and Science (EROS) Center, https://doi.org/10.5066/P9OGBGM6 (2020).
Vermote, E., Justice, C., Claverie, M. & Franch, B. Preliminary analysis of the performance of the Landsat 8/OLI land surface reflectance product. Remote Sens Environ 185, 46â56, https://doi.org/10.1016/j.rse.2016.04.008 (2016).
Chen, B., Jin, Y. & Brown, P. Automatic mapping of planting year for tree crops with Landsat satellite time series stacks. ISPRS Journal of Photogrammetry and Remote Sensing 151, 176â188 (2019).
Cao, R. Y., Chen, Y., Chen, J., Zhu, X. L. & Shen, M. G. Thick cloud removal in Landsat images based on autoregression of Landsat time-series data. Remote Sens Environ 249, 112001 (2020).
Li, J., Qin, T. & Zhang, C. The annual maximum NDVI dataset of 14 coal bases in China (2013-2022). figshare, https://doi.org/10.6084/m9.figshare.c.6933322 (2024).
Google Earth Engine, https://developers.google.com/earth-engine/datasets (2022).
Goodin, D. G. & Henebry, G. M. The effect of rescaling on fine spatial resolution NDVI data: A test using multi-resolution aircraft sensor data. International Journal of Remote Sensing 23, 3865â3871 (2010).
Fensholt, R., Sandholt, I., Proud, S. R., Stisen, S. & Rasmussen, M. O. Assessment of MODIS sun-sensor geometry variations effect on observed NDVI using MSG SEVIRI geostationary data. International Journal of Remote Sensing 31, 6163â6187 (2010).
Liu, M. et al. An Improved Flexible Spatiotemporal Data Fusion (IFSDAF) method for producing high spatiotemporal resolution normalized difference vegetation index time series. Remote Sens Environ 227, 74â89 (2019).
Eric, V. NOAA Climate Data Record (CDR) of AVHRR Normalized Difference Vegetation Index (NDVI), Version 5. NOAA National Centers for Environmental Information. https://doi.org/10.7289/V5ZG6QH9 (2023).
National Center for Atmospheric Research Staff (Eds). The Climate Data Guide: NDVI: Normalized Difference Vegetation Index-3rd generation: NASA/GFSC GIMMS, https://climatedataguide.ucar.edu/climate-data/ndvi-normalized-difference-vegetation-index-3rd-generation-nasagfsc-gimms (2023).
PROBA-V LEVEL-3 TOC-NDVI DATA. European Space Agency, https://doi.org/10.5270/PRV-htrh1c8 (2023).
Didan, K. MOD13Q1 MODIS/Terra Vegetation Indices 16-Day L3 Global 250m SIN Grid V006. NASA EOSDIS Land Processes Distributed Active Archive Center, https://doi.org/10.5067/MODIS/MOD13Q1.006 (2023).
Didan, K. MOD13A1 MODIS/Terra Vegetation Indices 16-Day L3 Global 500m SIN Grid V006. NASA EOSDIS Land Processes Distributed Active Archive Center https://doi.org/10.5067/MODIS/MOD13A1.006 (2023).
Didan, K. MOD13A2 MODIS/Terra Vegetation Indices 16-Day L3 Global 1km SIN Grid V006. NASA EOSDIS Land Processes Distributed Active Archive Center, https://doi.org/10.5067/MODIS/MOD13A2.006 (2023).
Didan, K., Huete, A. MOD13A3 MODIS/Terra Vegetation Indices Monthly L3 Global 1km SIN Grid. NASA LP DAAC, https://doi.org/10.5067/MODIS/MOD13A3.006 (2023).
Didan, K. VIIRS/NPP Vegetation Indices 16-Day L3 Global 500m SIN Grid V001. NASA LP DAAC, https://doi.org/10.5067/VIIRS/VNP13A1.001 (2023).
Xu, X.L. China 30m Annual NDVI Maximum Dataset. Resource and Environmental Science Data Platform, https://doi.org/10.12078/2022030801 (2022).
Acknowledgements
This work was funded by the National Key Research and Development Program of China (grant number 2022YFF1303301), the National Natural Science Foundation of China (grant numbers 42271480, 42371347), and the Fundamental Research Funds for the Central Universities of China (grant number 2023ZKPYDC10).
Author information
Authors and Affiliations
Contributions
Conceptualization: J.L., T.Q., C.Z.; Investigation: J.L., C.Z.; Methodology: C.Z., T.Q.; Data production: T.Q., Y.Z. (Yicong Zhang), Y.Z. (Yaping Zhang); Validation: T.Q., H.S., Y.Y.; Visualization: Y.Z. (Yicong Zhang), Y.Z. (Yaping Zhang); Writingâoriginal draft: J.L., T.Q.; Writingâreview & editing: C.Z. All authors read the manuscript and approved the submitted version.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisherâs note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the articleâs Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the articleâs Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Li, J., Qin, T., Zhang, C. et al. Automated generation of consistent annual maximum NDVI on coal bases with a new algorithm. Sci Data 11, 689 (2024). https://doi.org/10.1038/s41597-024-03543-2
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41597-024-03543-2