Improved Mapping of Regional Forest Heights by Combining Denoise and LightGBM Method

Sang, Mengting; Xiao, Hai; Jin, Zhili; He, Junchen; Wang, Nan; Wang, Wei

doi:10.3390/rs15235436


by Mengting Sang ¹, Hai Xiao ², Zhili Jin ¹, Junchen He ¹, Nan Wang ¹ and Wei Wang ¹,*

¹ School of Geosciences and Info-Physics, Central South University, Changsha 410083, China

² The Second Surveying and Mapping Institute of Hunan Province, Key Laboratory of Natural Resources Monitoring and Supervision in Southern Hilly Region, Ministry of Natural Resources, Changsha 410009, China

* Author to whom correspondence should be addressed.

Remote Sens. 2023, 15(23), 5436; https://doi.org/10.3390/rs15235436

Submission received: 10 October 2023 / Revised: 10 November 2023 / Accepted: 16 November 2023 / Published: 21 November 2023

(This article belongs to the Special Issue LiDAR Remote Sensing for Forest Mapping)


Abstract

Currently, the integration of satellite-based LiDAR (ICESat-2) and continuous remote sensing imagery has been extensively applied to mapping forest canopy height over large areas. However, a considerable fraction of low-quality photons exists in ICESat-2/ATL08 products, which restricts the performance of regional canopy height estimation. To address this problem, a Local Noise Removal-Light Gradient Boosting Machine (LNR-LGB) method was proposed in this study, which efficiently filtered the unreliable canopy photons in ATL08, constructed an extrapolation model by combining multiple remote sensing data, and finally mapped the 30 m forest canopy height of Hunan Province in 2020. To verify the feasibility of this method, the canopy parameters were also filtered based on ATL08 product attributes (traditional method), and the accuracy of the two models was compared using 10-fold cross-validation. The conclusions were as follows: (1) compared with the traditional model, the accuracy of the LNR-LGB model improved markedly, with R² increasing from 0.46 to 0.65 and RMSE decreasing from 6.11 m to 3.48 m (a reduction of about 43%); (2) the forest height in Hunan Province ranged from 2.53 to 50.79 m with an average value of 18.34 m. The LNR-LGB method provides a new approach for achieving high-accuracy mapping of regional forest height.

Keywords:

forest canopy height; ICESat-2; Landsat-8; Sentinel-2; LightGBM


1. Introduction

As the primary terrestrial ecosystem, forests play a vital role in the global carbon cycle and in regulating climate change [1,2]. Driven by the global dual carbon goal [3,4], the accurate accounting of the spatial distribution patterns of forest carbon stocks and their dynamic change is an important basis for gaining insights into the carbon cycle mechanism [5,6]. Forest canopy height is defined as the distance from the top of the tree canopy to the ground; it is also an important indicator for estimating forest carbon stocks [7,8], analyzing canopy structural complexity [9], evaluating forest productivity [10], and tracking biodiversity [11]. Because canopy height is a critical forest vertical structure attribute, accurate estimates of large-scale forest canopy height provide an essential source for quantifying forest above-ground biomass (AGB) [7,12]. Consequently, forest canopy height has been the priority of forestry survey missions [13], and the mapping of multi-temporal high-resolution forest height and the analysis of its spatial and temporal variation are important for forest ecosystem management and policy-making [12,14].

At present, global-scale forest height products remain evidently deficient in spatial and temporal resolution [15]; thus, accurately quantifying regional forest canopy height has become a primary target. Traditional forest survey methods can provide high-accuracy forest height information; however, they are time-consuming and laborious, which makes them impractical for spatial mapping of forest canopy height at large scales [16]. In contrast, remote sensing technology has enabled large-area continuous Earth observation [17] and is being increasingly applied to extensive forest canopy height mapping [18]. In particular, passive optical sensors can provide spectral feature information of a large-scale continuous surface and can be combined with field measurements to build a regression estimation model [19], or can use the stereo imagery pair technique [20,21] to obtain the corresponding digital surface model and extract continuous canopy height. However, passive optical imagery cannot directly acquire canopy vertical structural information [22]; its signal is easily saturated in dense forest areas and its estimation accuracy is generally low [23,24]. Compared with visible wavelengths, microwave bands may offer more advantages in penetrating the canopy [25]: the backscatter coefficient of Synthetic Aperture Radar (SAR) [26] and the phase center height of InSAR [27] can, to some extent, reflect the vertical structure of the canopy and can thus be related to forest canopy height [28,29]. However, microwave radar data are affected by topographic undulations and suffer from low penetration and easy saturation [28,30], which makes them insufficient for large-area canopy height estimation.
Among multiple remote sensing platforms, Light Detection and Ranging (LiDAR) technology achieves the direct measurement of forest vertical structures, including canopy height, through laser ranging [31,32], and it is acknowledged as the optimal instrument for mapping large-area forest canopy height. With the development of near-surface LiDAR platforms, terrestrial laser scanning (TLS) [16] and airborne laser scanning (ALS) [33,34], which have been widely used in forestry surveys, can achieve precise measurements of canopy height from the individual-tree level to the stand level. However, due to the high flight cost and limited spatial coverage of near-surface LiDAR [32], continuous monitoring of large areas cannot be achieved with these platforms, whereas satellite-borne LiDAR can provide continuous Earth observation from orbit and thereby overcome these limitations.

In 2018, NASA successfully launched the Ice, Cloud and Land Elevation Satellite II (ICESat-2), which carried the Advanced Topographic Laser Altimeter System (ATLAS) with multi-beam photon-counting LiDAR technology [35] to obtain high-precision photon data, which demonstrated promising applications in global forest vertical structure surveys [36,37]. In particular, the Land Vegetation Canopy product of ICESat-2 (ATL08) provides crucial information, such as topographic elevation and the vegetation relative height of each 100 m segment along a track [38,39]. The ATL08 product has good agreement with regional forest canopy heights and holds great potential for the inversion of canopy heights, as validated by a large amount of airborne data [36,40,41,42]. ICESat-2 photon data are at the footprint level and have greatly increased point density compared with ICESat-1, but still cannot achieve continuous observation of canopy heights [43]. Thus, the current solution is to combine spatially continuous remote sensing imagery with canopy attributes from satellite-borne LiDAR, and use non-parametric methods such as machine learning to build estimation models, which have been widely applied to continuously map forest height at large scales [25,29,43,44].

Optical satellites provide long time-series Earth observations, and the spectral reflectance, vegetation indices, and texture information extracted from optical remote sensing images are well correlated with surface vegetation structure [29,43,45] and are therefore commonly applied for forest canopy height modeling. Landsat satellites are dedicated to supplying high-resolution, full-coverage Earth observation products, and the analysis-ready data (ARD) [46] are useful for capturing information on vertical structural attributes of surface forests. Zhu and Wang [47] combined ICESat-2 simulated airborne data (MABEL) with Landsat-8 optical imagery to successfully map wall-to-wall forest canopy heights in sample sites of the U.S. Wu and Shi [48] considered the effect of ecological zoning and combined ATLAS data with multiple Landsat spectral variables to construct an ecological zoning RF retrieval model that achieved 30 m resolution forest height estimation in China, demonstrating the great potential of Landsat satellites for forest canopy height observation. The ESA Sentinel satellites include an SAR sensor (Sentinel-1) and an optical sensor (Sentinel-2), with the distinct advantages of short revisit periods and high spatial resolution. Sentinel-2 provides additional red-edge bands, which are sensitive to the growth of surface vegetation cover [25] and facilitate the accurate estimation of regional canopy height [43,45,49]. Zhang and Liu [50] merged multiple remote sensing data to map boreal forest height in Alaska, and Xi, Xu [51] integrated ICESat-2, Sentinel imagery, and topographic data to develop a canopy height retrieval model for different forest types; their results illustrated that the Sentinel-2 red-edge bands are important for the interpretation of the inversion model. In summary, high-resolution optical imagery offers distinct advantages for estimating forest canopy height.
Because cloud cover limits the observations available from any single sensor, the Landsat and Sentinel-2 products can complement each other to provide more continuous Earth observations, and the recent free access to the Harmonized Landsat Sentinel-2 (HLS) datasets [52,53] has significantly promoted the accurate estimation of surface vegetation height. Due to the geographical complexity of forest structure, using data from multiple sensors is generally more advantageous than using an individual sensor [25,45]; hence, constructing retrieval models by integrating multisource data can enhance the accuracy of forest height estimation.

Most studies use machine learning methods to build models. As non-parametric models, these methods have many strengths, including no requirement for linear relationships between variables, the absence of excessive pre-set parameters, and the efficient processing of training sets directly [51,54]. GBDT (Gradient Boosting Decision Tree) is an additive model based on the idea of boosting ensemble learning [55], which iteratively trains weak learners (decision trees) to obtain the optimal model. The model trains well and is not prone to overfitting. LightGBM (Light Gradient Boosting Machine) is a framework for implementing the GBDT algorithm [56]; it supports highly efficient parallel training and offers faster training, lower memory consumption, and better accuracy, and can therefore process massive data quickly. Nevertheless, the accuracy of forest canopy height modeling is still constrained by several factors, including geospatial heterogeneity [48,57], the complexity of forest canopy structure [58], and the inherent errors of satellite-based LiDAR products [59,60]. In particular, improving the accuracy of the canopy heights extracted from the ATL08 product is a direct and effective approach, yet it has received little attention in tree height estimation studies. The ICESat-2 official team used the DRAGANN (Differential, Regressive, and Gaussian Adaptive Nearest Neighbor) filter to process the original geolocated photon product ATL03, and provided ground (1), canopy (2), and top-of-canopy (3) classification results [36]. Yet, this algorithm retains quite a few noisy photons due to the self-adaptive search [60], which affects the canopy height extraction.
According to comparisons with airborne data, the extraction accuracy of ATL08 products is affected by multiple factors such as topographic conditions [60,61], canopy coverage [58,62], beam intensity [61,62], solar background [61], and cloud scattering [63], and the correct selection of high-quality photons has become a challenge in current research. Currently, the following methods are used to filter unreliable photon segments [25,59,60,62,64]: (i) choose nighttime acquisitions; (ii) choose strong beams; (iii) exclude high cloud confidence; (iv) exclude high canopy uncertainty; (v) exclude sparsely distributed photon segments; (vi) set a rational range of canopy heights; (vii) identify anomalous photons based on the difference between the topographic height of the photons and a reference height. However, high uncertainties remain in the photons retained by these traditional filtering methods [65], which limits their ability to improve the estimation accuracy of canopy height [60]. Thus, it is necessary to investigate in depth, through a quantitative approach, how noise can be removed from ATL08 photon products.

Currently, relatively few studies have used quantitative methods to filter unreliable ATL08 photons. This study proposed a Local Noise Removal (LNR) method for ATL08 products, which used a KNN search to calculate neighborhood statistical features from the spatial distribution of the photons at the top of the canopy, and filtered low-quality canopy photons by comparing these features with a threshold. Hunan Province was selected as the study area. It has complex topographic conditions, a wide distribution of mountain ranges, high forest coverage, and abundant species, which make it a suitable area for studying forest structural parameters. Meanwhile, the cloudy and rainy subtropical monsoon climate poses a great challenge to the inversion of forest vertical structure. We developed an estimation model to map high-resolution forest canopy height with the algorithm and demonstrated the advantages of our LNR-LightGBM method in comparison with the traditional filtering results of ATL08. This research was conducted as follows: (1) the ATL08 canopy parameters were filtered qualitatively on the basis of product attributes and quantitatively using the LNR method, a denoising method designed on the basis of neighborhood statistical features; (2) an extrapolation model of forest canopy height in Hunan Province was developed using the LightGBM algorithm in combination with multi-source remote sensing data; (3) the accuracy of the two models was compared using sample cross-validation, and the importance of the variables for height estimation was analyzed; (4) the 30 m forest canopy height map of Hunan Province generated by the LNR-LightGBM model in this study was compared with other existing forest height products.
Overall, the objectives of the study were to explore the feasibility of the LNR method by comparing it with the traditional algorithm and to develop a workflow for mapping 30 m forest canopy height in Hunan Province by combining the LNR and LightGBM algorithms, which could provide a reference framework for subsequent applications of forest structure quantification.

2. Data and Methods

2.1. Study Area

Hunan Province is in Southeastern China, with a land area of about 211,800 km², located at longitude 108°47′–114°15′E and latitude 24°38′–30°08′N. Hunan Province is situated in the transition area from the Yunnan–Guizhou Plateau to the hills south of the Yangtze River, with various types of landforms, including mountains, basins, hills, and plains. The province is surrounded by mountains in the east, south, and west, with hills dominating the central part and basin plains in the north. The province has a typical subtropical monsoon climate, with hot summers, cold winters, and four distinct seasons; average annual temperatures range from 15 °C to 18 °C and annual rainfall ranges from 1300 to 1700 mm. Under the influence of the above natural factors, Hunan Province is endowed with abundant forest plant resources [66], and its forest area accounts for 59.98% of the overall territory (Figure 1), including different types of evergreen broadleaf forest, deciduous broadleaf forest, evergreen needle forest, and a small portion of mixed forests. The region is located in the subtropics, and its complex topography and cloudy, rainy conditions bring great uncertainty to the inversion of forest vertical structure parameters [28].

2.2. Data Collection and Processing

2.2.1. ICESat-2 Data

The new generation photon-counting LiDAR ICESat-2/ATLAS emits six laser beams, which are divided into three strong/weak beam groups along the track direction, with the inter-group cross-track distance of about 3.3 km, the intra-group cross-track distance of about 90 m, and a laser footprint diameter of about 17 m [35]. ICESat-2 provides 21 standard products, ATL00~ATL21 (without ATL05), of which ATL08 is the official terrestrial vegetation height product, which is mainly applied for the accurate estimation of regional topographic elevation and canopy height [36,39]. The ATL08 product (V005) between April 2019 and September 2021 was downloaded by visiting the National Snow and Ice Data Center (NSIDC; https://nsidc.org/data/icesat-2, accessed on 20 March 2023). Here, we used h_canopy as the height estimation metric, which can represent 98% of the relative height during the energy return, and RH98 can characterize the top height of the canopy instead of the RH100 due to the high signal-to-noise ratio at the top of canopy [67]. In addition, a number of metrics were extracted to ensure the accuracy of the data, including longitude (longitude), latitude (latitude), best-fit terrain (h_te_best_fit), terrain uncertainty (h_te_uncertainty), canopy uncertainty (h_canopy_uncertainty), land cover classification (segment_landcover), day–night flag (night_flag), cloud confidence (cloud_flag_atm), and flight direction (sc_orient), as explained in Table 1. According to the forest survey in Hunan Province, the anomalies with canopy heights less than 0 m and more than 60 m were removed, and the footprints located outside the study area were excluded. After preliminary filtering, we obtained a total of 592,331 ICESat-2 footprints in Hunan Province; they were further processed and used for subsequent machine learning modeling.
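The preliminary screening described above (removing canopy heights below 0 m or above 60 m and discarding footprints outside the study area) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the ATL08 segments have already been read into arrays, the function name `preliminary_filter` is hypothetical, and a latitude/longitude bounding box taken from Section 2.1 stands in for the actual provincial boundary.

```python
import numpy as np

# Approximate bounding box of Hunan Province from Section 2.1
# (assumption: a box is used here instead of the real boundary polygon).
LON_MIN, LON_MAX = 108.0 + 47 / 60, 114.0 + 15 / 60
LAT_MIN, LAT_MAX = 24.0 + 38 / 60, 30.0 + 8 / 60


def preliminary_filter(lon, lat, h_canopy, h_min=0.0, h_max=60.0):
    """Return a boolean mask of segments kept after preliminary screening."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    h = np.asarray(h_canopy, dtype=float)
    # Keep heights in [h_min, h_max]; NaN heights fail both comparisons.
    valid_height = (h >= h_min) & (h <= h_max)
    # Keep footprints that fall inside the bounding box.
    inside = (lon >= LON_MIN) & (lon <= LON_MAX) & (lat >= LAT_MIN) & (lat <= LAT_MAX)
    return valid_height & inside
```

The returned mask can be applied to every extracted attribute array so that all fields stay aligned after screening.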

2.2.2. Landsat-8 Data

The HLS project [53] provides consistent surface reflectance (SR) data by harmonizing the Operational Land Imager (Landsat-8 OLI) and the Multi-Spectral Instrument (Sentinel-2 MSI) to enable short-period, full-coverage, and high-resolution global terrestrial observations. The HLS project performs a complete set of processing procedures for products from OLI and MSI, including atmospheric correction using LaSRC (Land Surface Reflectance Code), the generation of classification masks for clouds, water, and snow using CFMASK, BRDF normalization to obtain NBAR (Nadir BRDF-Adjusted Reflectance), projection to a common Universal Transverse Mercator (UTM) system, spectral bandpass adjustment, and resampling to a 30 m pixel size for all bands. From the Earthdata website (https://www.earthdata.nasa.gov/, accessed on 6 April 2023), Landsat-8 surface reflectance products were downloaded for the growing seasons (June to November) from 2020 to 2021, clouds were removed using the Landsat pixel quality assessment (QA) products, and the image time series was composited into a median image. Here, surface reflectance was calculated for five spectral bands, namely, the blue band (Blue), green band (Green), red band (Red), near infrared band (NIR), and short-wave infrared band (SWIR2), and for six vegetation indices: ratio vegetation index (RVI), green leaf index (GLI), red–green vegetation index (RGVI), atmospherically resistant vegetation index (ARVI), normalized difference infrared index (NDII), and modified soil adjusted vegetation index (MSAVI). All variables extracted from Landsat-8 are listed in Table 2 and served as input variables for the subsequent modeling.

2.2.3. Sentinel-2 Data

Given that the red-edge bands of Sentinel-2 are more sensitive to surface vegetation, the time series of Sentinel-2 images from the growing seasons (June to November) of 2020 to 2021 was also de-clouded and composited into a median image. Considering the consistency of HLS products in common bands, only the red-edge bands (Red-Edge1, Red-Edge2, and Red-Edge3) were applied to calculate vegetation indices, including three red-edge normalized difference vegetation indices (NDVI58a, NDVI56, NDVI47), the red-edge chlorophyll index (RECI), and the plant senescence reflectance index (PSRI), as explained in Table 2. All images were mosaicked, composited, and stored in geographic coordinates for the following data processing.

2.2.4. Ancillary Data

Additionally, three types of auxiliary data were collected to accomplish continuous forest canopy height mapping: topographic metrics, biophysical features, and climate characteristics. First, topographic metrics were derived from the global corrected SRTM elevation product for vegetated areas [68], which was linearly corrected for the systematic overestimation bias that occurs over vegetated terrain in the original SRTM DEM. The elevation data of the study area were acquired from the GUO-LAB group (https://www.3decology.org/the-corrected-global-srtm-dem/, accessed on 7 April 2023), with a spatial resolution of approximately 0.00023°, and the slope was calculated from the elevation using a 3 × 3 grid window. Next, two vegetation products from the Moderate-resolution Imaging Spectroradiometer (MODIS) sensor were used to characterize the biophysical features of the terrestrial forest. The first product, the vegetation continuous field (VCF) product, is an annual vegetation cover percentage product trained from 16-day surface reflectance data [69]. The study used the 250 m resolution VCF tiles of 2020 and excluded cloud-covered pixels based on the provided QA (value > 2). The second product was the leaf area index (LAI) product [70]; LAI refers to the green leaf area per unit ground area and is well correlated with the surface vegetation. Here, the LAI time series images of 2020 with 500 m resolution were downloaded, and the cloud-covered pixels were removed in the same way as for the VCF. Finally, climatic metrics, especially temperature and precipitation, are essential determinants of regional forest canopy height distribution patterns [71].
The 1970–2000 synthetic bioclimatic variables (version 2) used in this study are available from WorldClim (https://www.worldclim.org/data/, accessed on 14 April 2023) [72], from which we obtained BIO1, BIO4, BIO12, BIO18, and BIO19 as climate variables with a spatial resolution of about 1 km. In addition, the GLC_FCS30 global land cover product was downloaded from the CASEarth Data Sharing and Service Portal (https://data.casearth.cn/sdo/detail/5fbc7904819aec1ea2dd7061, accessed on 20 April 2023). This product classifies global land areas into nearly 30 fine land cover classes [73], including different forest types.
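The slope derivation from a 3 × 3 elevation window mentioned in this subsection can be illustrated with a short sketch. The text does not state which slope formula was used, so Horn's finite-difference method (the one commonly used in GIS software) is assumed here; the function name `slope_degrees` is hypothetical, and the DEM is assumed to be in a projected grid with a pixel size expressed in metres.

```python
import numpy as np


def slope_degrees(dem, cell_size):
    """Slope in degrees for interior cells, from a 3 x 3 window (Horn's method).

    dem: 2-D array of elevations in metres; cell_size: pixel size in metres.
    Border cells are returned as NaN because their window is incomplete.
    """
    z = np.asarray(dem, dtype=float)
    out = np.full(z.shape, np.nan)
    # Neighbours of every interior cell, named by compass position.
    nw, n, ne = z[:-2, :-2], z[:-2, 1:-1], z[:-2, 2:]
    w, e = z[1:-1, :-2], z[1:-1, 2:]
    sw, s, se = z[2:, :-2], z[2:, 1:-1], z[2:, 2:]
    # Weighted finite differences in the column (x) and row (y) directions.
    dz_dx = ((ne + 2 * e + se) - (nw + 2 * w + sw)) / (8.0 * cell_size)
    dz_dy = ((sw + 2 * s + se) - (nw + 2 * n + ne)) / (8.0 * cell_size)
    out[1:-1, 1:-1] = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    return out
```

For a DEM stored in geographic coordinates, the pixel size would first need to be converted to metres (or the DEM reprojected) before a slope in degrees is meaningful.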

Table 2. Description of predictor variables for forest canopy height estimation.

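Data Type	Variable Name	Description
Topographic Metrics (Corrected SRTM-DEM [68])	DEM	Mean of topographic elevation
Topographic Metrics (Corrected SRTM-DEM [68])	Slope	Slope extracted from DEM
Surface Reflectance (Landsat-8 Data [53])	Blue	0.45~0.51 µm band reflectivity
Surface Reflectance (Landsat-8 Data [53])	Green	0.53~0.59 µm band reflectivity
Surface Reflectance (Landsat-8 Data [53])	Red	0.64~0.67 µm band reflectivity
Surface Reflectance (Landsat-8 Data [53])	NIR	0.85~0.88 µm band reflectivity
Surface Reflectance (Landsat-8 Data [53])	SWIR	2.11~2.29 µm band reflectivity
Vegetation Index (Landsat-8 Data [53])	RVI	NIR/Red
Vegetation Index (Landsat-8 Data [53])	GLI	(2 × Green − Red − Blue)/(2 × Green + Red + Blue)
Vegetation Index (Landsat-8 Data [53])	RGVI	(Red − Green)/(Red + Green)
Vegetation Index (Landsat-8 Data [53])	ARVI	(NIR − 2 × Red + Blue)/(NIR + 2 × Red − Blue)
Vegetation Index (Landsat-8 Data [53])	NDII	(NIR − SWIR)/(NIR + SWIR)
Vegetation Index (Landsat-8 Data [53])	MSAVI	(2 × NIR + 1 − $\sqrt{(2 \times NIR + 1)^{2} - 8 \times (NIR - Red)}$)/2
Red-edge Vegetation Index (Sentinel-2 Data [53])	NDVI58a	(NIR2 − RE1)/(NIR2 + RE1)
Red-edge Vegetation Index (Sentinel-2 Data [53])	NDVI56	(RE2 − RE1)/(RE2 + RE1)
Red-edge Vegetation Index (Sentinel-2 Data [53])	NDVI47	(RE3 − Red)/(RE3 + Red)
Red-edge Vegetation Index (Sentinel-2 Data [53])	RECI	RE3/RE1 − 1
Red-edge Vegetation Index (Sentinel-2 Data [53])	PSRI	(Red − Green)/RE2
Biophysical Features (MODIS Product [69,70])	VCF	Percentage of vegetation cover
Biophysical Features (MODIS Product [69,70])	LAI	Leaf area index
Climatic Metrics (WorldClim Data [71])	AMT	Annual mean temperature
Climatic Metrics (WorldClim Data [71])	TQ	Temperature seasonality
Climatic Metrics (WorldClim Data [71])	PQ	Precipitation seasonality
Climatic Metrics (WorldClim Data [71])	PHQ	Precipitation of hottest quarter
Climatic Metrics (WorldClim Data [71])	PCQ	Precipitation of coldest quarter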

Note: RE1, RE2, RE3, and NIR2 correspond to Red-Edge 1 (B05), Red-Edge 2 (B06), Red-Edge 3 (B07), and Narrow NIR (B08A) of Sentinel-2, respectively.
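The index formulas in Table 2 translate directly into array arithmetic. The sketch below is only an illustration of those formulas: the function names `landsat_indices` and `red_edge_indices` are hypothetical, the inputs are assumed to be surface reflectance values (scalars or arrays of equal shape), and no handling of zero denominators or masked pixels is included.

```python
import numpy as np


def landsat_indices(blue, green, red, nir, swir):
    """Landsat-8 vegetation indices, written exactly as in Table 2."""
    blue, green, red, nir, swir = (
        np.asarray(b, dtype=float) for b in (blue, green, red, nir, swir)
    )
    return {
        "RVI": nir / red,
        "GLI": (2 * green - red - blue) / (2 * green + red + blue),
        "RGVI": (red - green) / (red + green),
        "ARVI": (nir - 2 * red + blue) / (nir + 2 * red - blue),
        "NDII": (nir - swir) / (nir + swir),
        "MSAVI": (2 * nir + 1 - np.sqrt((2 * nir + 1) ** 2 - 8 * (nir - red))) / 2,
    }


def red_edge_indices(green, red, re1, re2, re3, nir2):
    """Sentinel-2 red-edge indices, written exactly as in Table 2."""
    green, red, re1, re2, re3, nir2 = (
        np.asarray(b, dtype=float) for b in (green, red, re1, re2, re3, nir2)
    )
    return {
        "NDVI58a": (nir2 - re1) / (nir2 + re1),
        "NDVI56": (re2 - re1) / (re2 + re1),
        "NDVI47": (re3 - red) / (re3 + red),
        "RECI": re3 / re1 - 1,
        "PSRI": (red - green) / re2,
    }
```

Passing whole composite bands as 2-D arrays yields one index raster per dictionary key, which matches how the predictors are later stacked for modeling.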

In summary, a total of 25 predictor variables were extracted and are explained in Table 2. All data, including ICESat-2/ATL08, topographic data, surface reflectance, vegetation indices, biophysical data, and climate data, were projected into the Universal Transverse Mercator (UTM) system and resampled to a 0.00027° × 0.00027° (30 m resolution) grid. During the merging process, we paid particular attention to null masks, data format conversion, and resolution compatibility to ensure the quality of the merged dataset, which was used for the subsequent modeling.

2.2.5. Compared Products

To validate the reliability of the mapping product in this study, two existing forest canopy height datasets were used. First, Potapov et al. [57] of the Global Land Analysis and Discovery (GLAD) team developed the 30 m global forest canopy height product of 2019, which was generated by combining the Global Ecosystem Dynamics Investigation (GEDI) 95% relative height with Landsat ARD multi-temporal phenology metrics to construct a forest height estimation model. The height product was validated to be in good agreement with the reserved footprint data: R² = 0.62; RMSE = 6.6 m. Second, the team of Liu et al. [59] at the Institute of Botany, Chinese Academy of Sciences, released a 2019 forest canopy height map of China at 30 m resolution by developing a new neural network guided interpolation (NNGI) method that integrates ICESat-2 data, GEDI data, Sentinel-2 images, topographic data, and meteorological data. Validation against the reserved footprints, airborne data, and field data indicated that the NNGI method outperformed the common spatial interpolation method for canopy height estimation and produced more accurate results. We downloaded the two existing products separately, extracted forest heights for Hunan Province, and used the GLC_FCS30 product to mask non-forest areas for the subsequent accuracy comparison of provincial height products.

2.3. Forest Canopy Height Estimation

To cope with the problem of ATL08 noise interference, this paper proposed a Local Noise Removal (LNR) method that filters ATL08 canopy photons by considering their spatial distribution characteristics; afterwards, the LightGBM algorithm was used to efficiently train on the matched data in order to derive a canopy height extrapolation model. Combining the above two methods, this study developed an LNR-LGB method framework for ATL08 products, which enabled the fine mapping of forest canopy height in Hunan Province at 30 m resolution. In order to validate the performance of the LNR-LGB method, we also filtered the height footprints with reference to the ATL08 attribute information (traditional method) and compared the accuracy with the results of the LNR-LGB method. First, ICESat-2/ATL08 data and multiple remote sensing data were downloaded and processed. Next, the original canopy photons were filtered qualitatively by ATL08 attributes (traditional method) and quantitatively by the LNR method proposed in this study. Then, the workflow for mapping the forest canopy height in Hunan Province was developed using LightGBM, and the model accuracy of the two methods was compared by 10-fold cross-validation. Finally, the 30 m forest canopy height map of Hunan Province was generated by applying the trained model. The main steps of the study are shown in Figure 2.

2.3.1. Local Noise Removal (LNR) Method

The h_canopy parameter provided by the ATL08 product retains considerable noise [36,63], which affects the accurate extraction of forest canopy height. The traditional solution is therefore to remove low-quality photons [44,48,60] based on product attributes, or in combination with field topography, before estimating regional forest height. However, this qualitative filtering lacks a consistent standard, and the retained canopy photons are still full of uncertainties [61,62,63], which makes it difficult for current regional forest canopy height estimation models based on ICESat-2 data to reach the desired accuracy. Therefore, this study takes into account the spatial distribution of natural forest canopy tops and proposes an LNR method based on neighborhood statistical features. The method assumes that the heights of geographically close footprints should fluctuate within a certain range; if a footprint's height is too high or too low for its neighborhood, it is treated as anomalous. We laid out all the laser footprints in a two-dimensional plane, performed a nearest-neighbor search for each point in turn (Figure 3), and determined whether each point was an anomaly by calculating the average height difference from its neighborhood. The specific algorithm flow is as follows:

1. Search for the k nearest points in the two-dimensional plane within the given maximum range d_max, with the current point O as the center, and calculate the average of the absolute differences between the heights of the nearest points and that of the center point:

$$\overline{\Delta h} = \frac{1}{k} \sum_{i=1}^{k} \left| h_{i} - h_{o} \right| \tag{1}$$

2. When the average height difference is greater than the given threshold (here, the neighborhood standard deviation), the point is considered a noise point and removed; otherwise, it is kept as a valid point:

$$\overline{\Delta h} - \sqrt{\frac{\sum_{i=1}^{k} (h_{i} - \bar{h})^{2}}{k}} \; \begin{cases} \le 0, & \text{retained} \\ > 0, & \text{removed} \end{cases} \tag{2}$$

where h_i is the height of the i-th nearest point, h_o is the height of the center point O, and \bar{h} is the mean height of the k nearest points.

Given the positive correlation between the slope and the error of ATL08 canopy height extraction, the number of nearest points k is determined by the slope at the center point. Numerous tests showed that the filter became notably more tolerant as k increased, and the number of valid points retained stabilized when k was larger than 25. Therefore, the LNR method takes 5 as the initial number (the minimum number of neighbors searched) and increases k with the slope level of the center point, up to a maximum of 25. In addition, the threshold was set as the standard deviation of all points in the neighborhood; fewer points are retained when the threshold decreases. In order to comply with the requirements of high-resolution mapping, the study used a threshold of one standard deviation and retained about 397,000 ICESat-2 points, which meant that about 33% of the footprints were excluded as low quality. Through the filtering of the LNR method, we obtained ATL08 height footprints with a more homogeneous spatial distribution and used them for subsequent machine learning modeling.
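The two LNR steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the text does not give the slope-to-k mapping or the value of d_max, so `k_from_slope` (k grows by 5 for every 10° of slope) is a hypothetical rule and `d_max` is left as a parameter; coordinates are assumed to be projected (metres); points with fewer than two neighbours within d_max are left untouched; and a brute-force distance search replaces the spatial index that would be needed for hundreds of thousands of footprints.

```python
import numpy as np


def k_from_slope(slope_deg, k_min=5, k_max=25, step_deg=10.0):
    """Hypothetical rule: k starts at k_min and grows with slope, capped at k_max."""
    level = int(slope_deg // step_deg)
    return int(min(k_max, k_min + 5 * level))


def lnr_filter(x, y, h, slope_deg, d_max):
    """Return a boolean mask (True = retained) following the two LNR steps."""
    x, y, h = (np.asarray(a, dtype=float) for a in (x, y, h))
    slope_deg = np.asarray(slope_deg, dtype=float)
    keep = np.ones(h.size, dtype=bool)
    for o in range(h.size):
        d = np.hypot(x - x[o], y - y[o])
        d[o] = np.inf  # exclude the centre point itself
        nearest = np.argsort(d)[: k_from_slope(slope_deg[o])]
        nbrs = nearest[d[nearest] <= d_max]  # k nearest points within d_max
        if nbrs.size < 2:
            continue  # assumption: too few neighbours to judge, keep the point
        mean_abs_diff = np.mean(np.abs(h[nbrs] - h[o]))  # step 1
        threshold = np.std(h[nbrs])  # neighbourhood standard deviation
        if mean_abs_diff > threshold:  # step 2: noise point
            keep[o] = False
    return keep
```

All decisions are made against the original heights, so the result does not depend on the order in which points are visited.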

To validate the feasibility of the LNR method, the ATL08 canopy height parameter h_canopy was also filtered qualitatively here. Therefore, two ATL08 photon filtering methods were designed for this study. (1) Using the traditional method: selecting nighttime strong beam data; removing points with cloud_flag_atm ≥ 2; removing points with h_canopy_uncertainty > 20; and removing points with absolute difference between the topographic height and the reference DEM > 50; (2) using the LNR method. All points acquired by both methods were involved in the subsequent machine learning modeling to facilitate a direct comparison of accuracy.
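For comparison, the attribute-based (traditional) screening listed above amounts to a chain of boolean conditions. The sketch below assumes that `night_flag` equals 1 for night acquisitions, that beam strength has already been resolved into a boolean array (`is_strong_beam`, a hypothetical input), and that a reference DEM elevation has been sampled at each footprint (`ref_dem`, also hypothetical); the thresholds are those quoted in the text.

```python
import numpy as np


def traditional_filter(night_flag, is_strong_beam, cloud_flag_atm,
                       h_canopy_uncertainty, h_te_best_fit, ref_dem):
    """Return a boolean mask (True = retained) for the attribute-based screening."""
    night = np.asarray(night_flag) == 1               # assumption: 1 = night
    strong = np.asarray(is_strong_beam, dtype=bool)   # hypothetical precomputed flag
    clear = np.asarray(cloud_flag_atm) < 2            # remove cloud_flag_atm >= 2
    certain = np.asarray(h_canopy_uncertainty, dtype=float) <= 20  # remove > 20
    terrain_diff = np.abs(np.asarray(h_te_best_fit, dtype=float)
                          - np.asarray(ref_dem, dtype=float))
    terrain_ok = terrain_diff <= 50                   # remove difference > 50
    return night & strong & clear & certain & terrain_ok
```

Because each condition is a separate mask, the number of footprints rejected by each criterion can be counted individually when comparing the two filtering strategies.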

2.3.2. LightGBM

LightGBM is a novel GBDT algorithm, which is shown in Figure 4. Unlike XGBoost, LightGBM uses a histogram-based algorithm and a leaf-wise growth strategy with a maximum depth limit to improve training speed and overcome the disadvantages of high memory consumption and high time cost [56]. First, LightGBM accelerates the search for split points and reduces memory consumption through the histogram algorithm, and then uses the depth-limited leaf-wise growth strategy to improve the accuracy of the base learners and generate decision trees more efficiently. Gradient-based One-Side Sampling (GOSS) reduces the number of samples by focusing on samples with larger gradients, and Exclusive Feature Bundling (EFB) bundles mutually exclusive features to reduce the number of features in sparse data. Finally, LightGBM supports efficient parallelism: the optimal split points are found on different machines over different feature subsets and then synchronized among machines to achieve feature parallelism. By optimizing both the split-finding process of the base decision trees and the tree growth method, LightGBM effectively reduces time and space complexity on large-scale datasets and improves the efficiency of model training [74]; it also removes the need to preprocess missing values and categorical features. To generate reliable models, we used the original data for testing, set different numbers of leaves, and analyzed how the cumulative residuals of the model changed as the number of trees grew. To protect the results from random interference, the testing process was repeated 100 times. It was finally determined that the model error stabilized when n_estimators was greater than 2000. 
In addition, the two models were tested using two-thirds, half, and one-third of the matched data, respectively, and the final results showed that the sample size had essentially no effect on the robustness of the model output.
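To make the histogram idea concrete, the following sketch finds the best split of a single feature from binned gradient sums. It is a simplified illustration, not LightGBM's actual implementation: it assumes squared-error loss (gradient = prediction minus observation, unit Hessian), equal-width bins, and no regularization, and the function name and toy data are invented for this example.

```python
def best_histogram_split(x, grad, n_bins=8):
    """Scan bin boundaries of one feature and return (threshold, gain)."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0
    # Build the histogram: per-bin gradient sum and sample count.
    g_sum = [0.0] * n_bins
    count = [0] * n_bins
    for xi, gi in zip(x, grad):
        b = min(int((xi - lo) / width), n_bins - 1)
        g_sum[b] += gi
        count[b] += 1
    g_total, n_total = sum(g_sum), sum(count)
    best_threshold, best_gain = None, 0.0
    g_left, n_left = 0.0, 0
    # Only n_bins - 1 candidate splits are evaluated, not one per sample.
    for b in range(n_bins - 1):
        g_left += g_sum[b]
        n_left += count[b]
        n_right = n_total - n_left
        if n_left == 0 or n_right == 0:
            continue
        g_right = g_total - g_left
        gain = (g_left ** 2 / n_left + g_right ** 2 / n_right
                - g_total ** 2 / n_total)
        if gain > best_gain:
            best_threshold, best_gain = lo + (b + 1) * width, gain
    return best_threshold, best_gain


# Toy data: low canopy for small feature values, tall canopy for large ones.
feature = [1, 2, 3, 4, 11, 12, 13, 14]
height = [5, 5, 5, 5, 20, 20, 20, 20]
base = sum(height) / len(height)          # initial prediction: the mean
gradients = [base - h for h in height]    # squared-error gradient
threshold, gain = best_histogram_split(feature, gradients)
print(threshold, gain)
```

The cost of the scan depends on the number of bins rather than the number of samples, which is the source of the speed and memory savings described above.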

2.3.3. Accuracy Evaluation

Most studies used validation datasets, mainly including ALS data, field data, and reserved satellite-based LiDAR footprints, to display model accuracy. However, they rarely focused on the estimation performance of the model itself. This study used the 10-fold cross-validation method, which can provide an effective estimation of the generalization error of the model, to evaluate the model. The principle is that the training set is randomly split into ten pieces, one of which is selected as the validation set, and the other nine pieces are used as the training set. This is cycled 10 times so as to traverse all the data and calculate the average performance metrics. We evaluated the final model accuracy using the coefficient of determination (R²) (Equation (3)), root mean square error (RMSE) (Equation (4)), and the mean absolute error (MAE) (Equation (5)), respectively.

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (h_{i} - \hat{h}_{i})^{2}}{\sum_{i = 1}^{n} (h_{i} - \bar{h})^{2}}

(3)

\mathrm{RMSE} = \sqrt{\frac{\sum_{i = 1}^{n} (h_{i} - \hat{h}_{i})^{2}}{n}}

(4)

\mathrm{MAE} = \frac{\sum_{i = 1}^{n} | h_{i} - \hat{h}_{i} |}{n}

(5)

where h_{i} is the height observation, \hat{h}_{i} is the height prediction, \bar{h} is the mean of the height observations, and n is the number of samples.
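As a minimal sketch of this evaluation procedure, the code below implements Equations (3)–(5) and a 10-fold cross-validation loop that averages the metrics over the folds. The helper names, the random seed, and the one-variable least-squares stand-in model are assumptions for illustration; in the study the fitted model is LightGBM.

```python
import math
import random


def r2_rmse_mae(obs, pred):
    """Equations (3)-(5) for paired observations and predictions."""
    n = len(obs)
    mean_obs = sum(obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    mae = sum(abs(o - p) for o, p in zip(obs, pred)) / n
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n), mae


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(x, y, fit, k=10, seed=0):
    """Each fold is held out once; returns fold-averaged (R2, RMSE, MAE)."""
    scores = []
    for fold in k_fold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        pred = [model(x[i]) for i in fold]
        scores.append(r2_rmse_mae([y[i] for i in fold], pred))
    return tuple(sum(s[j] for s in scores) / k for j in range(3))


def fit_line(xs, ys):
    """Stand-in model: ordinary least squares with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return lambda v: my + slope * (v - mx)


# Metrics on a tiny hand-checkable case.
r2, rmse, mae = r2_rmse_mae([10, 20, 30], [12, 18, 30])

# Cross-validation on noise-free synthetic data (y = 2x + 1).
xs = [float(v) for v in range(50)]
ys = [2 * v + 1 for v in xs]
cv_r2, cv_rmse, cv_mae = cross_validate(xs, ys, fit_line)
print(round(r2, 2), round(cv_r2, 6))
```

Averaging the metrics over the ten held-out folds is one common convention; pooling all held-out predictions before computing a single R², RMSE, and MAE is an alternative that can give slightly different values.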

3. Results

3.1. Forest Canopy Height Model Accuracy

After selecting and analyzing feature variables from the multi-source remote sensing datasets, we divided the 25 input variables into five categories, tested how model performance varied under the two methods for different combinations in turn, and finally chose seven optimal combinations. The results are shown in Table 3. The different combinations of variables have a certain impact on model performance, and the traditional model is more sensitive to the input variables, as its accuracy fluctuates over a wider range. Firstly, topography, reflectance, and vegetation index are the essential input variables of the model and determine its overall performance. Secondly, the model error decreases gradually as the variety of input variables increases, so diversifying the variable combinations helps to improve model accuracy. Finally, we chose all the variables as inputs, which helps to analyze the importance of factors and optimize the estimation model.

The LightGBM algorithm provides an importance evaluation of all the predictor variables, and Figure 5 shows the importance rankings of the variables in the two extrapolation models. The first model revealed that climatic metrics are the most important explanatory variables, including TQ, AMT, PCQ, PQ, and PHQ, followed by topographic variables, including slope and topographic elevation, and then VCF and LAI. This illustrates that three major factors, namely temperature, precipitation, and topography, are important indicators for canopy height estimation and are closely related to the regional distribution of forest canopy height. In addition, PSRI and NIR rank relatively high, which indicates that they are more advantageous for forest canopy height estimation than other vegetation indices and surface reflectance bands. A side-by-side comparison shows that the importance rankings were broadly similar for both models. The second ranking determined that DEM best explained the model, followed by VCF, LAI, and the climate variables. Similarly, topography, temperature, and precipitation still maintain good interpretability for the model, and VCF and LAI, which characterize vegetation physiological properties, also correlate well with forest canopy height. It is noteworthy that the NIR ranking is also high, which indicates that the texture information contained in the NIR band can effectively reflect the structural characteristics of the forest. PSRI, as the red-edge vegetation index of Sentinel-2, explained the model significantly better than other vegetation indices, in addition to GLI and MSAVI. 
However, the contribution of ARVI, visible band reflectance (Blue, Green, and Red), and red-edge vegetation indices (NDVI58A, NDVI56, and NDVI47) to the model was relatively low, which may be because the visible channels and some red-edge bands saturate in dense forest areas, directly reducing their sensitivity to forest structure.

After determining the hyperparameters of the model, we constructed regression models for the data retained by the two methods separately and used 10-fold cross-validation to assess the accuracy of the final models; the fitting results of the two models and the accuracy indexes are given in Figure 6. Firstly, the amounts of data for the two models are similar, and using the LNR method clearly improves the model fitting accuracy compared with the traditional method. The direct validation results show that the two models reproduce the original data with similar accuracy and little bias, which demonstrates the fitting ability of the LightGBM algorithm. The 10-fold cross-validation results show that R² increased from 0.46 to 0.65, the RMSE decreased from 6.11 m to 3.48 m (about 43%), and the MAE decreased from 4.48 m to 2.66 m (about 41%). Secondly, the figure shows that, compared with the traditional method, LNR mainly excludes more local extremes with high uncertainty (especially those greater than 40 m), which avoids the interference of some outliers as far as possible. The vegetation height measured by ATL08 is the difference between the height of the canopy top and the height of the terrain; when the terrain is complex or the canopy is dense, the actual vertical structure of the forest is difficult to reproduce, resulting in biased anomalies. Such local outliers interfere with model training and fitting and reduce model performance. Thus, unlike direct attribute-based filtering, LNR uses the statistical features of the neighborhood, and the footprint distribution it retains is more homogeneous, which helps to reduce the model generalization error and explains the difference in accuracy between the fitted models. 
Based on comparisons with the final mapping product, most of the excluded footprints can be confirmed as low-quality noise. Finally, both models overestimated low values (<5 m) and underestimated high values (>30 m), which may be because the LightGBM algorithm outputs an averaged result of the ensemble learner.

3.2. Spatial Distribution of Canopy Height

The continuous spatial feature information and discrete satellite-based LiDAR canopy data were integrated to achieve wall-to-wall, high-resolution mapping of forest canopy height over a large area. The canopy height map with 30 m resolution was generated for Hunan Province in 2020 using the LNR-LGB algorithm (Figure 7), and non-forest areas were masked according to the GLC_FCS30 coverage product. Generally, the forest canopy height in Hunan Province ranges from 2.53 to 50.79 m, with an average value of 18.34 m and a standard deviation of 5.26 m. Spatially, the forest canopy height values are mainly distributed among the steep mountain areas in the northwest, southwest, and east of the province, while the sparse vegetation on gentle terrain is relatively low. Figure 7a,b gives detailed maps of different regions: (a) is the mountainous region of northwest Hunan, dominated by evergreen broad-leaved forests, with forest heights mostly above 15 m and a more homogeneous and continuous spatial distribution; (b) is the mountainous region of southeast Hunan, with complex forest types, including broad-leaved, coniferous, and mixed forests, ranging from 10 m to 25 m in height, with the spatial distribution of heights closely correlated with topographic conditions.

Differences in forest canopy height distributions across vegetation types were also preliminarily explored, using violin plots to visualize the distributions (Figure 8). Such plots combine the advantages of a box plot and a kernel density plot, showing the probability distribution of the data through the kernel density while directly displaying the quantiles. Considering the effect of differences in sample size on the results, only three main forest types were calculated here. First, the height values of evergreen broadleaf forest are slightly higher than those of the other two types, with a more upward distribution; the value distributions of deciduous broadleaf forest and evergreen coniferous forest are basically equal, but the height range of deciduous broadleaf forest is narrower and its values are more concentrated; compared with the other two types, evergreen broadleaf forest shows the widest height range and a relatively discrete distribution. Overall, the variability in the distributions of the three vegetation types is evident, with the height range roughly between 10 and 25 m. In general, the final estimated forest canopy heights are influenced by multiple factors, including vegetation cover, topographic conditions, and forest type, and the distribution reveals substantial spatial heterogeneity.

3.3. Comparison with Other Height Products

To comprehensively assess the accuracy of our forest canopy height maps, the LightGBM-derived forest heights were compared with two existing forest canopy height products (Table 4). In terms of methodology, Potapov et al. [57] used a regression tree algorithm, which is similar to the LightGBM algorithm in that both are trained multiple times by constructing a large number of decision trees. Liu et al. [59] used the Neural Network Guided Interpolation (NNGI) method, which is essentially a spatial interpolation method that adjusts the weights of spatial factors and differs from ensemble algorithms; its height distribution is close to a normal curve, which distinguishes it from the concentrated distribution of decision tree results. The NNGI method uses ALS data as the training data to learn the weights; thus, machine learning algorithms trained on satellite-based LiDAR footprints are still the main solution. According to the table, the height product of Potapov et al. [57] was consistent with the reserved footprint data (R² = 0.62, RMSE = 6.60 m) but showed significant underestimation when compared with ALS data, especially for short (<7 m) and tall (>21 m) forests. The forest height product of Liu et al. [59] correlated well with three validation datasets (R²: 0.55–0.60; RMSE: 4.88–5.32 m), but the NNGI model was trained on drone LiDAR data, which have limited distribution and are prone to neglecting the fine features of regional canopy height. 
The two existing forest height products were validated for accuracy with either reserved footprints or an ALS validation set, but the following problems may exist: firstly, validating the whole model using a single set of reserved footprints carries high uncertainty, because it does not take into account the overall characteristics of the data; secondly, ALS data can effectively validate the estimation accuracy in local areas, but their spatial scope is relatively limited. The 10-fold cross-validation approach applied in this study considered all the data involved in the training and quantified the overall error of ICESat-2 for estimating forest height in the study area, with a relatively lower model error (RMSE = 3.48 m) than the other products.

At the provincial scale, there is still considerable variation in the value distribution of different forest height products. The values obtained by Potapov et al. [57] range from 3.0 to 37.0 m, those obtained by Liu et al. [59] range from 0.4 to 57.1 m, and those of the present product, with a more concentrated distribution of values, excluding the effect of outliers (more than 30 m), range from 2.5 to 50.8 m. The mean height of the product in this study is 18.34 m, and the estimated forest canopy height is slightly higher compared with those of the two products; the standard deviation of height is shown to be 5.26 m, which is basically close to the standard deviation of height of the other two products. As seen in Figure 9, at the pixel level, the average difference from our product to the two reference products is 3.31 m and 2.40 m, with mean absolute errors of about 4.17 m and 4.11 m, respectively, and standard deviations of errors of 3.73 m and 4.43 m.

Apart from the provincial comparisons, Figure 10 offers a comparison map of localized areas, with subregions 1–3 representing forested areas with different levels of coverage. In particular, the figure shows that the present product is mainly higher than that of Potapov et al. [57], while the variability of the difference with Liu et al.’s [59] is more pronounced, as shown by the more prominent high values of the low values. Under high cover conditions (e.g., subregion 1), the difference between the present product and the two products is even more obvious, especially for Liu et al. [59]. In subregions 2–3, the present product is basically higher than the other two products, which suggests that the model may be overestimating the height of sparse forests. Overall, there is some variability between this product and the two open products, which underlines the need to use field data for model accuracy validation. In addition, the pixel-level correlation coefficients with the two products are 0.59 and 0.35, respectively, indicating a closer relationship between the present product and the reference datasets (especially that of Potapov et al. [57]). Notably, the correlation between the two reference products is about 0.20, which implies the great uncertainty that comes with mapping high-resolution forest canopy heights at a large scale. In conclusion, this study generated forest height products in Hunan Province with good accuracy and fine resolution, which are significant references for management and decision-making in subtropical forest ecosystems.
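The pixel-level statistics used in this comparison (mean difference, mean absolute error, standard deviation of the differences, and correlation coefficient) can be computed as sketched below. The function name, the use of `None` as a no-data marker, and the sample values are assumptions for illustration, and the standard deviation is taken here over the signed differences, which is one possible reading of the text.

```python
import math


def compare_products(ours, ref, nodata=None):
    """Pixel-wise comparison of two co-registered height rasters (flattened)."""
    # Keep only pixels that are valid in both products.
    pairs = [(a, b) for a, b in zip(ours, ref) if a != nodata and b != nodata]
    n = len(pairs)
    diffs = [a - b for a, b in pairs]
    mean_diff = sum(diffs) / n
    mae = sum(abs(d) for d in diffs) / n
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / n)
    # Pearson correlation coefficient between the two products.
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    r = cov / math.sqrt(var_a * var_b)
    return {"mean_diff": mean_diff, "mae": mae, "sd_diff": sd_diff, "r": r}


# Made-up pixel values in metres; None marks masked (non-forest) pixels.
ours = [20.0, 18.0, 25.0, None, 15.0]
reference = [17.0, 16.0, 20.0, 12.0, None]
stats = compare_products(ours, reference)
print({name: round(value, 2) for name, value in stats.items()})
```

A positive mean difference indicates that the first product is higher on average, while the correlation coefficient describes agreement in spatial pattern independently of such an offset.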

4. Discussion

4.1. Uncertainty of ATL08 Canopy Height Estimation

The ICESat-2 ATL08 product suffers from large uncertainties in the estimation of canopy height on track segments due to factors such as complex topographic conditions and variable forest canopy structure. First, the accuracy of canopy height extraction by ICESat-2 often depends on how well the height of the understory topography is determined, and the topographic error increases with slope and height [42], especially for complex terrain with slopes > 20° [75]. Because a certain number of ground and canopy photons must be acquired simultaneously, the photons emitted by ICESat-2 adapt poorly to vegetation structures with different coverage. For sparse vegetation, few photons are reflected from the canopy, so the canopy signal is easily confused with nearby noise. For dense vegetation, it is difficult for photons to reach the ground, which severely limits the accuracy of understory topography inversion. Thus, the accuracy of ATL08 canopy height is strongly dependent on canopy cover. Some research reveals that ATL08 products overestimate the height of shrublands and underestimate the height of dense vegetation [61]; Neuenschwander et al. [62] initially reported that 40–80% coverage is the optimal range for estimating canopy height. In addition, vertical sampling error is a factor affecting the ATL08 canopy height error [36]; it arises because photons are less likely to be reflected from the canopy top than from the canopy perimeter. The influence of environmental conditions is often unavoidable, so removing low-quality photons is particularly important for enhancing the accuracy of the ATL08 canopy inversion. The ATL08 algorithm retains a considerable portion of noise photons, which are misclassified as forest canopy or terrain. 
Currently, using nighttime strong-beam photon data can avoid the interference of the solar background, but it does little for the noisy photons retained by the algorithm itself. Our proposed LNR removes photons whose heights differ greatly from those of spatially similar canopy photons, treating them as unreliable points. The principle of the LNR method is simple, and the method aims to provide a new concept for excluding low-quality ATL08 photons. However, the filtering accuracy of the LNR method has not been sufficiently validated due to the lack of large-area ALS data. In addition, ICESat-2 footprints are usually distributed regularly along the track, and the LNR method considers only spatial distance and slope when determining the search neighborhood, which is incomplete. Multiple factors, including longitude, latitude, elevation, slope, vegetation index, and percent cover, should be considered when identifying spatially similar canopy photons. Integrating multiple factors to determine the appropriate neighborhood extent can reduce the possibility of misclassification. We use the absolute mean of the difference as the determination criterion, which makes the retained footprint distribution more continuous; however, the selection of the threshold value is not sufficiently discussed and will be explored in depth in conjunction with ALS data in future work. It is worth noting that the ATL08 algorithm has a number of limitations, so it is important to develop a more reliable photon filtering algorithm for the original ATL03 product [36,76]; however, the development and implementation of an adaptive noise-elimination algorithm is still at the experimental stage.

4.2. Discussion of Model for Canopy Height Extrapolation

By combining ICESat-2 data and multiple predictor variables, the LightGBM algorithm was used to achieve accurate mapping of forest canopy height at a large scale. Compared with other common methods, the LightGBM algorithm performed relatively better in terms of accuracy, and its output was more robust. However, the LightGBM model overestimates low values (<5 m) and underestimates high values (>30 m) (Figure 6), with a more concentrated distribution of predicted values. LightGBM builds many decision trees, using the histogram algorithm to accelerate the selection of optimal split points, and the final prediction aggregates the outputs of these trees. Because the output is an aggregated result, some predictions may be pulled toward the overall average value, and thus the trained model shows systematic prediction biases for certain height ranges. The spatial distribution of the LNR-LGB model errors is also discussed here. We plotted the spatial distribution of errors across the province and gave box plots of the errors against two predictors (Figure 11). First, the errors show no obvious spatial pattern, and the distribution of overestimated or underestimated regions may be somewhat random. Second, the LNR-LGB model predicts well, with errors ranging between –5 and 5 m, an average of 0.05 m, and a standard deviation of 3.94 m. 
Finally, Figure 11c,d displays the change in error as the factor values increase: firstly, the fluctuation range of the error increases with slope, a positive correlation showing that the steeper the slope, the lower the model accuracy; secondly, the fluctuation range of the error increases, decreases, and then increases again with increasing vegetation cover, implying that the model has a certain advantage in predicting medium cover (50–70%). However, the median error was basically around 0, which suggests that it is still challenging to explore the model estimation error.
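The box-plot analysis of error against a predictor can be reproduced numerically by grouping errors into predictor classes and summarizing each class by its median and interquartile range (IQR). The sketch below does this for slope; the class edges, the function name, and the sample values are assumptions for illustration and are not taken from Figure 11.

```python
import statistics


def error_spread_by_bin(values, errors, edges):
    """Median and IQR of the errors within each [edges[i], edges[i+1]) class."""
    groups = {i: [] for i in range(len(edges) - 1)}
    for v, e in zip(values, errors):
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                groups[i].append(e)
                break
    summary = {}
    for i, errs in groups.items():
        if len(errs) >= 2:  # quartiles need at least two samples
            q1, median, q3 = statistics.quantiles(errs, n=4)
            summary[(edges[i], edges[i + 1])] = (median, q3 - q1)
    return summary


# Made-up slopes (degrees) and prediction errors (metres).
slope = [2, 4, 6, 8, 22, 24, 26, 28]
error = [-1, 0, 0, 1, -6, -2, 3, 7]
spread = error_spread_by_bin(slope, error, edges=[0, 10, 20, 30])
for (low, high), (median, iqr) in spread.items():
    print(f"slope {low}-{high}: median {median:+.2f} m, IQR {iqr:.2f} m")
```

A median near zero combined with a widening IQR corresponds to the pattern described above: the predictions remain nearly unbiased on average, while their spread grows on steeper slopes.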

This ensemble, decision-tree-based LightGBM approach may degrade the information accuracy of the original ATL08 product, making the estimated forest height lack detail. Potapov et al. [57] reported that applying the ensemble regression tree model over large areas was less effective than the locally calibrated model. Compared with LightGBM models, deep learning (DL) models offer potential in canopy height estimation by building multilayer neural networks for feature learning [25]. However, DL models usually require lengthier and more complex network training and hyperparameter tuning to reach optimality [77], and the robustness of their accuracy remains to be explored. Jiang et al. [43] combined four basic models and then jointly modeled the prediction results of each model. This stacked algorithm can reduce the overall error and improve the prediction accuracy but increases the time cost, and its feasibility for large-scale mapping has not been verified. Because nonparametric methods have difficulty making full use of the training data during regression, Liu et al. [59] developed the NNGI method and successfully mapped continuous forest canopy heights. Common spatial interpolation methods did not yield satisfactory results due to the discontinuity of the satellite-based LiDAR footprints. By fusing multi-source data and adjusting the interpolation weights of multiple elements, the NNGI method significantly reduced the saturation effect that occurs when mapping forest canopy heights using optical images and regression models, improving the accuracy of the mapping product. However, the method uses ALS data as training data to learn the weights, and it is unknown whether the same accuracy can be achieved by training on satellite-based LiDAR footprints. As can be seen, the development of canopy height extrapolation methods is still relatively limited. 
However, regardless of the method used, the analysis and processing of the original satellite-based LiDAR data is the essential prerequisite for improving the accuracy of forest height estimation [60].

4.3. Limitations and Prospects

In this study, a canopy height extrapolation model was developed using the LNR-LGB method to map the canopy height in Hunan Province at a 30 m resolution. Based on the cross-validation results, the model performed well. However, the height products of this study could not be fully validated due to the lack of ALS data. Based on the above analysis, our height product mostly lies between 10 and 30 m, with a more concentrated value distribution. According to the comparison with other height products, our height maps show underestimation in some dense forests and overestimation in some sparse vegetation, causing less pronounced height variability in some areas. This phenomenon mainly stems from the error of the ATL08 product and the averaging nature of LightGBM decision results. This study used 25 predictor variables for the extrapolation model, and the importance ranking results showed that topography, temperature, and precipitation best explained the model. However, the resolution of the climate data is relatively coarse, which affects the fine mapping of forest canopy height. For dense southern forests, optical remote sensing images are prone to saturation [48], which can introduce estimation errors. Microwaves can penetrate the canopy, and for canopy height extrapolation, cross-polarized data from PALSAR-2 (L-band) and Sentinel-1 (C-band) can be considered [29,78], which are more sensitive to the vertical structure of forests than co-polarized data and provide greater advantages in extracting forest canopy height. Because forest canopy structures are complex and diverse, modeling different forest types separately can achieve fine estimation and effectively improve model performance. In addition, for large-scale forest canopy height modeling, ecological zoning methods can also be applied to reduce the influence of ecosystem types and geospatial heterogeneity, thus reducing the errors of modeling different ecological zones. 
Research has indicated that ATL08 has certain advantages in measuring medium-cover vegetation in the temperate cold zone, and the LNR-LGB methodological framework developed in this study should, in theory, be more applicable to mid- and high-latitude regions on the basis of real-time adjustments; however, tropical and subtropical regions, with their complex climatic conditions and dense vegetation cover, are challenging for measuring forest canopy height, which is a challenge that future studies need to overcome. Besides ICESat-2, GEDI provides global-scale detection of vegetation structure with a better estimation accuracy than ICESat-2 for low vegetation [61], and its adaptability to different cover conditions is superior. However, the geolocation biases of the GEDI footprint are large and may be as high as 60 m [59], thus affecting the overall accuracy of forest canopy height. Therefore, the applicability of both datasets for canopy height extraction in different situations must be explored, and appropriate data should be selected depending on the application demands. Currently, it is important to develop an error correction model based on high-accuracy topographic products to acquire finer canopy height estimates. It would also be desirable to integrate multi-source satellite remote sensing data, including Landsat-8 and Sentinel, to synergistically detect multi-temporal changes in surface vegetation and thus accelerate and advance the dynamic monitoring of global vegetation structure.

5. Conclusions

Large-scale and high-resolution forest canopy height mapping is significant for accurate inventories of forest carbon stock and the assessment of forest health. In this study, we considered the geospatial distribution characteristics of height footprints and proposed an LNR method for ATL08 canopy parameters. Taking Hunan Province, China, as the case study, we filtered the original ATL08 footprints by the ATL08 attributes and by the LNR method, respectively, and constructed a high-resolution canopy height estimation model using the LightGBM algorithm. Finally, we successfully mapped the 30 m forest canopy height in Hunan Province by the LNR-LGB method. The research results showed the following:

Topography, vegetation cover, temperature, and precipitation could be considered important variables for canopy height estimation;
In contrast to the traditional model, the accuracy of the model was significantly increased by using the LNR-LGB method, in which R² increased from 0.46 to 0.65 and RMSE decreased from 6.11 m to 3.48 m, reducing the RMSE by about 43%;
Based on the LNR-LGB model generating 30 m forest canopy height maps of Hunan Province, the forest height ranged from 2.53 to 50.79 m with the mean value of 18.34 m, and its spatial distribution was closely correlated with the topographic conditions;
Through comparison with two existing forest canopy height products, our model exhibited a lower error, and the accuracy of the output forest height products demonstrated high reliability.

Finally, the LNR-LGB method is utilized to integrate multiple remote sensing data to generate high-resolution canopy height products, which can provide a scientific foundation of data for forest ecological management and carbon-stock monitoring. In the future, we will explore the allometric equation of forest stand biomass by collecting field data to provide a fine estimation of regional carbon stock, which will provide a practicable reference solution for forestry carbon sink monitoring.

Author Contributions

Conceptualization, H.X.; methodology, M.S.; validation, M.S., W.W. and Z.J.; formal analysis, M.S. and N.W.; investigation, J.H.; resources, M.S.; writing—original draft preparation, M.S.; writing—review and editing, M.S. and W.W.; visualization, Z.J.; supervision, W.W. and M.S.; project administration, W.W.; funding acquisition, W.W. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by Department of Natural Resources of Hunan Province (2022) No. 5, the Basic Science-Center Project of National Natural Science Foundation of China (72088101), National Natural Science Foundation of China (42371392), Natural Science Foundation of Hunan Province, China (2023JJ30660), and the Key Program of the National Natural Science Foundation of China (41930108).

Data Availability Statement

The ICESat-2 data are available from the NSIDC (https://nsidc.org/data/icesat-2, accessed on 20 March 2023); the Landsat-8 and Sentinel-2 data are available from (https://www.earthdata.nasa.gov/, accessed on 6 April 2023); the DEM data can be obtained from (https://www.3decology.org/the-corrected-global-srtm-dem/, accessed on 7 April 2023); the VCF and LAI products are available from (https://modis.gsfc.nasa.gov/data/dataprod/, accessed on 13 April 2023); the climatic data are available from WorldClim (https://www.worldclim.org/, accessed on 14 April 2023); the landcover data are available from (https://data.casearth.cn/sdo/detail/5fbc7904819aec1ea2dd7061, accessed on 20 April 2023).

Acknowledgments

We express our sincere gratitude to those who provided support and advice for this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

Pan, Y.; Birdsey, R.A.; Fang, J.; Houghton, R.; Kauppi, P.E.; Kurz, W.A.; Phillips, O.L.; Shvidenko, A.; Lewis, S.L.; Canadell, J.G. A large and persistent carbon sink in the world’s forests. Science 2011, 333, 988–993. [Google Scholar] [CrossRef] [PubMed]
Dong, L.; Tang, S.; Min, M.; Veroustraete, F.; Cheng, J. Aboveground forest biomass based on OLSR and an ANN model integrating LiDAR and optical data in a mountainous region of China. Int. J. Remote Sens. 2019, 40, 6059–6083. [Google Scholar] [CrossRef]
Hao, J.; Gao, F.; Fang, X.; Nong, X.; Zhang, Y.; Hong, F. Multi-factor decomposition and multi-scenario prediction decoupling analysis of China’s carbon emission under dual carbon goal. Sci. Total Environ. 2022, 841, 156788. [Google Scholar] [CrossRef]
Liu, B.; Ma, X.; Guo, J.; Li, H.; Jin, S.; Ma, Y.; Gong, W. Estimating hub-height wind speed based on a machine learning algorithm: Implications for wind energy assessment. Atmos. Chem. Phys. 2023, 23, 3181–3193. [Google Scholar] [CrossRef]
Peng, S.; Wen, D.; He, N.; Yu, G.; Ma, A.; Wang, Q. Carbon storage in China’s forest ecosystems: Estimation by different integrative methods. Ecol. Evol. 2016, 6, 3129–3145. [Google Scholar] [CrossRef] [PubMed]
Yang, S.; Yang, J.; Shi, S.; Song, S.; Luo, Y.; Du, L. The rising impact of urbanization-caused CO2 emissions on terrestrial vegetation. Ecol. Indic. 2023, 148, 110079. [Google Scholar] [CrossRef]
Lefsky, M.A.; Cohen, W.B.; Harding, D.J.; Parker, G.G.; Acker, S.A.; Gower, S.T. Lidar remote sensing of above-ground biomass in three biomes. Glob. Ecol. Biogeogr. 2002, 11, 393–399. [Google Scholar] [CrossRef]
Asner, G.P.; Powell, G.V.; Mascaro, J.; Knapp, D.E.; Clark, J.K.; Jacobson, J.; Kennedy-Bowdoin, T.; Balaji, A.; Paez-Acosta, G.; Victoria, E. High-resolution forest carbon stocks and emissions in the Amazon. Proc. Natl. Acad. Sci. USA 2010, 107, 16738–16742. [Google Scholar] [CrossRef]
Hill, R.A.; Hinsley, S.A. Airborne lidar for woodland habitat quality monitoring: Exploring the significance of lidar data characteristics when modelling organism-habitat relationships. Remote Sens. 2015, 7, 3446–3466. [Google Scholar] [CrossRef]
Zhang, J.; Nielsen, S.E.; Mao, L.; Chen, S.; Svenning, J.C. Regional and historical factors supplement current climate in shaping global forest canopy height. J. Ecol. 2016, 104, 469–478. [Google Scholar] [CrossRef]
Wang, R.; Gamon, J.A. Remote sensing of terrestrial plant biodiversity. Remote Sens. Environ. 2019, 231, 111218. [Google Scholar] [CrossRef]
Kumar, L.; Mutanga, O. Remote Sensing of Above-Ground Biomass. Remote Sens. 2017, 9, 935. [Google Scholar] [CrossRef]
Zhao, M.; Yang, J.; Zhao, N.; Liu, Y.; Wang, Y.; Wilson, J.P.; Yue, T. Estimation of China’s forest stand biomass carbon sequestration based on the continuous biomass expansion factor model and seven forest inventories from 1977 to 2013. For. Ecol. Manag. 2019, 448, 528–534. [Google Scholar] [CrossRef]
Rodríguez-Veiga, P.; Wheeler, J.; Louis, V.; Tansey, K.; Balzter, H. Quantifying forest biomass carbon stocks from space. Curr. For. Rep. 2017, 3, 1–18. [Google Scholar] [CrossRef]
Duncanson, L.; Armston, J.; Disney, M.; Avitabile, V.; Barbier, N.; Calders, K.; Carter, S.; Chave, J.; Herold, M.; Crowther, T.W. The importance of consistent global forest aboveground biomass product validation. Surv. Geophys. 2019, 40, 979–999. [Google Scholar] [CrossRef]
Wang, Y.; Lehtomäki, M.; Liang, X.; Pyörälä, J.; Kukko, A.; Jaakkola, A.; Liu, J.; Feng, Z.; Chen, R.; Hyyppä, J. Is field-measured tree height as reliable as believed–A comparison study of tree height estimates from field measurement, airborne laser scanning and terrestrial laser scanning in a boreal forest. ISPRS J. Photogramm. Remote Sens. 2019, 147, 132–145. [Google Scholar] [CrossRef]
Zhang, Y.; Wang, W.; He, J.; Jin, Z.; Wang, N. Spatially continuous mapping of hourly ground ozone levels assisted by Himawari-8 short wave radiation products. GISci. Remote Sens. 2023, 60, 2174280. [Google Scholar] [CrossRef]
Pei, Z.; Han, G.; Mao, H.; Chen, C.; Shi, T.; Yang, K.; Ma, X.; Gong, W. Improving quantification of methane point source emissions from imaging spectroscopy. Remote Sens. Environ. 2023, 295, 113652. [Google Scholar] [CrossRef]
Pascual, C.; Garcia-Abril, A.; Cohen, W.B.; Martin-Fernandez, S. Relationship between LiDAR-derived forest canopy height and Landsat images. Int. J. Remote Sens. 2010, 31, 1261–1280. [Google Scholar] [CrossRef]
Ni, W.; Sun, G.; Pang, Y.; Zhang, Z.; Liu, J.; Yang, A.; Wang, Y.; Zhang, D. Mapping three-dimensional structures of forest canopy using UAV stereo imagery: Evaluating impacts of forward overlaps and image resolutions with LiDAR data as reference. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3578–3589. [Google Scholar] [CrossRef]
Li, C.; Song, J.; Wang, J. New approach to calculating tree height at the regional scale. For. Ecosyst. 2021, 8, 1–19. [Google Scholar] [CrossRef]
McCombs, J.W.; Roberts, S.D.; Evans, D.L. Influence of fusing lidar and multispectral imagery on remotely sensed estimates of stand density and mean tree height in a managed loblolly pine plantation. For. Sci. 2003, 49, 457–466. [Google Scholar]
Su, Y.; Ma, Q.; Guo, Q. Fine-resolution forest tree height estimation across the Sierra Nevada through the integration of spaceborne LiDAR, airborne LiDAR, and optical imagery. Int. J. Digit. Earth 2017, 10, 307–323. [Google Scholar] [CrossRef]
Chopping, M.; Moisen, G.G.; Su, L.; Laliberte, A.; Rango, A.; Martonchik, J.V.; Peters, D.P. Large area mapping of southwestern forest crown cover, canopy height, and biomass using the NASA Multiangle Imaging Spectro-Radiometer. Remote Sens. Environ. 2008, 112, 2051–2063. [Google Scholar] [CrossRef]
Li, W.; Niu, Z.; Shang, R.; Qin, Y.; Wang, L.; Chen, H. High-resolution mapping of forest canopy height using machine learning by coupling ICESat-2 LiDAR with Sentinel-1, Sentinel-2 and Landsat-8 data. Int. J. Appl. Earth Obs. Geoinf. 2020, 92, 102163. [Google Scholar] [CrossRef]
Zhang, Z.; Ni, W.; Sun, G.; Huang, W.; Ranson, K.J.; Cook, B.D.; Guo, Z. Biomass retrieval from L-band polarimetric UAVSAR backscatter and PRISM stereo imagery. Remote Sens. Environ. 2017, 194, 331–346. [Google Scholar] [CrossRef]
Qi, W.; Lee, S.-K.; Hancock, S.; Luthcke, S.; Tang, H.; Armston, J.; Dubayah, R. Improved forest height estimation by fusion of simulated GEDI Lidar data and TanDEM-X InSAR data. Remote Sens. Environ. 2019, 221, 621–634. [Google Scholar] [CrossRef]
Hu, Y.; Xu, X.; Wu, F.; Sun, Z.; Xia, H.; Meng, Q.; Huang, W.; Zhou, H.; Gao, J.; Li, W. Estimating forest stock volume in Hunan Province, China, by integrating in situ plot data, Sentinel-2 images, and linear and machine learning regression models. Remote Sens. 2020, 12, 186. [Google Scholar] [CrossRef]
Huang, W.; Min, W.; Ding, J.; Liu, Y.; Hu, Y.; Ni, W.; Shen, H. Forest height mapping using inventory and multi-source satellite data over Hunan Province in southern China. For. Ecosyst. 2022, 9, 100006. [Google Scholar] [CrossRef]
Sharifi, A.; Amini, J. Forest biomass estimation using synthetic aperture radar polarimetric features. J. Appl. Remote Sens. 2015, 9, 097695. [Google Scholar] [CrossRef]
Drake, J.B.; Dubayah, R.O.; Clark, D.B.; Knox, R.G.; Blair, J.B.; Hofton, M.A.; Chazdon, R.L.; Weishampel, J.F.; Prince, S.D. Estimation of tropical forest structural characteristics using large-footprint lidar. Remote Sens. Environ. 2002, 79, 305–319. [Google Scholar] [CrossRef]
Simard, M.; Pinto, N.; Fisher, J.B.; Baccini, A. Mapping forest canopy height globally with spaceborne lidar. J. Geophys. Res. Biogeosci. 2011, 116. [Google Scholar] [CrossRef]
Wulder, M.A.; White, J.C.; Nelson, R.F.; Næsset, E.; Ørka, H.O.; Coops, N.C.; Hilker, T.; Bater, C.W.; Gobakken, T. Lidar sampling for large-area forest characterization: A review. Remote Sens. Environ. 2012, 121, 196–209. [Google Scholar] [CrossRef]
Almeida, D.d.; Broadbent, E.N.; Zambrano, A.M.A.; Wilkinson, B.E.; Ferreira, M.E.; Chazdon, R.; Meli, P.; Gorgens, E.; Silva, C.A.; Stark, S.C. Monitoring the structure of forest restoration plantations with a drone-lidar system. Int. J. Appl. Earth Obs. Geoinf. 2019, 79, 192–198. [Google Scholar] [CrossRef]
Markus, T.; Neumann, T.; Martino, A.; Abdalati, W.; Brunt, K.; Csatho, B.; Farrell, S.; Fricker, H.; Gardner, A.; Harding, D. The Ice, Cloud, and land Elevation Satellite-2 (ICESat-2): Science requirements, concept, and implementation. Remote Sens. Environ. 2017, 190, 260–273. [Google Scholar] [CrossRef]
Neuenschwander, A.; Pitts, K. The ATL08 land and vegetation product for the ICESat-2 Mission. Remote Sens. Environ. 2019, 221, 247–259. [Google Scholar] [CrossRef]
Xie, D.; Li, G.; Wang, J.; Guo, J.; Jiaqi, M.; Yang, C. An Overview of the Application Prospect of New Laser Altimetry Satellite ICESat-2 in Geoscience. Geomat. Spat. Inf. Technol. 2020, 43, 38–42+45. [Google Scholar]
Magruder, L.A.; Brunt, K.M. Performance Analysis of Airborne Photon-Counting Lidar Data in Preparation for the ICESat-2 Mission. IEEE Trans. Geosci. Remote Sens. 2018, 56, 2911–2918. [Google Scholar] [CrossRef]
Narine, L.L.; Popescu, S.; Neuenschwander, A.; Zhou, T.; Srinivasan, S.; Harbeck, K. Estimating aboveground biomass and forest canopy cover with simulated ICESat-2 data. Remote Sens. Environ. 2019, 224, 1–11. [Google Scholar] [CrossRef]
Neuenschwander, A.L.; Magruder, L.A. Canopy and terrain height retrievals with ICESat-2: A first look. Remote Sens. 2019, 11, 1721. [Google Scholar] [CrossRef]
Narine, L.; Malambo, L.; Popescu, S. Characterizing canopy cover with ICESat-2: A case study of southern forests in Texas and Alabama, USA. Remote Sens. Environ. 2022, 281, 113242. [Google Scholar] [CrossRef]
Mulverhill, C.; Coops, N.C.; Hermosilla, T.; White, J.C.; Wulder, M.A. Evaluating ICESat-2 for monitoring, modeling, and update of large area forest canopy height products. Remote Sens. Environ. 2022, 271, 112919. [Google Scholar] [CrossRef]
Jiang, F.; Zhao, F.; Ma, K.; Li, D.; Sun, H. Mapping the forest canopy height in Northern China by synergizing ICESat-2 with Sentinel-2 using a stacking algorithm. Remote Sens. 2021, 13, 1535. [Google Scholar] [CrossRef]
Zhu, X. Forest Height Retrieval of China with a Resolution of 30 m Using ICESat-2 and GEDI Data. Ph.D. Thesis, Aerospace Information Research Institute, Chinese Academy of Sciences (CAS), Beijing, China, 2021. [Google Scholar]
Potapov, P.; Hansen, M.C.; Kommareddy, I.; Kommareddy, A.; Turubanova, S.; Pickens, A.; Adusei, B.; Tyukavina, A.; Ying, Q. Landsat analysis ready data for global land cover and land cover change mapping. Remote Sens. 2020, 12, 426. [Google Scholar] [CrossRef]
Zhu, X.; Wang, C.; Nie, S.; Pan, F.; Xi, X.; Hu, Z. Mapping forest height using photon-counting LiDAR data and Landsat 8 OLI data: A case study in Virginia and North Carolina, USA. Ecol. Indic. 2020, 114, 106287. [Google Scholar] [CrossRef]
Wu, Z.; Shi, F. Mapping Forest Canopy Height at Large Scales using ICESat-2 and Landsat: An Ecological Zoning Random Forest Approach. IEEE Trans. Geosci. Remote Sens. 2022, 61, 1–16. [Google Scholar] [CrossRef]
Lang, N.; Schindler, K.; Wegner, J.D. Country-wide high-resolution vegetation height mapping with Sentinel-2. Remote Sens. Environ. 2019, 233, 111347. [Google Scholar] [CrossRef]
Nandy, S.; Srinet, R.; Padalia, H. Mapping forest height and aboveground biomass by integrating ICESat-2, Sentinel-1 and Sentinel-2 data using Random Forest algorithm in northwest Himalayan foothills of India. Geophys. Res. Lett. 2021, 48, e2021GL093799. [Google Scholar] [CrossRef]
Zhang, T.; Liu, D. Mapping 30m Boreal Forest Heights Using Landsat and Sentinel Data Calibrated by ICESat-2. Authorea Prepr. 2022. [Google Scholar] [CrossRef]
Xi, Z.; Xu, H.; Xing, Y.; Gong, W.; Chen, G.; Yang, S. Forest canopy height mapping by synergizing ICESat-2, Sentinel-1, Sentinel-2 and topographic information based on machine learning methods. Remote Sens. 2022, 14, 364. [Google Scholar] [CrossRef]
Shang, R.; Zhu, Z. Harmonizing Landsat 8 and Sentinel-2: A time-series-based reflectance adjustment approach. Remote Sens. Environ. 2019, 235, 111439. [Google Scholar] [CrossRef]
Claverie, M.; Ju, J.; Masek, J.G.; Dungan, J.L.; Vermote, E.F.; Roger, J.-C.; Skakun, S.V.; Justice, C. The Harmonized Landsat and Sentinel-2 surface reflectance data set. Remote Sens. Environ. 2018, 219, 145–161. [Google Scholar] [CrossRef]
Liu, T.; Abd-Elrahman, A.; Morton, J.; Wilhelm, V.L. Comparing fully convolutional networks, random forest, support vector machine, and patch-based deep convolutional neural networks for object-based wetland mapping using images from small unmanned aircraft system. GISci. Remote Sens. 2018, 55, 243–264. [Google Scholar] [CrossRef]
Liang, W.; Luo, S.; Zhao, G.; Wu, H. Predicting hard rock pillar stability using GBDT, XGBoost, and LightGBM algorithms. Mathematics 2020, 8, 765. [Google Scholar] [CrossRef]
Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A highly efficient gradient boosting decision tree. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
Potapov, P.; Li, X.; Hernandez-Serna, A.; Tyukavina, A.; Hansen, M.C.; Kommareddy, A.; Pickens, A.; Turubanova, S.; Tang, H.; Silva, C.E. Mapping global forest canopy height through integration of GEDI and Landsat data. Remote Sens. Environ. 2021, 253, 112165. [Google Scholar] [CrossRef]
Malambo, L.; Popescu, S.C. Assessing the agreement of ICESat-2 terrain and canopy height with airborne lidar over US ecozones. Remote Sens. Environ. 2021, 266, 112711. [Google Scholar] [CrossRef]
Liu, X.; Su, Y.; Hu, T.; Yang, Q.; Liu, B.; Deng, Y.; Tang, H.; Tang, Z.; Fang, J.; Guo, Q. Neural network guided interpolation for mapping canopy height of China’s forests by integrating GEDI and ICESat-2 data. Remote Sens. Environ. 2022, 269, 112844. [Google Scholar] [CrossRef]
Pang, S.; Li, G.; Jiang, X.; Chen, Y.; Lu, Y.; Lu, D. Retrieval of forest canopy height in a mountainous region with ICESat-2 ATLAS. For. Ecosyst. 2022, 9, 100046. [Google Scholar] [CrossRef]
Liu, A.; Cheng, X.; Chen, Z. Performance evaluation of GEDI and ICESat-2 laser altimeter data for terrain and canopy height retrievals. Remote Sens. Environ. 2021, 264, 112571. [Google Scholar] [CrossRef]
Neuenschwander, A.; Guenther, E.; White, J.C.; Duncanson, L.; Montesano, P. Validation of ICESat-2 terrain and canopy heights in boreal forests. Remote Sens. Environ. 2020, 251, 112110. [Google Scholar] [CrossRef]
Moudrý, V.; Gdulová, K.; Gábor, L.; Šárovcová, E.; Barták, V.; Leroy, F.; Špatenková, O.; Rocchini, D.; Prošek, J. Effects of environmental conditions on ICESat-2 terrain and canopy heights retrievals in Central European mountains. Remote Sens. Environ. 2022, 279, 113112. [Google Scholar] [CrossRef]
Sun, T.; Qi, J.; Huang, H. Discovering forest height changes based on spaceborne lidar data of ICESat-1 in 2005 and ICESat-2 in 2019: A case study in the Beijing-Tianjin-Hebei region of China. For. Ecosyst. 2020, 7, 1–12. [Google Scholar] [CrossRef]
Dong, J.; Ni, W.; Zhang, Z.; Sun, G. Performance of ICESat-2 ATL08 product on the estimation of forest height by referencing to small footprint LiDAR data. Natl. Remote Sens. Bull. 2021, 25, 1294–1307. [Google Scholar] [CrossRef]
Chen, L.-C.; Guan, X.; Li, H.-M.; Wang, Q.-K.; Zhang, W.-D.; Yang, Q.-P.; Wang, S.-L. Spatiotemporal patterns of carbon storage in forest ecosystems in Hunan Province, China. For. Ecol. Manag. 2019, 432, 656–666. [Google Scholar] [CrossRef]
Neuenschwander, A.; Magruder, L.; Guenther, E.; Hancock, S.; Purslow, M. Radiometric assessment of ICESat-2 over vegetated surfaces. Remote Sens. 2022, 14, 787. [Google Scholar] [CrossRef]
Zhao, X.; Su, Y.; Hu, T.; Chen, L.; Gao, S.; Wang, R.; Jin, S.; Guo, Q. A global corrected SRTM DEM product for vegetated areas. Remote Sens. Lett. 2018, 9, 393–402. [Google Scholar] [CrossRef]
Carroll, M.; Townshend, J.; Hansen, M.; DiMiceli, C.; Sohlberg, R.; Wurster, K. MODIS vegetative cover conversion and vegetation continuous fields. Land Remote Sens. Glob. Environ. Chang. 2011, 11, 725–745. [Google Scholar]
Yang, W.Z.; Tan, B.; Huang, D.; Rautiainen, M.; Shabanov, N.V.; Wang, Y.; Privette, J.L.; Huemmrich, K.F.; Fensholt, R.; Sandholt, I.; et al. MODIS leaf area index products: From validation to algorithm improvement. IEEE Trans. Geosci. Remote Sens. 2006, 44, 1885–1898. [Google Scholar] [CrossRef]
Tao, S.; Guo, Q.; Li, C.; Wang, Z.; Fang, J. Global patterns and determinants of forest canopy height. Ecology 2016, 97, 3265–3270. [Google Scholar] [CrossRef]
Fick, S.E.; Hijmans, R.J. WorldClim 2: New 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol. 2017, 37, 4302–4315. [Google Scholar] [CrossRef]
Zhang, X.; Liu, L.; Chen, X.; Gao, Y.; Xie, S.; Mi, J. GLC_FCS30: Global land-cover product with fine classification system at 30 m using time-series Landsat imagery. Earth Syst. Sci. Data 2021, 13, 2753–2776. [Google Scholar] [CrossRef]
Zhang, Z.; Jung, C. GBDT-MO: Gradient-boosted decision trees for multiple outputs. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 3156–3167. [Google Scholar] [CrossRef] [PubMed]
Wang, C.; Zhu, X.; Nie, S.; Xi, X.; Li, D.; Zheng, W.; Chen, S. Ground elevation accuracy verification of ICESat-2 data: A case study in Alaska, USA. Opt. Express 2019, 27, 38168–38179. [Google Scholar] [CrossRef]
Huang, X.; Cheng, F.; Wang, J.L.; Duan, P.; Wang, J.S. Forest Canopy Height Extraction Method Based on ICESat-2/ATLAS Data. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–14. [Google Scholar] [CrossRef]
Zhang, L.P.; Zhang, L.F.; Du, B. Deep Learning for Remote Sensing Data: A technical tutorial on the state of the art. IEEE Geosci. Remote Sens. Mag. 2016, 4, 22–40. [Google Scholar] [CrossRef]
Sothe, C.; Gonsamo, A.; Lourenço, R.B.; Kurz, W.A.; Snider, J. Spatially Continuous Mapping of Forest Canopy Height in Canada by Combining GEDI and ICESat-2 with PALSAR and Sentinel. Remote Sens. 2022, 14, 5158. [Google Scholar] [CrossRef]

Figure 1. Map of forest type distribution in Hunan Province, China (land cover data from the GLC_FCS30 product).

Figure 2. Overall process of forest canopy height mapping in Hunan Province by the LNR-LGB method.

Figure 3. Searching diagram for neighborhood points of LNR algorithm.

Figure 4. Schematic diagram of LightGBM algorithm.

Figure 5. Ranking of the importance of variables from the LightGBM algorithm: (a) Traditional method; (b) LNR-LGB method.

Figure 6. Scatter plots between the canopy heights estimated by two methods and the ICESat-2 canopy height: (a,c) Direct validation; (b,d) 10-fold cross-validation.

Figure 7. Forest canopy height distribution map (30 m) in Hunan Province: the northwest and southeast regions are shown in the enlarged detail maps (a) and (b), respectively.

Figure 8. Violin plot of forest canopy height organized by vegetation type.

Figure 9. Frequency distribution of pixel-level differences from two existing height products: (a) difference from Potapov et al. [57]; (b) difference from Liu et al. [59].

Figure 10. Enlarged plots of errors from three different subregions: the first row (a–c) shows the predicted canopy height; the second row (d–f) shows the deviation from Potapov et al. [57]; and the third row (g–i) shows the deviation from Liu et al. [59].

Figure 11. Results of spatial analysis of LNR-LGB model errors: (a) spatial distribution of errors in Hunan Province; (b) histogram of errors; (c) boxplot of error variation with slope; (d) boxplot of error variation with VCF.

Table 1. Extracted attributes of ICESat-2 ATL08 product.

Attribute	Description
Longitude	Center longitude of each 100 m segment
Latitude	Center latitude of each 100 m segment
h_te_best_fit	Best fit terrain height at the regional center of each 100 m segment
h_te_uncertainty	Uncertainty of terrain height estimation
h_canopy	98% relative canopy height of each 100 m segment
h_canopy_uncertainty	Uncertainty of relative canopy height estimation
segment_landcover	Land cover classification of each 100 m segment
night_flag	Data acquisition time (0 for day and 1 for night)
cloud_flag_atm	Cloud confidence flag of ATL09
sc_orient	Spacecraft flight direction
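
The attributes in Table 1 lend themselves to a simple attribute-based screening of ATL08 segments, in the spirit of the traditional method. The Python sketch below is illustrative only: the `Atl08Segment` record, the `keep_segment` and `filter_segments` helpers, and every threshold (night-time only, cloud flag limit, height bounds) are assumptions made for demonstration and are not the criteria used in this study.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Atl08Segment:
    """One 100 m ATL08 segment, limited to a subset of the Table 1 attributes."""
    longitude: float
    latitude: float
    h_canopy: float              # 98% relative canopy height (m)
    h_canopy_uncertainty: float  # carried along; not used by this sketch
    night_flag: int              # 0 = day, 1 = night
    cloud_flag_atm: int          # cloud confidence flag from ATL09


def keep_segment(
    seg: Atl08Segment,
    night_only: bool = True,                          # assumed choice
    max_cloud_flag: int = 0,                          # assumed threshold
    height_range: Tuple[float, float] = (2.0, 60.0),  # assumed bounds (m)
) -> bool:
    """Return True if the segment passes all (hypothetical) attribute checks."""
    if night_only and seg.night_flag != 1:
        return False
    if seg.cloud_flag_atm > max_cloud_flag:
        return False
    low, high = height_range
    return low <= seg.h_canopy <= high


def filter_segments(segments: Iterable[Atl08Segment]) -> List[Atl08Segment]:
    """Keep only the segments that pass keep_segment with default settings."""
    return [seg for seg in segments if keep_segment(seg)]


# Made-up example records (not real ATL08 data)
example_segments = [
    Atl08Segment(112.90, 28.20, 18.3, 1.2, 1, 0),  # night, clear, plausible height
    Atl08Segment(112.91, 28.21, 18.3, 1.2, 0, 0),  # daytime acquisition
    Atl08Segment(112.92, 28.22, 18.3, 1.2, 1, 3),  # cloud flag above the limit
    Atl08Segment(112.93, 28.23, 95.0, 1.2, 1, 0),  # implausibly tall canopy
]
kept = filter_segments(example_segments)
```

Attribute screening of this kind works segment by segment, so it cannot use the heights of neighbouring segments when deciding whether a value is reliable.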

Table 3. Combination of predictor variables for forest height modeling.

ID	Variable Combination	Traditional Method R²	Traditional Method RMSE (m)	Traditional Method MAE (m)	LNR-LGB Method R²	LNR-LGB Method RMSE (m)	LNR-LGB Method MAE (m)
1	TM; SR; VI; CM	0.41	6.49	4.62	0.57	3.78	3.25
2	TM; RVI; CM	0.43	6.34	4.57	0.56	3.85	3.37
3	TM; SR; VI; BF	0.38	6.84	5.04	0.61	3.68	2.94
4	TM; RVI; BF	0.44	6.30	4.47	0.63	3.57	2.76
5	TM; SR; VI; RVI; CM	0.45	6.28	4.42	0.63	3.56	2.73
6	TM; SR; VI; RVI; BF	0.39	6.34	4.67	0.65	3.52	2.72
7	TM; SR; VI; RVI; BF; CM	0.46	6.11	4.48	0.65	3.48	2.66

Note: TM, SR, VI, RVI, BF, and CM correspond to Topographic Metrics, Surface Reflectance, Vegetation Index, Red-edge Vegetation Index, Biophysical Features, and Climatic Metrics, respectively.
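
The R², RMSE, and MAE columns of Table 3 can be computed from paired reference and predicted heights. The following is a minimal sketch, assuming that R² denotes the coefficient of determination (1 − SS_res/SS_tot). The function names `regression_metrics` and `k_fold_indices` are hypothetical, and the interleaved fold assignment is a simplification; a real 10-fold cross-validation would typically shuffle the samples first.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def regression_metrics(
    observed: Sequence[float], predicted: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (R2, RMSE, MAE) for paired observed and predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return r2, rmse, mae


def k_fold_indices(
    n_samples: int, k: int = 10
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each sample is held out exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    for fold in range(k):
        test = indices[fold::k]  # every k-th sample, starting at `fold`
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


# Made-up heights in metres, for illustration only
r2, rmse, mae = regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 30.0])
folds = list(k_fold_indices(25, k=10))
```

In a cross-validation run, a model would be fitted on each `train` subset and the metrics computed on the pooled or per-fold `test` predictions.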

Table 4. Comparison of different forest height products for Hunan Province.

Product	Coverage	Data Source	Resolution	Training Method	Validation Data	Number of Samples	R²	RMSE	Forest Height Range (m)	Forest Height Mean (m)	Forest Height Std (m)
Potapov et al. [57]	Global	Landsat ARD SRTM DEM * GEDI RH95	30 m	Regression tree	Set-aside GEDI	~3 × 10⁶	0.62	6.60 m	[3.0, 37.0]	12.72	4.15
Potapov et al. [57]	Global	Landsat ARD SRTM DEM * GEDI RH95	30 m	Regression tree	ALS data	~10⁶	0.61	9.07 m	[3.0, 37.0]	12.72	4.15
Liu et al. [59]	China	Corrected SRTM Sentinel-2 WorldClim 2.1 * GEDI RH100 * ICESat-2 RH98	30 m	Neural network guided interpolation	Set-aside GEDI	~10⁶	0.55	5.32 m	[0.4, 57.1]	13.52	4.71
					ALS data	65,600	0.58	4.93 m
					Field data	59,780	0.60	4.88 m
This study	Hunan, China	HLS Corrected SRTM MOD44, MOD15 WorldClim 2.1 * ICESat-2 RH98	30 m	Light Gradient Boosting Machine	Training data (10-fold cross validation)	396,989	0.65	3.48 m	[2.5, 50.8]	18.34	5.26

Note: * represents the source of the satellite-based LiDAR data.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sang, M.; Xiao, H.; Jin, Z.; He, J.; Wang, N.; Wang, W. Improved Mapping of Regional Forest Heights by Combining Denoise and LightGBM Method. Remote Sens. 2023, 15, 5436. https://doi.org/10.3390/rs15235436

