Monitoring Canopy Height in the Hainan Tropical Rainforest Using Machine Learning and Multi-Modal Data Fusion

Ling, Qingping; Chen, Yingtan; Feng, Zhongke; Pei, Huiqing; Wang, Cai; Yin, Zhaode; Qiu, Zixuan

doi:10.3390/rs17060966


by Qingping Ling ¹,²,†; Yingtan Chen ¹,²,†; Zhongke Feng ²; Huiqing Pei ³; Cai Wang ¹,²; Zhaode Yin ¹,²; Zixuan Qiu ¹,²,⁴,*

¹ National Key Laboratory for Tropical Crop Breeding, School of Breeding and Multiplication (Sanya Institute of Breeding and Multiplication), Hainan University, Sanya 572025, China

² Intelligent Forestry Key Laboratory of Haikou City, School of Tropical Agriculture and Forestry, Hainan University, Haikou 570228, China

³ Department of Global Agricultural Sciences, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 113-8657, Japan

⁴ School of Information and Communication Engineering, Hainan University, Haikou 570228, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Remote Sens. 2025, 17(6), 966; https://doi.org/10.3390/rs17060966 (registering DOI)

Submission received: 6 January 2025 / Revised: 12 February 2025 / Accepted: 6 March 2025 / Published: 9 March 2025

(This article belongs to the Special Issue New Methods and Applications in Remote Sensing of Tropical Forests)


Abstract

Biomass carbon sequestration and sink capacities of tropical rainforests are vital for addressing climate change. However, canopy height must be accurately estimated to determine carbon sink potential and implement effective forest management. Four machine-learning algorithms (random forest (RF), gradient boosting decision tree, convolutional neural network, and backpropagation neural network) were compared for estimating forest canopy height in the Hainan Tropical Rainforest National Park. A total of 140 field survey plots and 315 unmanned aerial vehicle photogrammetry plots, along with multi-modal remote sensing datasets (including GEDI and ICESat-2 spaceborne LiDAR data, Landsat images, and environmental information), were used to validate forest canopy height from 2003 to 2023. The results showed that RH80 was the optimal percentile for the prediction model, and the RF algorithm exhibited the best performance in terms of accuracy and stability, with R² values of 0.71 and 0.60 for the training and testing sets, respectively, and a relative root mean square error of 21.36%. The RH80 percentile model using the RF algorithm was employed to estimate the forest canopy height distribution in the Hainan Tropical Rainforest National Park from 2003 to 2023, and the canopy heights of five forest types (tropical lowland rainforests, tropical montane cloud forests, tropical seasonal rainforests, tropical montane rainforests, and tropical coniferous forests) were calculated. From 2003 to 2023, the canopy height in the Hainan Tropical Rainforest National Park showed an overall increasing trend, ranging from 2.95 to 22.02 m. The tropical montane cloud forest had the highest average canopy height, while the tropical seasonal rainforest exhibited the fastest growth. The findings provide valuable insights for a deeper understanding of the growth dynamics of tropical rainforests.

Keywords: Hainan Tropical Rainforest National Park; multi-modal remote sensing datasets; canopy height; GEDI; ICESat-2 ATLAS

1. Introduction

The impacts of global climate change and human activities on natural ecosystems are intensifying. Forests, which play a crucial role in the Earth’s carbon cycle and ecological balance, have consequently become key subjects in global ecological research. The forest canopy is a fundamental characteristic of forest structure. Canopy height is not only a critical parameter for measuring aboveground biomass but also fundamental to forest ecosystem research, such as primary productivity, biodiversity, and carbon cycling [1]. Large-scale, high-resolution forest height data are vital for assessing global and regional forest carbon stocks as well as the carbon balance in terrestrial ecosystems [2,3,4]. The rich biodiversity and rapid ecological changes in tropical rainforest regions make accurate estimations of forest canopy height essential for forest resource management, climate change research, and carbon stock monitoring.

Remote-sensing technology is an effective tool for studying forest canopy height. Forest canopy height estimation relies primarily on various remote-sensing technologies, including optical remote sensing, light detection and ranging (LiDAR), and digital elevation models (DEMs). Traditional optical remote sensing has been used for forest height inversion [5,6]. However, due to signal saturation and its inability to obtain vertical structural information of the canopy directly, its estimation accuracy is relatively low [7]. Microwave remote sensing can penetrate the canopy and extract vertical structural parameters. However, microwave signals are easily affected by terrain and suffer from saturation issues, limiting their application in complex forest environments. In contrast, LiDAR can penetrate the forest canopy and accurately capture vertical structural information. As a result, LiDAR has become a core tool in forest height research and is now widely used to measure and model forest canopy height [8,9,10].

Spaceborne LiDAR has distinct advantages for large-scale forest height inversion analysis and mapping [11,12,13,14,15,16]. The ICESat-1 satellite, launched in 2003, was equipped with the world’s first laser altimeter system (GLAS), laying the foundation for global forest height research [16]. However, due to its low laser point density, ICESat-1 had limited spatial resolution and a narrower range of applications. The ICESat-2 satellite, launched in 2018, utilizes photon-counting technology to achieve a higher laser point density and a smaller spot size (with spot intervals as low as 0.7 m), significantly enhancing the resolution of forest height inversion data [17]. The Global Ecosystem Dynamics Investigation (GEDI) system, installed on the International Space Station, employs full-waveform LiDAR technology to provide high-density point cloud data, greatly improving the accuracy of forest height measurements [18,19]. Since the data acquisition times of ICESat-2/ATLAS and GEDI are approximately the same, these two datasets can be integrated in a geographically complementary manner to increase the density of forest height sample points [19], providing unprecedented opportunities for large-area, high-resolution forest height mapping.

Ground-based LiDAR can perform submeter-level high-precision measurements and rapidly capture the 3D structures and spectral information of target objects. It has vegetation penetration, non-destructive capabilities, high density, and high resolution. These characteristics make it highly promising for inversion of vegetation phenotypes, biochemical parameters, and biomass [20,21,22,23,24,25]. In forest canopy height research, ground-based LiDAR is often used to validate and calibrate spaceborne LiDAR data. By comparing the forest canopy heights measured by satellite LiDAR with those obtained by ground LiDAR, the accuracy and reliability of satellite LiDAR in measuring forest canopy heights can be verified.

In recent years, researchers have increasingly combined multisource remote-sensing data with machine-learning algorithms to enhance the accuracy of forest canopy height estimations. Previous studies have demonstrated that integrating LiDAR data with environmental factors (e.g., terrain and climate) can significantly improve canopy height prediction accuracy. Lefsky et al. combined ICESat spaceborne LiDAR data with shuttle radar topography mission (SRTM) terrain data to develop a model for estimating forest canopy height. Their model explained 59–68% of the variance in measured forest canopy height across the study areas, with root mean square errors (RMSEs) ranging from 4.85 to 12.66 m [26]. Using machine-learning algorithms, such as random forest (RF), Pourshamsi et al. successfully estimated forest height by integrating LiDAR and PolSAR data. The model achieved an average R² value of 0.70 and an RMSE of 10 m [27].

Machine-learning algorithms, such as RF, gradient boosting decision trees (GBDT), and deep-learning methods, have also been employed to process remote-sensing data, improving canopy height estimation accuracy. Shah et al. applied convolutional neural network (CNN) algorithms to model training based on Landsat satellite imagery, successfully enhancing forest canopy height estimation accuracy. The predicted mean absolute error was 3.092 m, the mean squared error was 0.8872 m, and the variance was 0.864 m [28]. Stojanova et al. utilized spaceborne LiDAR data and machine-learning algorithms to estimate vegetation height in Slovenian forests, finding the integrated approach significantly outperformed single- and multi-target regression trees [29].

While significant progress has been made in forest canopy height inversion analysis, challenges such as data saturation and low estimation accuracy remain for large-scale, high-resolution forest height estimations. This study builds on previous research by incorporating innovative analysis combinations to address these issues: (1) two types of spaceborne LiDAR data (ICESat-2 and GEDI) were combined to compensate for the limitations of single data sources regarding spatial coverage and accuracy; (2) various environmental factors (e.g., slope, temperature, and precipitation) were integrated to enhance the accuracy and comprehensiveness of the estimation results; (3) vegetation indices (e.g., the normalized difference vegetation index [NDVI]) derived from Landsat imagery were incorporated to improve model accuracy and robustness; (4) tree height data obtained from portable 3D LiDAR scanning was used as a validation dataset to enhance the credibility and interpretability of the model results; and (5) four machine-learning algorithms—RF, backpropagation neural network (BP), CNN, and GBDT—were evaluated to identify the best-performing models for estimating forest canopy height in the Hainan Tropical Rainforest National Park, China.

By integrating multi-modal remote-sensing data and employing machine-learning algorithms, such as RF, GBDT, CNN, and BP, this study aimed to monitor forest canopy height in the Hainan Tropical Rainforest National Park from 2003 to 2023 (the technical roadmap is shown in Figure 1). The main objectives were as follows: (1) to improve the accuracy and reliability of forest canopy height estimations in the Hainan Tropical Rainforest National Park; (2) to identify the most suitable model for this region by comparing various algorithms; and (3) to map the forest canopy height distribution from 2003 to 2023. This study provides technical support for forest resource management, carbon storage monitoring, and climate change research.

2. Materials and Methods

2.1. Study Area

The Hainan Tropical Rainforest National Park is located in the central and southern mountainous regions of Hainan Island, extending from Nanjiao Town in Wanning City in the east to Banqiao Town in Dongfang City in the west, and from Maogan Township in Baoting Li and Miao Autonomous County in the south to Qingsong Township in Baisha Li Autonomous County in the north (18°33′–19°14′N, 108°44′–110°04′E). Situated at the intersection of the north–south thermal boundary and the east–west moisture divide (Figure 2), the region has an average annual temperature of 25 °C and an annual rainfall ranging from 1700 to 2700 mm. The Hainan Tropical Rainforest National Park covers an area of 4269 km² (approximately 13% of Hainan’s total land area) and has a forest coverage rate of 95.85%. The park contains more than 95% of Hainan’s primary forests and over 55% of its natural forests, making it China’s most concentrated, well-preserved, and largest contiguous tropical rainforest.

2.2. Data

2.2.1. Field Data

A 1 × 1 km grid was used to randomly select 140 sample plots within the Hainan Tropical Rainforest National Park (Figure 2) for tropical rainforest biomass field surveys. Due to the varying terrain in the study area, some sample plots were located less than 1 km apart. Each sample plot was 10 × 10 m, representing the surrounding forest conditions. A VLP-16 LiDAR (Velodyne LiDAR Inc., San Jose, CA, USA) was used to scan all 140 sample plots to obtain complete 3D point cloud data, which were preprocessed to ensure data quality. The LeGO-LOAM algorithm was applied to process the point cloud data and reconstruct the 3D structure of the tropical rainforest within the sample plots. Tree height and diameter at breast height within the plots were then measured with the RViz tool running on the Ubuntu operating system.

In addition, drone-based canopy height data were collected from 315 sample plots within the study area. A DJI Mavic 3M drone (DJI Innovations, Shenzhen, China) equipped with a multispectral camera was used for data collection. A “grid flight” planning method was implemented, with a flight altitude of 120 m, a speed of 7.9 m/s, a camera tilt angle of 60°, a side overlap of 70%, and a longitudinal overlap of 80%. Real-time kinematic (RTK) positioning was enabled throughout the flight, and the WGS84 UTM zone 49N coordinate system was uniformly applied. The total area covered was 1.15 km², and 4786 photographs were captured. After data collection, DJI Terra 3.9.2 software (DJI Innovations, Shenzhen, China) was used to process the drone-captured remote-sensing images and reconstruct digital orthophoto map images, digital surface model (DSM) images, and 3D point cloud data. The DSM images and 3D point cloud data were then used to measure the canopy height of the 315 sample plots.

2.2.2. Spaceborne LiDAR Data

The ICESat-2 satellite, launched in September 2018, carries the ATLAS system, which employs single-photon detection technology to acquire elevation data. ICESat-2 data are freely available from the National Snow and Ice Data Center (NSIDC) (https://nsidc.org/data/atl08/versions/6; last accessed on 13 November 2024). For this study, canopy height data were extracted from the ATL08 product. To ensure data quality, a series of filtering steps was applied. Weak beam data (atlas_beam_type = weak), daytime observations (night_flag = 0), and data whose canopy uncertainty was set to the fill value (h_canopy_uncertainty = 3.4028235 × 10³⁸) were excluded. Additionally, data with a canopy photon ratio below 5% (canopy_m_conf = 0), data affected by clouds or aerosols (cloud_flag_atm ≥ 2), and spots where ground elevation deviated by more than 50 m from the SRTM DEM were removed to minimize potential errors. Only data within forested areas were retained for further analysis. Following this screening process, canopy height values at the RH80, RH85, RH90, and RH95 percentiles were selected for canopy height modeling in the Hainan Tropical Rainforest National Park.
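The screening rules above amount to a chain of per-segment checks. The following is a minimal sketch, assuming each ATL08 segment has already been read into a plain dictionary; the key names (`beam_type`, `canopy_uncertainty`, `ground_elev_m`, `srtm_elev_m`, `is_forest`) and the function names are hypothetical stand-ins, not the product's actual field paths.

```python
# Values at or above this threshold are treated as the fill value (~3.4e38).
FILL_THRESHOLD = 3.0e38


def keep_atl08_segment(seg, max_dem_diff_m=50.0):
    """Return True if a segment (dict with hypothetical keys) passes every rule."""
    if seg["beam_type"] == "weak":                    # weak beams are excluded
        return False
    if seg["night_flag"] == 0:                        # 0 = daytime acquisition
        return False
    if seg["canopy_uncertainty"] >= FILL_THRESHOLD:   # uncertainty is a fill value
        return False
    if seg["canopy_m_conf"] == 0:                     # canopy photon ratio too low
        return False
    if seg["cloud_flag_atm"] >= 2:                    # cloud or aerosol contamination
        return False
    if abs(seg["ground_elev_m"] - seg["srtm_elev_m"]) > max_dem_diff_m:
        return False                                  # ground deviates from SRTM DEM
    return bool(seg["is_forest"])                     # keep forested segments only


def filter_segments(segments):
    """Keep only the segments that pass all screening rules."""
    return [seg for seg in segments if keep_atl08_segment(seg)]
```

Writing the rules as early returns keeps each criterion on its own line, so a threshold can be changed or a rule disabled without touching the others.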

The GEDI instrument, launched by NASA in December 2018 and installed on the International Space Station, provides high-resolution forest structure data through three lasers that generate eight parallel observation tracks, each laser emitting 242 pulses per second and each footprint covering 25 m. This study used the second version of the GEDI L2A dataset, updated in April 2021, which is accessible via NASA’s Land Processes Distributed Active Archive Center (LPDAAC) (https://lpdaac.usgs.gov/products/gedi02_av002/; last accessed on 13 November 2024). To maintain data reliability, several filtering criteria were applied. Only data points within the study area were considered, based on longitude and latitude values (lon_lowestmode_a<n> and lat_lowestmode_a<n>). Data with a quality flag of 1 (quality_flag_a<n> = 1) were retained, while those with elevation discrepancies exceeding 50 m from TanDEM-X (|elev_lowestmode − TanDEM-X| > 50 m) were discarded to eliminate cloud-related errors. Additional filtering steps included removing degraded data (degrade_flag = 1), excluding points with sensitivity below 0.9, and ensuring algorithm integrity by retaining only data with ‘rx_algrunflag’ = 1. After applying these criteria, only waveforms that were valid across all six GEDI processing algorithms were retained. Canopy height values at the RH80, RH85, RH90, and RH95 percentiles were then extracted for canopy height modeling in the Hainan Tropical Rainforest National Park.
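The requirement that a shot be valid under all six processing algorithms can be sketched as a loop over the algorithm settings. This is an illustrative sketch only: the nested dictionary layout (`shot["algorithms"][n]`), the `tandemx_elev` key, and the use of the park's bounding box from Section 2.1 in place of the actual park boundary are assumptions.

```python
def dms_to_degrees(degrees, minutes):
    """Convert degrees and arc-minutes to decimal degrees."""
    return degrees + minutes / 60.0


# Bounding box quoted in Section 2.1 (an approximation of the park boundary).
LAT_RANGE = (dms_to_degrees(18, 33), dms_to_degrees(19, 14))
LON_RANGE = (dms_to_degrees(108, 44), dms_to_degrees(110, 4))
ALGORITHMS = range(1, 7)  # the six GEDI processing algorithm settings


def keep_gedi_shot(shot, max_dem_diff_m=50.0, min_sensitivity=0.9):
    """Return True if a shot (dict with a hypothetical layout) passes every rule."""
    if shot["degrade_flag"] == 1:                       # degraded geolocation
        return False
    if shot["sensitivity"] < min_sensitivity:           # beam sensitivity too low
        return False
    if abs(shot["elev_lowestmode"] - shot["tandemx_elev"]) > max_dem_diff_m:
        return False                                    # likely cloud return
    for n in ALGORITHMS:                                # must hold for a1..a6
        alg = shot["algorithms"][n]
        if alg["quality_flag"] != 1 or alg["rx_algrunflag"] != 1:
            return False
        if not LAT_RANGE[0] <= alg["lat_lowestmode"] <= LAT_RANGE[1]:
            return False
        if not LON_RANGE[0] <= alg["lon_lowestmode"] <= LON_RANGE[1]:
            return False
    return True
```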

A total of 20,162 initial data points were acquired from ICESat-2 and GEDI, of which 9136 remained after filtering.

2.2.3. Terrain Feature Data

In this study, 12.5 m high-resolution DEM data of Hainan Island obtained from the ALOS satellite were downloaded from NASA’s Earth Science Data website (https://search.asf.alaska.edu; last accessed on 13 November 2024). In ArcGIS, the Slope and Aspect tools were used to process DEM data and derive slope and aspect information for the Hainan Tropical Rainforest National Park. The slope, aspect, and elevation data were resampled to a 30 m resolution using the cubic convolution resampling method to ensure data consistency.
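For readers unfamiliar with how slope is derived from a DEM, the sketch below applies the 3 × 3 finite-difference scheme of Horn to a single window of elevations. It assumes that this is the scheme used by the planar Slope tool; the function name and the window layout (rows from north to south, columns from west to east) are illustrative choices.

```python
from math import atan, degrees, hypot


def horn_slope_degrees(window, cell_size_m=12.5):
    """Slope in degrees at the centre of a 3 x 3 elevation window (Horn's method).

    `window` holds three rows (north to south) of three elevations (west to east).
    """
    (a, b, c), (d, _centre, f), (g, h, i) = window
    # Weighted elevation differences in the east-west and north-south directions.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8.0 * cell_size_m)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8.0 * cell_size_m)
    return degrees(atan(hypot(dz_dx, dz_dy)))
```

For a surface that rises by one cell size per cell toward the east, for example `horn_slope_degrees([[0, 12.5, 25]] * 3)`, the gradient is 1 and the slope is approximately 45°.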

2.2.4. Climate Data

Monthly temperature and precipitation data from 1960–1982 for the study area were downloaded from the WorldClim database (https://www.worldclim.org/; last accessed on 13 November 2024). These data were preprocessed to obtain the annual average temperature and annual average precipitation data for the study period, with a spatial resolution of approximately 4 km. In addition, historical annual average temperature and annual average precipitation data from 1982 to 2023 were obtained from the National Earth System Science Data Center (https://www.geodata.cn; last accessed on 13 November 2024) at a 1 km spatial resolution. The two datasets were adjusted and calibrated to ensure consistency and accuracy. Pixel-level statistical processing was performed using ArcGIS Pro to generate average annual temperature and precipitation data for Hainan Island for five periods: 1960–2023, 1960–2018, 1960–2013, 1960–2008, and 1960–2003. To enhance the details and consistency of the data, the resolution of the five datasets was adjusted to 30 m using cubic convolution resampling.
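The five historical averages share a start year and differ only in their end year, so for a single pixel they reduce to means over growing windows. A minimal sketch, assuming the calibrated annual values for one pixel are held in a dictionary keyed by year (the function names and this data layout are hypothetical):

```python
PERIOD_END_YEARS = (2003, 2008, 2013, 2018, 2023)


def period_mean(annual_values, start_year, end_year):
    """Mean of the annual values from start_year to end_year inclusive."""
    years = [y for y in annual_values if start_year <= y <= end_year]
    if not years:
        raise ValueError("no data in the requested period")
    return sum(annual_values[y] for y in years) / len(years)


def historical_means(annual_values, start_year=1960):
    """Means for 1960-2003, 1960-2008, ..., 1960-2023 for one pixel."""
    return {end: period_mean(annual_values, start_year, end)
            for end in PERIOD_END_YEARS}
```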

2.2.5. Vegetation Index Data

Landsat series data (radiometrically calibrated and atmospherically corrected) were downloaded from the Google Earth Engine (GEE) platform (https://code.earthengine.google.com; last accessed on 13 November 2024). This study obtained Landsat satellite remote-sensing data from 2003 to 2023, including Landsat 5 TM and Landsat 8 OLI data. The median synthesis method was applied when compositing the Landsat data, generating one composite image per year. ENVI 5.6 software was then used to calculate 30 m resolution NDVI values for the study area from 2003 to 2023 based on the synthesized Landsat data.

To minimize the impact of seasonal fluctuations on NDVI values, the median synthesis approach was employed for the time-series data. This method effectively reduces the influence of outliers, such as cloud cover or extreme weather conditions, thereby ensuring more representative NDVI values. By using median synthesis, the stability and reliability of NDVI values were enhanced, mitigating potential biases associated with single-time-phase data. The NDVI calculation formula is given in Equation (1) [30].

\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}

(1)
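The order of operations described above (median composite first, NDVI second) can be illustrated for a single pixel. This is a sketch under simplifying assumptions: reflectances are given as (NIR, Red) pairs for one year, cloud masking has already been done, and the function names are hypothetical.

```python
from statistics import median


def ndvi(nir, red):
    """Equation (1); returns None when the denominator is zero."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


def annual_composite_ndvi(observations):
    """NDVI of the per-band median composite for one pixel and one year.

    `observations` is a list of (nir, red) surface-reflectance pairs.
    """
    nir_median = median(nir for nir, _ in observations)
    red_median = median(red for _, red in observations)
    return ndvi(nir_median, red_median)
```

In the example `[(0.5, 0.1), (0.4, 0.1), (0.9, 0.8)]`, the third pair mimics a bright, cloud-contaminated observation; the per-band medians are 0.5 and 0.1, so the outlier does not affect the composite NDVI.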

2.2.6. Other Data

In this study, the natural forest classification dataset and the Global 30 m Land-Cover Dynamic Monitoring Product with Fine Classification System (GLC_FCS30D) land-cover classification data were downloaded [31,32]. The forest distribution map of Hainan Tropical Rainforest National Park from 2003 to 2023 was generated by integrating these two datasets and referencing the study by Yang et al. (Figure 3) [33]. Since the forest area in Hainan exhibited minimal change between 2022 and 2023, the GLC_FCS30D data from 2022 were used to represent the forest distribution for 2023.

2.3. Methods

2.3.1. Feature Selection

Feature data related to the target variable was collected prior to data modeling. However, not all features contribute equally to a model’s predictive ability, making feature selection essential. Spearman’s rank correlation coefficient, a non-parametric statistical measure, was used to evaluate the monotonic relationship between two variables. This method is particularly suitable for analyzing non-normally distributed or ordinal data.

In this study, Spearman’s correlation coefficients were calculated between the feature data and the dependent variables (canopy height and biomass). The features were ranked based on the absolute values of their coefficients to identify those with the strongest correlations.
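The ranking step can be sketched in a few lines: Spearman's coefficient is the Pearson correlation of the ranks, with tied values sharing their average rank. The function names and the dictionary of feature columns are illustrative assumptions.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                       # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rank_features(features, target):
    """Sort feature names by |Spearman coefficient| with the target, descending."""
    scores = {name: spearman(column, target) for name, column in features.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```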

Subsequently, collinearity diagnostics were performed on the ranked features to assess the impact of multicollinearity on model stability and parameter estimation. The variance inflation factor (VIF) was used for feature screening, with higher VIF values indicating more severe multicollinearity among independent variables. Features with a VIF value of less than 10 were retained.
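For reference, the standard definition of the VIF for feature j is given below, where R_j² denotes the coefficient of determination obtained by regressing feature j on all remaining features (notation introduced here for illustration). Under this definition, the retention threshold of 10 corresponds to R_j² < 0.9.

```latex
\mathrm{VIF}_{j} = \frac{1}{1 - R_{j}^{2}},
\qquad
\mathrm{VIF}_{j} < 10 \iff R_{j}^{2} < 0.9
```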

The final selected predictive features were historical average annual temperature, historical average annual precipitation, elevation, slope, aspect, and NDVI.

2.3.2. Model Construction

The BP, RF, CNN, and GBDT algorithms were used to develop a canopy height model for the Hainan Tropical Rainforest National Park. A total of 9136 photon points were divided into a training set (80%) and a testing set (20%) for model development.
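An 80/20 partition of this kind can be sketched as a seeded shuffle. How the split was randomized is not stated, so the shuffling, the seed, and the function name below are assumptions.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=42):
    """Shuffle sample indices with a fixed seed and split them into two lists."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_test = round(len(samples) * test_fraction)
    test = [samples[i] for i in indices[:n_test]]
    train = [samples[i] for i in indices[n_test:]]
    return train, test
```

Applied to 9136 points, this yields 7309 training points and 1827 testing points.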

The CNN algorithm, known for its powerful feature extraction capability, processes remote-sensing data through convolutional, pooling, and fully connected layers. After iterative parameter tuning, model accuracy stabilized, and further adjustments no longer improved performance. The final model employed a 3 × 1 convolution kernel with 32/64 filters, max-pooling layers, and a 20% dropout rate to prevent overfitting. The fully connected layer contained 128 neurons and was optimized using the Adam optimizer. The model was trained with a batch size of 64, a maximum of 2000 epochs, and a stepwise learning rate adjustment strategy.

The RF algorithm applies ensemble learning techniques to construct decision trees using bootstrap sampling. After multiple rounds of parameter tuning, the model’s accuracy reached its optimal level, and further modifications did not enhance performance. The final model integrated 100 decision trees with a minimum leaf node count of 10, improving the model’s robustness and generalization capabilities.
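To make the roles of the two reported hyperparameters concrete, the following toy random forest regressor is written in plain Python. It is a didactic sketch, not the MATLAB implementation used in the study: the exhaustive split search, the rule of testing one third of the features per split, and all function names are assumptions.

```python
import random


def _sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, min_leaf, n_try, rng):
    """Grow a regression tree; every split tests n_try randomly chosen features."""
    best = None
    for f in rng.sample(range(len(rows[0])), n_try):
        for threshold in sorted({row[f] for row in rows}):
            left = [i for i, row in enumerate(rows) if row[f] <= threshold]
            right = [i for i, row in enumerate(rows) if row[f] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # enforce the minimum leaf size
            score = _sse([targets[i] for i in left]) + _sse([targets[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:
        return sum(targets) / len(targets)  # leaf: mean target value
    _, f, threshold, left, right = best

    def grow(idx):
        return build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                          min_leaf, n_try, rng)

    return (f, threshold, grow(left), grow(right))


def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree


def fit_forest(rows, targets, n_trees=100, min_leaf=10, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    n = len(rows)
    n_try = max(1, len(rows[0]) // 3)  # assumed rule of thumb for regression
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        forest.append(build_tree([rows[i] for i in sample],
                                 [targets[i] for i in sample], min_leaf, n_try, rng))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)
```

The defaults `n_trees=100` and `min_leaf=10` mirror the configuration reported above; a larger minimum leaf size produces shallower trees and smoother predictions.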

As a classical feedforward neural network, the BP algorithm comprises one hidden layer and one output layer and is trained using a backpropagation algorithm. After repeated parameter tuning, accuracy gains plateaued, leading to the selection of the final hyperparameters. The hidden layer of the final model included 50 neurons, with 300 iterations, an error threshold of 1 × 10⁻⁹, and a learning rate of 0.001. The input data were normalized for optimal performance.

The GBDT algorithm iteratively optimizes predictions by integrating multiple decision trees, making it well-suited for complex nonlinear relationships and feature interactions. Extensive parameter adjustments were performed, and once accuracy ceased to improve, the final configuration was determined. The final model utilized 800 decision trees with a maximum split count of 10 and a learning rate of 0.1, effectively reducing prediction errors and enhancing model performance.

All algorithms were implemented in MATLAB 2024 (MathWorks Inc., Natick, MA, USA).

2.3.3. Model Evaluation and Validation

Five evaluation metrics were used to assess the model’s performance: coefficient of determination (R²), bias, relative bias, root mean square error (RMSE), and relative root mean square error (RRMSE). These metrics comprehensively reflect the model’s fitting accuracy and predictive capability.

R² measures the extent to which the model explained the variance in the observed data. Bias evaluates the systematic error between predicted and observed values, while relative bias represents the proportion of bias relative to the observed values. RMSE quantifies the average error between predicted and observed values, whereas RRMSE assesses the relative magnitude of these errors by calculating the ratio of RMSE to the observed values. The specific calculation formulas are as follows:

R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}},

(2)

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}},

(3)

\mathrm{Bias} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_{i} - y_{i}),

(4)

\mathrm{RRMSE} = \frac{\mathrm{RMSE}}{\bar{y}} \times 100\%,

(5)

\mathrm{Relative\ Bias} = \frac{\mathrm{Bias}}{\bar{y}} \times 100\%,

(6)

where \bar{y} represents the average value of the observed data, y_{i} represents the observed value, \hat{y}_{i} represents the corresponding predicted value, and n represents the sample size.
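Equations (2)–(6) translate directly into code. The sketch below computes all five metrics from paired observed and predicted values; the function name and the dictionary keys are illustrative.

```python
from math import sqrt


def evaluate(observed, predicted):
    """Compute R2, RMSE, bias, RRMSE (%), and relative bias (%)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    rmse = sqrt(ss_res / n)
    bias = sum(p - y for y, p in zip(observed, predicted)) / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "bias": bias,                                  # positive = overestimation
        "rrmse_pct": rmse / mean_obs * 100.0,
        "relative_bias_pct": bias / mean_obs * 100.0,
    }
```

For example, with observed heights of 10, 12, 14, and 16 m and predictions that are each 1 m too high, the function returns R² = 0.8, RMSE = 1 m, bias = 1 m, and RRMSE = relative bias ≈ 7.69%.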

3. Results

3.1. Accuracy Validation of the Canopy Height Remote Sensing Estimation Model

During the model development process, four machine learning algorithms underwent multiple rounds of testing and parameter tuning based on their performance in the training and testing stages to enhance the model’s generalization ability and accuracy. Accuracy assessment using the 20% testing dataset served as internal validation, evaluating the model’s performance on unseen data within the same dataset. Accuracy assessment using independent datasets, including 140 LiDAR-scanned plots and 315 UAV-derived canopy height validation points, served as external validation to verify the model’s generalization capability to entirely new data. Table 1 and Table 2 summarize the average accuracy results from these evaluations, reflecting the performance of each algorithm in internal and external validation across different percentiles (RH80, RH85, RH90, and RH95).

From an algorithmic perspective, RF exhibited the best performance among the four algorithms, demonstrating the highest internal and external accuracy. Specifically, RF outperformed the other algorithms in terms of R² values for both the training and testing sets. For the RH80, RH85, RH90, and RH95 percentiles, the R² values of the RF model on the testing set were 0.60, 0.56, 0.59, and 0.60, respectively. These results indicate that the RF model demonstrated strong fitting ability and accuracy. Additionally, RF had the lowest RMSE and RRMSE values on the testing set, particularly for RH80 and RH85, with RMSE values of 3.11 m and 3.31 m and RRMSE values of 21.36% and 21.28%, respectively. These results suggest high accuracy and low error in RF predictions. Although the accuracy of all algorithms decreased in the external validation, RF remained relatively stable in terms of bias and RMSE, particularly at the RH95 percentile, where the bias was 4.74 m and RMSE was 6.24 m. While the errors were higher, RF still outperformed the other algorithms in terms of stability and precision. Therefore, considering its overall performance and stability across different percentiles, RF was deemed the most suitable algorithm.

Compared to RF, CNN and GBDT showed moderate performances in canopy height prediction. At the RH90 and RH95 percentiles, CNN had a testing R² of 0.46 and an RRMSE of 21.34%, while GBDT had a testing R² of 0.49 and an RRMSE of 21.26%. These results were lower than those of RF but still improved compared to those of BP. Prediction errors increased as the percentiles increased. At RH95, both CNN and GBDT showed greater uncertainty. The BP algorithm performed the worst, with the lowest R² values (0.44 at RH80 and 0.39 at RH90) and the highest RRMSE (25.79% at RH80 and 22.56% at RH90), highlighting its limitations in modeling the nonlinear complexity of canopy height. In external validation, CNN and GBDT had R² values ranging from 0.39 to 0.43, with higher RRMSE values than RF, indicating greater prediction errors. BP performed the worst again, with an R² of 0.38 at RH80 and an RRMSE of 39.63%, showing the lowest accuracy and the highest prediction error among the models.

Overall, while CNN and GBDT performed better than BP, they were less accurate and stable than RF, which demonstrated the highest precision and generalization ability in canopy height prediction.

Regarding percentile selection, RH80 was identified as the optimal choice for the prediction model. According to the data in Table 1 and Table 2, RF exhibited the optimal performance at RH80, with a testing set R² of 0.60, RMSE of 3.11 m, and RRMSE of 21.36%, indicating the highest precision at this percentile. As the percentiles increased (e.g., RH85, RH90, and RH95), prediction errors gradually increased, particularly in external validation, where bias and RMSE values increased significantly, indicating greater uncertainty in predictions at higher canopy heights. RH80 typically corresponds to lower canopy heights, which are more accurately estimated in remote-sensing data processing compared to higher canopy heights. Therefore, selecting RH80 reduced errors and enhanced the reliability and stability of the model. Additionally, RH80 is applicable across a wide range of ecological environments and forest types, providing more accurate and generalizable results for canopy height estimation in diverse regions. Hence, considering both accuracy and applicability, RH80 was determined to be the most suitable percentile.

To comprehensively evaluate the performance of different percentiles (RH80, RH85, RH90, and RH95) in predicting canopy height in the Hainan Tropical Rainforest National Park, as well as the effectiveness of various machine-learning models (RF, CNN, GBDT, and BP), scatter plots for all models are provided in Appendix A.

3.2. Canopy Height Changes in the Hainan Tropical Rainforest National Park from 2003 to 2023

This study employed an RF-based canopy height estimation model for the RH80 percentile to predict canopy heights at 3,332,789 sample points (spaced at 30 m intervals) across Hainan Tropical Rainforest National Park from 2003 to 2023. The resulting 30 m spatial resolution canopy height distribution map is shown in Figure 4.

The results indicated an overall increasing trend in rainforest canopy height during the study period, ranging from 2.95 to 22.02 m. In 2003, canopy heights were generally lower than in later years, reflecting the early stages of ecosystem recovery. By 2013, and especially in 2023, canopy heights had significantly increased, with most areas exceeding 20 m. This change highlights the substantial recovery of the tropical rainforest ecosystem over the past two decades, likely driven by a combination of natural resilience and conservation efforts in Hainan Province.

Further analysis revealed that areas with higher canopy heights were primarily concentrated in mountainous core protection zones with complex terrain and higher elevations, such as the central mountainous regions of various park divisions. These regions exhibited more extensive and contiguous high canopy coverage, suggesting a robust recovery closely linked to strict conservation policies and favorable natural conditions. In contrast, canopy heights in low-elevation and peripheral areas were generally lower and more fragmented, possibly constrained by environmental conditions and human activities. Notably, some areas exhibited relatively limited canopy height growth from 2008 to 2013, potentially due to climatic fluctuations and the pace of ecological recovery.

From Figure 5, it is evident that the canopy height of different forest types exhibited a general increasing trend from 2003 to 2023, reflecting forest growth and recovery. The most significant increase occurred between 2003 and 2013, followed by a slower growth rate after 2013. Among these forest types, tropical lowland rainforests and tropical seasonal forests exhibited the fastest growth, with mean canopy heights increasing from 13.02 m to 14.51 m and from 13.55 m to 15.03 m, respectively, indicating strong recovery potential. In contrast, tropical montane cloud forests and tropical coniferous forests showed the least growth, with mean canopy heights increasing only slightly from 17.10 m to 17.37 m and from 15.04 m to 16.14 m, suggesting that these ecosystems may have reached a stable or mature stage. Tropical montane rainforests showed moderate growth, with the mean canopy height increasing from 16.54 m to 17.25 m, primarily before 2013.
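The height gains implied by the mean values quoted above can be tabulated directly. The sketch below only restates the 2003 and 2023 means from this paragraph; the dictionary layout and variable names are illustrative.

```python
# (mean canopy height in 2003, mean canopy height in 2023), in metres.
MEAN_HEIGHT_M = {
    "tropical lowland rainforest": (13.02, 14.51),
    "tropical seasonal forest": (13.55, 15.03),
    "tropical montane cloud forest": (17.10, 17.37),
    "tropical coniferous forest": (15.04, 16.14),
    "tropical montane rainforest": (16.54, 17.25),
}

# Gain over the study period, rounded to the precision of the reported means.
gains = {name: round(end - start, 2) for name, (start, end) in MEAN_HEIGHT_M.items()}

# Forest types ordered from the largest to the smallest gain.
ranked = sorted(gains, key=gains.get, reverse=True)

for name in ranked:
    print(f"{name}: +{gains[name]:.2f} m")
```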

The standard deviation (σ) across all forest types remained relatively stable, indicating that while canopy height increased, the internal height variability did not expand significantly. This suggests that forest recovery was a relatively uniform process rather than a localized surge in canopy height.

Overall, variations in tree height across forest types reflect significant differences in growth conditions and ecological recovery capacities. Canopy height changes from 2003 to 2023 highlight the recovery potential and spatial variability of the rainforest ecosystem, providing critical insights for future conservation and restoration initiatives. A more comprehensive analysis of canopy height dynamics will offer valuable scientific data for guiding targeted and refined protection measures and recovery strategies.

4. Discussion

4.1. Comparative Analysis of Four Machine Learning Algorithms for Forest Canopy Height Estimation

This section compares the performance of the four machine-learning algorithms (RF, GBDT, CNN, and BP) used to build canopy height estimation models for the Hainan Tropical Rainforest National Park.

For internal validation with RH80, the RF algorithm achieved a training set R² of 0.71 and a test set R² of 0.60, the highest values among the four algorithms. The external validation results also highlighted the robust performance of the RF model, with an R² of 0.45 and an RRMSE of 33.05%. These results demonstrated that RF outperformed the other algorithms when predicting new data, aligning with findings from several related studies. Ghosh et al. estimated the canopy height of the Bhitarkanika Mangrove Reserve in India and reported that the RF model achieved an internal validation RMSE of 1.57 m and an R² of 0.60, demonstrating the strong adaptability of RF across different geographic regions and ecosystems [34]. Peng et al. applied the RF algorithm to estimate five types of forest canopy structures using data from 60 tropical forest plots in Hainan Province. They found that the RF algorithm exhibited relatively low RRMSE values (10.60–27.44%), further confirming its reliability for canopy height estimation in tropical rainforests [35].

Although the CNN algorithm exhibited the lowest external validation bias for RH80 (bias = 1.03 m, relative bias = 8.16%; Table 2), indicating its potential for capturing nonlinear relationships and complex patterns, its RRMSE of 36.77% was higher than that of the RF model (33.05%). CNN is advantageous because of its ability to extract spatial features from high-resolution remote-sensing imagery; however, its adaptability to different forest types has certain limitations. Shah et al. used a CNN algorithm to estimate forest canopy height in the Coconino National Forest region using Landsat images and found that the CNN model performed well in this area, with a mean absolute error of 3.092 m, a mean squared error of 0.8872 m, and a variance of 0.864 m. These results indicated that while the CNN algorithm was effective for canopy height estimation in specific regions, it was less stable than the RF algorithm when applied to different environments and vegetation types [28].

In contrast, the BP algorithm exhibited the lowest performance in this study for RH80, with R² values of 0.44 on the training set and 0.38 in external validation, along with an external validation RRMSE of 39.63%, indicating its shortcomings in handling complex nonlinear problems and its inability to effectively capture patterns in canopy height changes. The GBDT algorithm performed slightly worse than RF, with an R² of 0.49 and an RRMSE of 24.67% on the test set. However, it still demonstrated some predictive ability, particularly in analyzing canopy height across different environments, where its performance remained relatively stable.
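The accuracy metrics compared above (R², bias, relative bias, RMSE, and RRMSE) can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the study's own code: bias is taken as predicted minus observed, the relative metrics are normalized by the mean observed height, and R² is the coefficient of determination. The function name `validation_metrics` and the sample values are hypothetical.

```python
import math


def validation_metrics(observed, predicted):
    """Compute R2, bias, relative bias (%), RMSE, and RRMSE (%) for paired heights in metres."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    # Assumed sign convention: positive residual means the model overestimates.
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    bias = sum(residuals) / n
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1 - ss_res / ss_tot,
        "bias_m": bias,
        "relative_bias_pct": 100 * bias / mean_obs,
        "rmse_m": rmse,
        "rrmse_pct": 100 * rmse / mean_obs,
    }


# Hypothetical plot-level heights (m), for illustration only.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 18.0]
metrics = validation_metrics(observed, predicted)
print({k: round(v, 3) for k, v in metrics.items()})
```

Normalizing by the mean observed height is consistent with the reported external validation values for RF at RH80, where both 4.15 m / 33.05% and 1.13 m / 8.99% imply a mean reference height of roughly 12.6 m.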

To further validate the superiority of the RF algorithm, we compared the accuracy results of this study with those of other relevant studies. The RF algorithm demonstrated strong adaptability across different locations and vegetation types, which aligns with previous studies, highlighting its robustness in accurately estimating canopy heights in diverse ecological environments. Jin et al. highlighted the transferability of the RF algorithm for estimating canopy height across different locations and vegetation types. They found that the RF model exhibited high accuracy, with R² > 0.6 and RMSE < 6 m, demonstrating the reliability of RF in large-scale canopy height estimation across diverse geographic regions and vegetation types [36]. Fayad et al. used ICESat/GLAS LiDAR waveform data and SRTM DEM to study canopy height in the tropical forests of French Guiana. Their results showed that the RF algorithm provided the highest estimation accuracy, with an RMSE of 3.4 m, outperforming multiple linear regression and principal component analysis [37]. Ghosh et al. applied the RF algorithm in combination with multiple remote-sensing data sources, including GEDI LiDAR, SAR backscatter, terrain, and canopy density, to estimate canopy height in different forest types in India. Their study also demonstrated that RF, when integrated with LiDAR and multisource data, could achieve high prediction accuracy, particularly in areas with complex terrain, further confirming its strong adaptability [38].

Overall, the RF algorithm exhibited excellent performance in this study and has been widely applied in other research. Therefore, its superior accuracy in estimating the canopy height of the Hainan tropical rainforest further validates its broad adaptability for remote-sensing applications.

4.2. Research on Forest Canopy Height Estimation in the Hainan Tropical Rainforest National Park

This study provides new scientific evidence for estimating canopy height and assessing ecological restoration in Hainan’s tropical rainforest through long-term time-series analysis, multi-modal data integration, and the comprehensive application of environmental factors.

The study monitored forest canopy height in the Hainan Tropical Rainforest National Park over 20 years (2003–2023). The results revealed spatial and temporal trends in canopy height and provided valuable historical data for studying tropical rainforest ecological restoration. The findings indicated an overall increasing trend in canopy height from 2003 to 2023. In particular, from 2013 to 2023, following the implementation of a series of ecological protection measures, canopy height increased significantly. These results align with the analysis by Zhong et al. of ecological restoration projects in Hainan Province, highlighting the critical role of ecological protection and natural recovery in forest restoration [39]. The variation in canopy height within tropical forests may be influenced by several factors, including climate conditions, elevation, species characteristics, and the implementation of conservation policies. The canopy height of lowland rainforests showed a stable increasing trend, indicating that under favorable water and heat conditions and strengthened conservation measures, lowland rainforests can maintain high productivity [40]. In contrast, canopy height in montane cloud forests remained stable, likely constrained by lower temperatures and limited nutrient availability at higher elevations. The significant canopy height growth observed in seasonal rainforests may be attributed to species adaptability and ecosystem resilience under climate fluctuations. Seasonal rainforests exhibit strong growth potential and carbon sequestration capacity in favorable climates, which aligns with the findings of Chave et al. [41]. The growth trend in montane rainforests further supports their strong recovery ability under relatively balanced water and heat conditions at mid-to-high elevations. Conversely, coniferous forests, which occupy relatively small areas, showed limited canopy height growth, as many of them had reached maturity or over-maturity by the time of the survey. This long-term analysis enhances our understanding of the restoration process of canopy height and its relationship with environmental changes.

This study integrated GEDI and ICESat-2 ATLAS spaceborne LiDAR data to overcome the limitations of single data sources in spatial coverage and accuracy. The high-density footprint points from the combined datasets enhanced the resolution of forest canopy height estimation while reducing errors and striping effects commonly observed in traditional remote-sensing methods. The advantage of multi-modal data integration enabled more accurate canopy height estimation in Hainan’s complex tropical rainforest environment, particularly in high-canopy areas. Compared to the global forest canopy height map developed by Potapov et al. using GEDI and Landsat data, this study improved spatial continuity and avoided underestimation in high-canopy regions, achieving greater accuracy and spatial stability [14].

The Third Law of Geography, proposed by Zhu et al., states that the more similar the geographic conditions of two areas, the more similar their target characteristics tend to be; applied here, forest canopy characteristics (e.g., canopy height) are expected to be more alike in areas with similar geographic conditions than in areas that are merely close together in space. This theory provides a new perspective for studying forest canopy height by emphasizing the influence of geographic environment, climate conditions, and vegetation types, beyond spatial distance alone [42]. Environmental factors effectively capture canopy differences under varying geographic conditions, enhancing model generalization and accuracy. Thus, this study incorporated environmental factors such as elevation, slope, aspect, rainfall, temperature, and NDVI to optimize the canopy height estimation model. Future studies may include additional environmental factors to further improve model adaptability.
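The idea that environmental similarity, rather than spatial distance, drives similarity in canopy height can be illustrated with a similarity-weighted predictor. This is only a conceptual sketch and not the model used in this study, which supplied the environmental factors to the machine-learning algorithms as predictors. The Gaussian similarity function, the covariate scales, and all sample values below are assumptions chosen for illustration.

```python
import math


def env_similarity(a, b, scales):
    """Gaussian similarity in covariate space: 1.0 for identical environments, near 0 for very different ones."""
    d2 = sum(((a[k] - b[k]) / scales[k]) ** 2 for k in scales)
    return math.exp(-0.5 * d2)


def predict_height(target_env, samples, scales):
    """Similarity-weighted mean of sample canopy heights; spatial distance plays no role."""
    weights = [env_similarity(target_env, s["env"], scales) for s in samples]
    total = sum(weights)
    if total == 0:
        raise ValueError("target environment is too dissimilar from every sample")
    return sum(w * s["height_m"] for w, s in zip(weights, samples)) / total


# Hypothetical covariate scales and reference samples (not data from the study).
scales = {"elevation_m": 200.0, "slope_deg": 10.0, "ndvi": 0.1}
samples = [
    {"env": {"elevation_m": 900.0, "slope_deg": 20.0, "ndvi": 0.85}, "height_m": 18.0},
    {"env": {"elevation_m": 150.0, "slope_deg": 5.0, "ndvi": 0.60}, "height_m": 12.0},
]
target = {"elevation_m": 880.0, "slope_deg": 18.0, "ndvi": 0.83}

estimate = predict_height(target, samples, scales)
print(f"Estimated canopy height: {estimate:.2f} m")
```

Because the target's environment closely matches the first sample, the estimate is dominated by that sample's height regardless of where either sample lies on the map.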

In this study, modeling was performed for different percentiles of the data, with RH80 achieving the optimal balance in internal and external validations. The internal validation RRMSE for RH80 was 21.36%, while the external validation RRMSE was 33.05%. These results indicated that RH80 was highly stable and reliable for predicting overall canopy height changes in the Hainan Tropical Rainforest National Park. The strong performance of RH80 was likely due to its sensitivity to the mid-height canopy, which typically forms the forest’s core structure. This core structure encompasses most of the photosynthetic biomass and plays a crucial role in essential ecological functions, such as maintaining species diversity and facilitating carbon absorption. Asner et al. emphasized that in tropical forests, the mid-height canopy region (similar to RH80) contains most of the functional leaf area index and productive tree species, making RH80 a key variable for large-scale ecological studies [43].
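As a rough illustration of what a relative height percentile represents, the sketch below computes RH80 to RH95 from a list of return heights above ground using linear interpolation between ranked values. This is a simplification made for clarity: spaceborne LiDAR products derive their height metrics from the instrument's own waveform or photon processing, and the function name `relative_height` and the sample heights are hypothetical.

```python
def relative_height(heights_m, percentile):
    """Height (m) below which `percentile` percent of the returns fall, by linear interpolation."""
    if not heights_m:
        raise ValueError("heights_m must not be empty")
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    ordered = sorted(heights_m)
    # Fractional position of the requested percentile within the sorted returns.
    rank = percentile * (len(ordered) - 1) / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


# Hypothetical return heights above ground (m) within one footprint.
returns_m = [0.5, 2.0, 6.0, 9.0, 11.0, 13.0, 14.0, 15.5, 17.0, 19.0, 21.0]

rh = {p: relative_height(returns_m, p) for p in (80, 85, 90, 95)}
for p, h in rh.items():
    print(f"RH{p}: {h:.1f} m")
```

For this sample the values rise from 17.0 m at RH80 to 20.0 m at RH95, which shows how higher percentiles move toward the tallest returns in the footprint.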

Although RH80 effectively represents overall changes in forest canopy height, higher percentiles (RH90 and RH95) have unique research value in specific contexts. Higher percentiles focus on the extreme canopy height values, particularly the distribution of tall trees, which is essential for studying forest carbon storage, biomass, and structural complexity. In this study, the RF model achieved an internal validation R² of 0.60 and a low RRMSE of 18.06% for RH95, indicating high accuracy in capturing the heights of large individual trees. This finding supports Lefsky et al., who concluded that higher percentiles more accurately reflect the tallest structural features of forest canopies, making them important indicators for studying tropical rainforest carbon storage and biomass. High-percentile metrics capture the vertical structural complexity of forests, which is crucial for understanding ecosystem stability and resilience. Using high-percentile metrics such as RH95 provides a more accurate representation of tree growth trends [44]. Therefore, high-percentile metrics such as RH90 and RH95 are indispensable for studies focusing on tall tree distribution, carbon storage assessment, or ecological extremes. In this study, RH80 was primarily selected for dynamic canopy height prediction due to its balance of accuracy and applicability, and results for RH85, RH90, and RH95 were retained for reference in future studies. Future research could dynamically adjust percentile selection based on ecological contexts to explore the applicability of different percentiles. Additionally, integrating ground sample data with high-resolution remote-sensing data could further validate and optimize percentile selection methods, providing tailored percentile indicators for various research objectives.

This study also enhanced model credibility and stability by incorporating external validation using portable 3D LiDAR data. This validation framework offers a novel solution to the challenge of validating large-scale forest canopy height estimates with low-resolution data. A limitation of this study, however, is the relatively limited spatial coverage of the validation data. Future research could improve canopy height estimation accuracy by collecting validation data from more diverse regions and elevations.

5. Conclusions

This study developed a model to estimate canopy height in the Hainan Tropical Rainforest National Park using multi-modal remote-sensing data and machine-learning algorithms. It accurately estimated canopy height from 2003 to 2023. By integrating GEDI and ICESat-2 ATLAS satellite LiDAR data, the study enhanced spatial accuracy and mitigated signal saturation issues found in traditional optical remote sensing. Environmental factors such as elevation, slope, aspect, temperature, precipitation, and NDVI were also incorporated to optimize the model, revealing trends in canopy height changes over the past two decades.

While the study produced valuable results, it also had some limitations. In areas with low vegetation cover or complex terrain, estimation accuracy was lower. The portable 3D LiDAR data had limited spatial coverage; although it improved validation accuracy, it did not fully represent all regions. Expanding data collection could enhance accuracy. Additionally, the study did not account for geolocation biases in the GEDI and ICESat-2 ATLAS LiDAR footprints.

This study provided a high-accuracy method for estimating canopy height in the Hainan Tropical Rainforest National Park. It demonstrated the potential of combining remote-sensing data with machine learning for monitoring tropical rainforests. Future research should incorporate more validation data and develop improved methods for processing raw GEDI and ICESat-2 ATLAS data to further enhance canopy height estimation accuracy.

Author Contributions

Conceptualization, Z.Q.; methodology, Q.L. and Z.Q.; software, Q.L.; validation, Q.L., Y.C., Z.F. and H.P.; formal analysis, Q.L., Z.Q. and H.P.; investigation, Q.L.; resources, Z.Q.; data curation, Q.L., Y.C., C.W., Z.Y. and Z.F.; writing—original draft preparation, Q.L.; writing—review and editing, Q.L. and Z.Q.; visualization, Q.L. and Y.C.; supervision, Z.Q.; project administration, Z.Q.; funding acquisition, Z.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 32160364 and the National College Student Innovation and Entrepreneurship Training Program of China, grant number Qhys2023-209.

Data Availability Statement

The data and model supporting this study are available on figshare at https://figshare.com/articles/dataset/Canopy_Height_Monitoring_Data_and_Model_of_Hainan_Tropical_Rainforest_National_Park/28138559 (accessed on 5 January 2025).

Acknowledgments

The authors thank those students who assisted with fieldwork and data collection, as well as the instructors for their constructive comments on the improvement of this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Figure A1. Internal Validation Scatter Plots of BP Algorithm for RH80, RH85, RH90, and RH95. The deep blue line represents the fitted curve of the data points, while the light blue line is the y = x reference line, which helps evaluate the deviation of the data. (a) Internal Validation Scatter Plot for RH80; (b) Internal Validation Scatter Plot for RH85; (c) Internal Validation Scatter Plot for RH90; (d) Internal Validation Scatter Plot for RH95.

Figure A2. External Validation Scatter Plots of BP Algorithm for RH80, RH85, RH90, and RH95. The deep blue line represents the fitted curve of the data points, while the light blue line is the y = x reference line, which helps evaluate the deviation of the data. (a) External Validation Scatter Plot for RH80; (b) External Validation Scatter Plot for RH85; (c) External Validation Scatter Plot for RH90; (d) External Validation Scatter Plot for RH95.

Figure A3. Internal Validation Scatter Plots of CNN Algorithm for RH80, RH85, RH90, and RH95. The deep blue line represents the fitted curve of the data points, while the light blue line is the y = x reference line, which helps evaluate the deviation of the data. (a) Internal Validation Scatter Plot for RH80; (b) Internal Validation Scatter Plot for RH85; (c) Internal Validation Scatter Plot for RH90; (d) Internal Validation Scatter Plot for RH95.

Figure A4. External Validation Scatter Plots of CNN Algorithm for RH80, RH85, RH90, and RH95. The deep blue line represents the fitted curve of the data points, while the light blue line is the y = x reference line, which helps evaluate the deviation of the data. (a) External Validation Scatter Plot for RH80; (b) External Validation Scatter Plot for RH85; (c) External Validation Scatter Plot for RH90; (d) External Validation Scatter Plot for RH95.

Figure A5. Internal Validation Scatter Plots of GBDT Algorithm for RH80, RH85, RH90, and RH95. The deep blue line represents the fitted curve of the data points, while the light blue line is the y = x reference line, which helps evaluate the deviation of the data. (a) Internal Validation Scatter Plot for RH80; (b) Internal Validation Scatter Plot for RH85; (c) Internal Validation Scatter Plot for RH90; (d) Internal Validation Scatter Plot for RH95.

Figure A6. External Validation Scatter Plots of GBDT Algorithm for RH80, RH85, RH90, and RH95. The deep blue line represents the fitted curve of the data points, while the light blue line is the y = x reference line, which helps evaluate the deviation of the data. (a) External Validation Scatter Plot for RH80; (b) External Validation Scatter Plot for RH85; (c) External Validation Scatter Plot for RH90; (d) External Validation Scatter Plot for RH95.

Figure A7. Internal Validation Scatter Plots of RF Algorithm for RH80, RH85, RH90, and RH95. The deep blue line represents the fitted curve of the data points, while the light blue line is the y = x reference line, which helps evaluate the deviation of the data. (a) Internal Validation Scatter Plot for RH80; (b) Internal Validation Scatter Plot for RH85; (c) Internal Validation Scatter Plot for RH90; (d) Internal Validation Scatter Plot for RH95.

Figure A8. External Validation Scatter Plots of RF Algorithm for RH80, RH85, RH90, and RH95. The deep blue line represents the fitted curve of the data points, while the light blue line is the y = x reference line, which helps evaluate the deviation of the data. (a) External Validation Scatter Plot for RH80; (b) External Validation Scatter Plot for RH85; (c) External Validation Scatter Plot for RH90; (d) External Validation Scatter Plot for RH95.

References

1. Chopping, M.; Nolin, A.; Moisen, G.G.; Martonchik, J.V.; Bull, M. Forest canopy height from the Multiangle Imaging SpectroRadiometer (MISR) assessed with high resolution discrete return lidar. Remote Sens. Environ. 2009, 113, 2172–2185.
2. Englhart, S.; Keuck, V.; Siegert, F. Aboveground biomass retrieval in tropical forests—The potential of combined X-and L-band SAR data use. Remote Sens. Environ. 2011, 115, 1260–1271.
3. Asner, G.P.; Mascaro, J.; Muller-Landau, H.C.; Vieilledent, G.; Vaudry, R.; Rasamoelina, M.; Hall, J.S.; Van Breugel, M. A universal airborne LiDAR approach for tropical forest carbon mapping. Oecologia 2012, 168, 1147–1160.
4. Goetz, S.; Dubayah, R. Advances in remote sensing technology and implications for measuring and monitoring forest carbon stocks and change. Carbon Manag. 2011, 2, 231–244.
5. Kayitakire, F.; Hamel, C.; Defourny, P. Retrieving forest structure variables based on image texture analysis and IKONOS-2 imagery. Remote Sens. Environ. 2006, 102, 390–401.
6. Irons, J.R.; Dwyer, J.L.; Barsi, J.A. The next Landsat satellite: The Landsat data continuity mission. Remote Sens. Environ. 2012, 122, 11–21.
7. Chopping, M.; Moisen, G.G.; Su, L.; Laliberte, A.; Rango, A.; Martonchik, J.V.; Peters, D.P. Large area mapping of southwestern forest crown cover, canopy height, and biomass using the NASA Multiangle Imaging Spectro-Radiometer. Remote Sens. Environ. 2008, 112, 2051–2063.
8. Xing, Y.; de Gier, A.; Zhang, J.; Wang, L. An improved method for estimating forest canopy height using ICESat-GLAS full waveform data over sloping terrain: A case study in Changbai mountains, China. Int. J. Appl. Earth Obs. Geoinf. 2010, 12, 385–392.
9. Duncanson, L.; Niemann, K.; Wulder, M. Estimating forest canopy height and terrain relief from GLAS waveform metrics. Remote Sens. Environ. 2010, 114, 138–154.
10. Sun, T.; Qi, J.; Huang, H. Discovering forest height changes based on spaceborne lidar data of ICESat-1 in 2005 and ICESat-2 in 2019: A case study in the Beijing-Tianjin-Hebei region of China. For. Ecosyst. 2020, 7, 53.
11. Simard, M.; Pinto, N.; Fisher, J.B.; Baccini, A. Mapping forest canopy height globally with spaceborne lidar. J. Geophys. Res. Biogeosciences 2011, 116.
12. Yang, T.; Wang, C.; Li, G.; Luo, S.; Xi, X.; Gao, S.; Zeng, H. Forest canopy height mapping over China using GLAS and MODIS data. Sci. China Earth Sci. 2015, 58, 96–105.
13. Wang, X.; Pan, Z.; Glennie, C. A novel noise filtering model for photon-counting laser altimeter data. IEEE Geosci. Remote Sens. Lett. 2016, 13, 947–951.
14. Potapov, P.; Li, X.; Hernandez-Serna, A.; Tyukavina, A.; Hansen, M.C.; Kommareddy, A.; Pickens, A.; Turubanova, S.; Tang, H.; Silva, C.E. Mapping global forest canopy height through integration of GEDI and Landsat data. Remote Sens. Environ. 2021, 253, 112165.
15. Cao, C.; Ni, X.; Wang, X.; Lu, S.; Zhang, Y.; Dang, Y.; Singh, R.P. Allometric scaling theory-based maximum forest tree height and biomass estimation in the Three Gorges reservoir region using multi-source remote-sensing data. Int. J. Remote Sens. 2016, 37, 1210–1222.
16. Huang, H.; Liu, C.; Wang, X. Constructing a finer-resolution forest height in China using icesat/glas, landsat and Alos Palsar data and height patterns of natural forests and plantations. Remote Sens. 2019, 11, 1740.
17. Lin, X.; Xu, M.; Cao, C.; Dang, Y.; Bashir, B.; Xie, B.; Huang, Z. Estimates of forest canopy height using a combination of ICESat-2/ATLAS data and stereo-photogrammetry. Remote Sens. 2020, 12, 3649.
18. Dubayah, R.; Blair, J.B.; Goetz, S.; Fatoyinbo, L.; Hansen, M.; Healey, S.; Hofton, M.; Hurtt, G.; Kellner, J.; Luthcke, S. The Global Ecosystem Dynamics Investigation: High-resolution laser ranging of the Earth’s forests and topography. Sci. Remote Sens. 2020, 1, 100002.
19. Silva, C.A.; Duncanson, L.; Hancock, S.; Neuenschwander, A.; Thomas, N.; Hofton, M.; Fatoyinbo, L.; Simard, M.; Marshak, C.Z.; Armston, J. Fusing simulated GEDI, ICESat-2 and NISAR data for regional aboveground biomass mapping. Remote Sens. Environ. 2021, 253, 112234.
20. Liu, X.; Liu, X.; Wang, Z.; Huang, G.; Shu, R. Classification of laser footprint based on random forest in mountainous area using GLAS full-waveform features. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 2284–2297.
21. Zhang, J.; Wang, J.; Liu, G. Vertical structure classification of a forest sample plot based on point cloud data. J. Indian Soc. Remote Sens. 2020, 48, 1215–1222.
22. Poorazimy, M.; Shataee, S.; McRoberts, R.E.; Mohammadi, J. Integrating airborne laser scanning data, space-borne radar data and digital aerial imagery to estimate aboveground carbon stock in Hyrcanian forests, Iran. Remote Sens. Environ. 2020, 240, 111669.
23. Stovall, A.E.; Lagomasino, D.; Lee, S.-K.; Simard, M.; Tang, H.; Thomas, N.M.; Trettin, C.; Armston, J.D.; Dubayah, R.; Fatoyinbo, T. Terrestrial laser scanning improves LiDAR and radar biomass calibration in tallest mangrove forest on Earth. In Proceedings of the AGU Fall Meeting Abstracts, San Francisco, CA, USA, 9–13 December 2019; p. B11E–2376.
24. Askne, J.I.; Persson, H.J.; Ulander, L.M. Biomass growth from multi-temporal TanDEM-X interferometric synthetic aperture radar observations of a boreal forest site. Remote Sens. 2018, 10, 603.
25. Newnham, G.J.; Armston, J.D.; Calders, K.; Disney, M.I.; Lovell, J.L.; Schaaf, C.B.; Strahler, A.H.; Danson, F.M. Terrestrial laser scanning for plot-scale forest measurement. Curr. For. Rep. 2015, 1, 239–251.
26. Lefsky, M.A.; Harding, D.J.; Keller, M.; Cohen, W.B.; Carabajal, C.C.; Del Bom Espirito-Santo, F.; Hunter, M.O.; de Oliveira, R., Jr. Estimates of forest canopy height and aboveground biomass using ICESat. Geophys. Res. Lett. 2005, 32.
27. Pourshamsi, M.; Xia, J.; Yokoya, N.; Garcia, M.; Lavalle, M.; Pottier, E.; Balzter, H. Tropical forest canopy height estimation from combined polarimetric SAR and LiDAR using machine-learning. ISPRS J. Photogramm. Remote Sens. 2021, 172, 79–94.
28. Shah, S.A.A.; Manzoor, M.A.; Bais, A. Canopy height estimation at Landsat resolution using convolutional neural networks. Mach. Learn. Knowl. Extr. 2020, 2, 3.
29. Stojanova, D.; Panov, P.; Gjorgjioski, V.; Kobler, A.; Džeroski, S. Estimating vegetation height and canopy cover from remotely sensed data with machine learning. Ecol. Inform. 2010, 5, 256–266.
30. Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.; Ferreira, L.G. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens. Environ. 2002, 83, 195–213.
31. Jiankang, S.; Chen, G.; Xinwu, L.; Xiangxing, W.; Zhongchang, S. Classification of Hainan island natural forest based on multi-source remote sensing data. China Sci. Data 2019, 4, 40.
32. Zhang, X.; Zhao, T.; Xu, H.; Liu, W.; Wang, J.; Chen, X.; Liu, L. GLC_FCS30D: The first global 30 m land-cover dynamics monitoring product with a fine classification system for the period from 1985 to 2022 generated using dense-time-series Landsat imagery and the continuous change-detection method. Earth Syst. Sci. Data 2024, 16, 1353–1381.
33. Yang, X.; Chen, Z.; Li, D. Classification and distribution of vegetation in Hainan, China. Sci. Sin. Vitae 2021, 51, 321–333.
34. Ghosh, S.M.; Behera, M.D.; Paramanik, S. Canopy Height Estimation Using Sentinel Series Images through Machine Learning Models in a Mangrove Forest. Remote Sens. 2020, 12, 1519.
35. Peng, X.; Zhao, A.; Chen, Y.; Chen, Q.; Liu, H.; Wang, J.; Li, H. Comparison of Modeling Algorithms for Forest Canopy Structures Based on UAV-LiDAR: A Case Study in Tropical China. Forests 2020, 11, 1324.
36. Jin, S.; Su, Y.; Gao, S.; Hu, T.; Liu, J.; Guo, Q. The transferability of Random Forest in canopy height estimation from multi-source remote sensing data. Remote Sens. 2018, 10, 1183.
37. Fayad, I.; Baghdadi, N.; Bailly, J.-S.; Barbier, N.; Gond, V.; Hajj, M.E.; Fabre, F.; Bourgine, B. Canopy Height Estimation in French Guiana with LiDAR ICESat/GLAS Data Using Principal Component Analysis and Random Forest Regressions. Remote Sens. 2014, 6, 11883–11914.
38. Ghosh, S.M.; Behera, M.D.; Kumar, S.; Das, P.; Prakash, A.J.; Bhaskaran, P.K.; Roy, P.S.; Barik, S.K.; Jeganathan, C.; Srivastava, P.K.; et al. Predicting the Forest Canopy Height from LiDAR and Multi-Sensor Data Using Machine Learning over India. Remote Sens. 2022, 14, 5968.
39. Zhong, J.; Cui, L.; Deng, Z.; Zhang, Y.; Lin, J.; Guo, G.; Zhang, X. Long-Term Effects of Ecological Restoration Projects on Ecosystem Services and Their Spatial Interactions: A Case Study of Hainan Tropical Forest Park in China. Environ. Manag. 2024, 73, 493–508.
40. Gatti, L.V.; Basso, L.S.; Miller, J.B.; Gloor, M.; Gatti Domingues, L.; Cassol, H.L.; Tejada, G.; Aragão, L.E.; Nobre, C.; Peters, W. Amazonia as a carbon source linked to deforestation and climate change. Nature 2021, 595, 388–393.
41. Chave, J.; Réjou-Méchain, M.; Búrquez, A.; Chidumayo, E.; Colgan, M.S.; Delitti, W.B.; Duque, A.; Eid, T.; Fearnside, P.M.; Goodman, R.C. Improved allometric models to estimate the aboveground biomass of tropical trees. Glob. Change Biol. 2014, 20, 3177–3190.
42. Zhu, A.X.; Lu, G.; Liu, J.; Qin, C.Z.; Zhou, C. Spatial prediction based on Third Law of Geography. Ann. GIS 2018, 24, 225–240.
43. Asner, G.P.; Martin, R.E.; Anderson, C.B.; Knapp, D.E. Quantifying forest canopy traits: Imaging spectroscopy versus field survey. Remote Sens. Environ. 2015, 158, 15–27.
44. Yu, Z.; Qi, J.; Zhao, X.; Huang, H. Evaluating the reliability of bi-temporal canopy height model generated from airborne laser scanning for monitoring forest growth in boreal forest region. Int. J. Digit. Earth 2024, 17, 2345725.

Figure 1. Technology Roadmap for Canopy Height Monitoring in Hainan Tropical Rainforest National Park.

Figure 2. Study area overview: (a) China’s position on the world map; (b) Hainan Island within China; (c) the Hainan Tropical Rainforest National Park within Hainan Island; and (d) an enlarged view of the study area showing the distribution of spaceborne LiDAR plots (ICESat-2 and GEDI) and field data plots (140 LiDAR-scanned plots and 315 UAV photogrammetry plots). This figure is based on the WGS 1984 geographic coordinate system and the WGS 1984 UTM 49N projected coordinate system.

Figure 3. Forest Type Distribution in the Hainan Tropical Rainforest National Park. Subplots (a–e) represent forest type distributions for the years 2003, 2008, 2013, 2018, and 2023, respectively. This figure is based on the WGS 1984 geographic coordinate system and the WGS 1984 UTM 49N projected coordinate system.

Figure 4. Canopy height distribution of the Hainan Tropical Rainforest National Park from 2003 to 2023. Subplots (a–e) represent canopy height distributions for the years 2003, 2008, 2013, 2018, and 2023, respectively. This figure is based on the WGS 1984 geographic coordinate system and the WGS 1984 UTM 49N projected coordinate system.

Figure 5. Histogram of forest canopy height in the Hainan Tropical Rainforest National Park from 2003 to 2023. μ and σ represent the mean and standard deviation of the predicted forest canopy height, respectively, while N indicates the number of 30 m resolution forest canopy height pixels for each forest type. The histograms show canopy height distributions for five forest types—tropical lowland rainforests, tropical montane cloud forests, tropical seasonal rainforests, tropical montane rainforests, and tropical coniferous forests—across five years (2003, 2008, 2013, 2018, and 2023), corresponding to subplots (a–y).

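Table 1. Internal accuracy validation of the remote sensing models for tropical rainforest canopy height built with four machine-learning algorithms: backpropagation neural network (BP), random forest (RF), convolutional neural network (CNN), and gradient boosting decision tree (GBDT). RMSE: root mean square error; RRMSE: relative root mean square error.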

Percentile	Algorithm	Training Set R²	Testing Set R²	Testing Set Bias (m)	Testing Set Relative Bias (%)	Testing Set RMSE (m)	Testing Set RRMSE (%)
RH80	BP	0.44	0.43	0.09	0.62	3.73	25.79
RH80	RF	0.71	0.60	−0.10	−0.67	3.11	21.36
RH80	CNN	0.62	0.48	−0.16	−1.08	3.59	24.47
RH80	GBDT	0.57	0.49	0.11	0.79	3.57	24.67
RH85	BP	0.42	0.37	−0.13	−0.81	3.97	25.55
RH85	RF	0.73	0.56	0.03	0.22	3.31	21.28
RH85	CNN	0.65	0.48	0.12	0.76	3.46	22.27
RH85	GBDT	0.59	0.52	0.02	0.13	3.46	22.35
RH90	BP	0.44	0.39	0.03	0.21	3.76	22.56
RH90	RF	0.73	0.59	0.07	0.44	3.25	19.63
RH90	CNN	0.64	0.46	−0.06	−0.34	3.56	21.34
RH90	GBDT	0.61	0.49	0.12	0.71	3.52	21.26
RH95	BP	0.45	0.41	−0.22	−1.24	3.88	21.30
RH95	RF	0.75	0.60	0.21	1.16	3.17	18.06
RH95	CNN	0.67	0.51	−0.32	−1.73	3.57	19.37
RH95	GBDT	0.62	0.49	0.01	0.03	3.52	19.21
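The accuracy metrics reported in Table 1 can be illustrated with a short sketch. The definitions below are the commonly used ones (bias as mean of predicted minus observed, relative values normalized by the mean observed height, R² as one minus the ratio of residual to total sum of squares); they are assumptions, and the exact formulas used in this study are those given in Section 2.3.3. The function name `evaluate` is hypothetical.

```python
import math


def evaluate(observed, predicted):
    """Compute R2, bias, relative bias, RMSE, and RRMSE for paired heights (m)."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    mean_obs = sum(observed) / n
    residuals = [p - o for o, p in zip(observed, predicted)]

    # Assumed sign convention: positive bias means overestimation.
    bias = sum(residuals) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "bias_m": bias,
        "relative_bias_pct": 100.0 * bias / mean_obs,
        "rmse_m": rmse,
        "rrmse_pct": 100.0 * rmse / mean_obs,
    }
```

For example, `evaluate([10, 12, 14, 16], [11, 12, 13, 18])` uses four made-up plots and returns a bias of 0.5 m and an R² of 0.7.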

Table 2. External accuracy validation of the remote sensing models for tropical rainforest canopy height built with four machine-learning algorithms.

Percentile	Algorithm	R²	Bias (m)	Relative Bias (%)	RMSE (m)	RRMSE (%)
RH80	BP	0.38	1.24	9.84	4.98	39.63
RH80	RF	0.45	1.13	8.99	4.15	33.05
RH80	CNN	0.42	1.03	8.16	4.61	36.77
RH80	GBDT	0.43	1.19	9.47	4.20	33.44
RH85	BP	0.43	1.67	13.26	4.77	37.96
RH85	RF	0.45	2.14	17.00	4.52	36.00
RH85	CNN	0.39	2.03	16.19	4.91	39.09
RH85	GBDT	0.39	2.22	17.71	4.72	37.60
RH90	BP	0.43	2.95	23.48	5.27	41.98
RH90	RF	0.44	3.14	25.01	5.13	40.82
RH90	CNN	0.41	2.69	21.45	5.35	42.60
RH90	GBDT	0.44	3.14	25.01	5.13	40.82
RH95	BP	0.45	4.23	33.65	6.00	47.78
RH95	RF	0.44	4.74	37.72	6.24	49.71
RH95	CNN	0.40	4.00	31.85	6.26	49.86
RH95	GBDT	0.41	4.59	36.57	6.19	49.29
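As a quick arithmetic check on Table 2, the sketch below looks up the percentile and algorithm combination with the lowest external RRMSE. The RRMSE values are copied from the table; the dictionary layout and the helper name `lowest_rrmse` are assumptions for illustration, and a single-metric ranking is only one possible selection criterion, not necessarily the one the authors applied.

```python
# External-validation RRMSE (%) copied from Table 2, keyed by
# (percentile, algorithm).
EXTERNAL_RRMSE = {
    ("RH80", "BP"): 39.63, ("RH80", "RF"): 33.05,
    ("RH80", "CNN"): 36.77, ("RH80", "GBDT"): 33.44,
    ("RH85", "BP"): 37.96, ("RH85", "RF"): 36.00,
    ("RH85", "CNN"): 39.09, ("RH85", "GBDT"): 37.60,
    ("RH90", "BP"): 41.98, ("RH90", "RF"): 40.82,
    ("RH90", "CNN"): 42.60, ("RH90", "GBDT"): 40.82,
    ("RH95", "BP"): 47.78, ("RH95", "RF"): 49.71,
    ("RH95", "CNN"): 49.86, ("RH95", "GBDT"): 49.29,
}


def lowest_rrmse(table):
    """Return ((percentile, algorithm), rrmse) for the smallest RRMSE."""
    return min(table.items(), key=lambda item: item[1])


(best_percentile, best_algorithm), best_value = lowest_rrmse(EXTERNAL_RRMSE)
print(best_percentile, best_algorithm, best_value)
```

With the Table 2 values, the lowest external RRMSE is 33.05% for RF at RH80, which is consistent with the model choice stated in the abstract.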

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ling, Q.; Chen, Y.; Feng, Z.; Pei, H.; Wang, C.; Yin, Z.; Qiu, Z. Monitoring Canopy Height in the Hainan Tropical Rainforest Using Machine Learning and Multi-Modal Data Fusion. Remote Sens. 2025, 17, 966. https://doi.org/10.3390/rs17060966
