3.4.1. Data Preprocessing
The raw LiDAR data collected using the Hovermap underwent a preprocessing pipeline to prepare the datasets for subsequent tree stem modeling. This preprocessing workflow was meticulously designed to enhance data quality, reduce computational demands, and ensure the accuracy of the extracted forest metrics.
Initially, the raw point cloud data were down-sampled using a voxel grid filter. This process involved partitioning the point cloud into a three-dimensional grid of voxels and replacing all points within each voxel with a single representative point, typically the centroid [68]. By reducing the density of the point cloud, the voxel grid filter significantly decreased the data volume, thereby optimizing processing time and computational resources while maintaining the structural integrity of the forest canopy and tree stems [69].
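The voxel grid filter described above can be illustrated with a minimal NumPy sketch. The function name `voxel_downsample` and the `voxel_size` parameter are hypothetical; the voxel size used in the study is not stated here, so the value in any call is an assumption.

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same voxel with their centroid."""
    points = np.asarray(points, dtype=float)
    # Integer voxel index of each point along every axis.
    idx = np.floor(points / voxel_size).astype(np.int64)
    # Group points sharing a voxel index; `inverse` maps point -> voxel.
    _, inverse, counts = np.unique(
        idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)
    # Accumulate coordinates per voxel, then divide by the point count.
    sums = np.zeros((counts.size, points.shape[1]))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]
```

A larger `voxel_size` yields a sparser cloud; the choice trades processing time against the level of stem detail retained.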
Following down-sampling, ground segmentation was performed using the Simple Morphological Filter (SMRF) algorithm. The SMRF separates ground points from non-ground points by analyzing local elevation variations within the point cloud [41]. The algorithm applies morphological operations to construct a minimum elevation surface map, which allows terrain points to be distinguished from vegetation and structural elements. Specifically, the SMRF can be mathematically represented as:
$$Z_g(x, y) = \min_{(i, j) \in W} \left[ Z(x + i,\, y + j) + \alpha \, h(i, j) \right]$$
where $Z_g(x, y)$ denotes the filtered ground elevation at coordinates $(x, y)$, $Z$ is the input elevation surface, $W$ represents the structuring element defining the neighborhood window, $\alpha$ is a scaling factor controlling the influence of elevation offsets, and $h(i, j)$ is a height offset function applied within the window. This operation captures the ground surface by minimizing the elevation values within the defined window, accounting for local terrain variability.
After removing ground points identified by the SMRF algorithm, the dataset was confined to vegetation and stem points, allowing the analysis to focus on tree structure. Next, to standardize vertical metrics across differing terrains, elevation normalization was applied. Specifically, for each point with measured elevation $z$, the corresponding ground elevation $z_g$ was subtracted to yield a normalized height $z_n = z - z_g$. By providing a consistent reference frame that accounted for terrain variability, this normalization ensured that tree height and canopy attributes reflected the true vertical vegetation distribution across all plots.
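The elevation normalization step can be sketched as follows. This is a minimal illustration, not the study's implementation: it assumes the ground elevation under each point is taken as the lowest ground return in the same horizontal grid cell, and the function name `normalize_heights` and the `cell` size are hypothetical.

```python
import numpy as np

def normalize_heights(points, ground_points, cell=1.0):
    """Return height above ground for each point (NaN where no ground cell)."""
    points = np.asarray(points, dtype=float)
    ground = np.asarray(ground_points, dtype=float)
    # Shared grid origin so both clouds index the same cells.
    origin = np.minimum(points[:, :2].min(axis=0), ground[:, :2].min(axis=0))

    def cell_index(xy):
        return np.floor((xy - origin) / cell).astype(int)

    gi = cell_index(ground[:, :2])
    pi = cell_index(points[:, :2])
    shape = tuple(np.maximum(gi.max(axis=0), pi.max(axis=0)) + 1)
    # Lowest ground elevation per cell (assumed ground model).
    grid = np.full(shape, np.inf)
    np.minimum.at(grid, (gi[:, 0], gi[:, 1]), ground[:, 2])
    grid[np.isinf(grid)] = np.nan
    z_ground = grid[pi[:, 0], pi[:, 1]]
    return points[:, 2] - z_ground
```

In practice an interpolated terrain model would give smoother ground elevations than a per-cell minimum; the grid lookup is used here only to keep the example short.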
Finally, the preprocessed point clouds were organized and indexed based on geographic coordinates and plot identifiers. This organization allowed for the efficient access and management of the datasets during the tree stem modeling phase. Each experimental plot’s data were meticulously prepared, ensuring spatial consistency and integrity across all datasets.
3.4.2. Tree Segmentation and Extraction
The initial phase of tree extraction involved segmenting the preprocessed point cloud data to isolate individual trees, including crowns and stems. This process is crucial for accurately modeling tree structures and was divided into several detailed sub-steps:
To identify core stem regions, a horizontal height band from 5.6 m to 5.9 m above ground level was first extracted from the point cloud. This height band was chosen as it typically corresponds to the lower sections of tree stems, providing a stable basis for clustering [70]. Within that height band, an enhanced K-means clustering algorithm was employed to group points belonging to individual tree stems [71], with the number of clusters set adaptively based on the local point density of the band. The objective function for this enhanced K-means clustering was defined as:
$$J = \sum_{i=1}^{k} \sum_{x_j \in C_i} w_j \lVert x_j - \mu_i \rVert^2 + \lambda \sum_{i=1}^{k} \lVert \mu_i - \bar{\mu} \rVert^2$$
where $J$ represents the total clustering objective function; $k$ is the number of clusters; $C_i$ denotes the $i$th cluster; $x_j$ is a point within cluster $C_i$; $\mu_i$ is the centroid of cluster $C_i$; $\bar{\mu}$ is the global mean of all centroids; $w_j$ is a weight assigned to point $x_j$, reflecting its significance based on spatial density or other relevant criteria; and $\lambda$ is a regularization parameter that controls the influence of the centroid dispersion term. The objective function $J$ comprises two primary components:
(1) Weighted Intra-cluster Variance: $\sum_{i=1}^{k} \sum_{x_j \in C_i} w_j \lVert x_j - \mu_i \rVert^2$. This term minimizes the weighted sum of squared distances between each point $x_j$ and its respective cluster centroid $\mu_i$. The weights $w_j$ can be adjusted based on factors such as point density or confidence levels derived from prior segmentation steps, enhancing the clustering performance in areas with varying point distributions.
(2) Centroid Dispersion Regularization: $\lambda \sum_{i=1}^{k} \lVert \mu_i - \bar{\mu} \rVert^2$. This regularization term penalizes the dispersion of cluster centroids from the global mean $\bar{\mu}$, promoting a more balanced distribution of clusters across the height band. The parameter $\lambda$ controls the strength of this penalty, allowing for flexibility in balancing intra-cluster compactness and inter-cluster separation.
The inclusion of weighting and regularization in the K-means objective function facilitates a more nuanced clustering, particularly in environments with heterogeneous point distributions and varying tree stem characteristics [72]. By assigning appropriate weights and controlling centroid dispersion, the algorithm groups spatially proximate points within the specified height band, thereby isolating individual tree stems from the surrounding vegetation with higher precision [71].
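A minimal sketch of a weighted, regularized K-means of this kind is given below. It assumes the objective is the weighted intra-cluster variance plus a regularization parameter times the squared dispersion of centroids about their mean. The centroid update holds the global centroid mean fixed within each iteration, which is a simplifying assumption rather than the authors' stated solver; the names `objective`, `weighted_kmeans`, and `lam` are hypothetical.

```python
import numpy as np

def objective(points, weights, labels, centroids, lam):
    """Weighted intra-cluster variance plus centroid dispersion penalty."""
    diffs = points - centroids[labels]
    intra = np.sum(weights * np.sum(diffs ** 2, axis=1))
    spread = np.sum((centroids - centroids.mean(axis=0)) ** 2)
    return intra + lam * spread

def weighted_kmeans(points, weights, init_centroids, lam=0.1, n_iter=50):
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    centroids = np.array(init_centroids, dtype=float)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every point.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: weighted mean shrunk toward the (fixed) centroid mean.
        mean_c = centroids.mean(axis=0)
        for i in range(len(centroids)):
            members = labels == i
            if not members.any():
                continue  # leave an empty cluster's centroid unchanged
            w = weights[members]
            centroids[i] = (w @ points[members] + lam * mean_c) / (w.sum() + lam)
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centroids
```

With `lam=0` and uniform weights this reduces to standard Lloyd iterations, which makes the effect of each added term easy to check in isolation.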
After clustering, segments containing fewer than 20 points were discarded. This threshold was applied to eliminate noise and spurious detections, which often result from sensor inaccuracies, transient objects, or small underbrush that does not represent significant tree stems. Retaining only segments with 20 or more points ensured that the subsequent analysis focused on meaningful tree structures, thereby enhancing the reliability of the extraction process.
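The 20-point threshold can be applied directly to the cluster labels. The helper below is a hypothetical sketch that returns a boolean mask over the points.

```python
import numpy as np

def drop_small_clusters(labels, min_points=20):
    """Mask that is True for points in clusters with >= min_points members."""
    labels = np.asarray(labels)
    ids, counts = np.unique(labels, return_counts=True)
    keep_ids = ids[counts >= min_points]
    return np.isin(labels, keep_ids)
```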
For each valid tree segment, Bird’s Eye View (BEV) height compression was employed to transform the 3D point cloud into a 2D plane parallel to the ground [73]. By discarding the vertical dimension and optionally applying rotation, scaling, and translation, this projection reduces the complexity of geometric analysis, enabling more efficient cross-sectional modeling of tree stems [74,75]. Specifically, let $(x, y, z)$ represent the original 3D coordinates of a point. The BEV projection maps this point to a 2D coordinate system $(u, v)$ through a combined linear transformation and translation:
$$\begin{pmatrix} u \\ v \end{pmatrix} = s \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$
Here, $s$ is a scaling factor to adjust the spatial resolution, $\theta$ is a rotation angle to align the coordinate system with the forest plot orientation, and $\mathbf{t} = (t_x, t_y)^{T}$ is a translation vector to position the projected data in a convenient 2D reference frame. By selecting $s = 1$, $\theta = 0$, and $\mathbf{t} = \mathbf{0}$, the transformation reduces to a simple orthographic projection that discards the vertical dimension, retaining only the $x$ and $y$ coordinates of the original 3D points. In practice, these parameters were chosen to best represent the tree stem structure, facilitating subsequent 2D geometric fitting for the stem cross-sectional analysis.
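The projection can be sketched in a few lines, assuming it takes the form of a scaled 2D rotation of the horizontal coordinates followed by a translation. The function name `bev_project` and the parameter names `s`, `theta`, and `t` are illustrative.

```python
import numpy as np

def bev_project(points, s=1.0, theta=0.0, t=(0.0, 0.0)):
    """Project 3D points to the ground plane: scale, rotate, translate."""
    points = np.asarray(points, dtype=float)
    c, sn = np.cos(theta), np.sin(theta)
    rotation = np.array([[c, -sn], [sn, c]])
    # Drop z, then apply the 2D similarity transform row-wise.
    return s * points[:, :2] @ rotation.T + np.asarray(t, dtype=float)
```

With the default arguments the function simply returns the `x` and `y` columns, which corresponds to the orthographic case described above.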
After projecting the 3D points into the BEV plane, the next step was to determine the circle that best represented the stem’s cross-sectional geometry. A robust weighted least squares approach was employed to minimize discrepancies between the observed points and the fitted circle, accommodating irregularities in point density and outliers. Let $(x_i, y_i)$ be the coordinates of the BEV-projected points and $(a, b, r)$ be the circle parameters, where $(a, b)$ denotes the center and $r$ the radius of the circle. Let $w_i$ be a weight associated with each point, reflecting its reliability (based on local point density or proximity to other high-confidence points), and let $\lambda$ be a regularization parameter that stabilizes the solution by penalizing large deviations from an initial radius estimate $r_0$ [76,77]. The objective function is given by:
$$E(a, b, r) = E_{\mathrm{fit}} + E_{\mathrm{reg}}$$
Here, $E_{\mathrm{fit}}$, defined as
$$E_{\mathrm{fit}} = \sum_{i} w_i \left( \sqrt{(x_i - a)^2 + (y_i - b)^2} - r \right)^2$$
ensures that the circle closely fits the BEV points while accommodating varying confidence levels through $w_i$. $E_{\mathrm{reg}} = \lambda (r - r_0)^2$, controlled by $\lambda$, encourages the radius to remain near a chosen reference $r_0$, thus preventing overfitting and improving numerical stability. By iteratively solving this optimization problem, a precise and stable circle fit is achieved, providing an accurate estimate of the stem’s diameter for subsequent stem modeling and analysis [77].
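One way to solve a fit of this kind is Gauss-Newton iteration on the weighted radial residuals with an added radius penalty. The sketch below assumes the cost is the weighted sum of squared radial residuals plus a regularization term on the squared difference between the radius and its reference value. The solver choice, the initialization from the weighted centroid, and the names `fit_circle`, `lam`, and `r0` are assumptions, not the authors' implementation.

```python
import numpy as np

def fit_circle(xy, weights=None, r0=None, lam=0.0, n_iter=50):
    """Weighted geometric circle fit with a penalty lam * (r - r0)^2."""
    xy = np.asarray(xy, dtype=float)
    w = np.ones(len(xy)) if weights is None else np.asarray(weights, dtype=float)
    # Initial guess: weighted centroid and mean distance to it.
    a, b = np.average(xy, axis=0, weights=w)
    r = np.average(np.hypot(xy[:, 0] - a, xy[:, 1] - b), weights=w)
    if r0 is None:
        r0 = r
    sw, sl = np.sqrt(w), np.sqrt(lam)
    for _ in range(n_iter):
        dx, dy = xy[:, 0] - a, xy[:, 1] - b
        d = np.maximum(np.hypot(dx, dy), 1e-12)
        # Residual vector: weighted radial errors, then the radius penalty.
        res = np.concatenate([sw * (d - r), [sl * (r - r0)]])
        jac = np.zeros((len(xy) + 1, 3))
        jac[:-1, 0] = -sw * dx / d
        jac[:-1, 1] = -sw * dy / d
        jac[:-1, 2] = -sw
        jac[-1, 2] = sl
        step = np.linalg.lstsq(jac, -res, rcond=None)[0]
        a, b, r = a + step[0], b + step[1], r + step[2]
        if np.linalg.norm(step) < 1e-10:
            break
    return a, b, r
```

Setting `lam=0` gives a plain weighted geometric fit; increasing `lam` pulls the radius toward `r0`, which can help when only a short arc of the stem is visible.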
To account for stem variability such as bending, inclination, and canopy expansion, the fitted radius $r$ was increased by 50%, yielding an adjusted radius $r' = 1.5r$. This enlargement ensured full coverage of the stem region, mitigating the risk of excluding points due to geometric distortions or natural stem tapering. The rationale for this adjustment is that tree stems are not perfectly cylindrical and often exhibit variations in thickness and curvature.
Using the adjusted radius $r'$, a cylindrical volume was defined around each tree segment’s centroid. All points within this cylinder were then classified as part of the respective tree stem. This segmentation isolated the stem from surrounding vegetation and structural elements, enabling accurate modeling of the tree’s 3D structure.
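The cylinder-based selection can be sketched as a horizontal distance test against the enlarged radius. The example assumes a vertical cylinder axis through the segment centroid with no height limit; the function name `points_in_stem_cylinder` is hypothetical.

```python
import numpy as np

def points_in_stem_cylinder(points, center_xy, r, scale=1.5):
    """Mask of points inside a vertical cylinder of radius scale * r."""
    points = np.asarray(points, dtype=float)
    r_adj = scale * r  # fitted radius increased by 50%
    dist = np.hypot(points[:, 0] - center_xy[0], points[:, 1] - center_xy[1])
    return dist <= r_adj
```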
3.4.3. Stem Point Extraction
The extraction of precise stem points was critical for accurate feature measurement and subsequent analysis. This process involved iterative cylinder fitting to model the tree stem’s 3D structure, ensuring robustness against occlusions and stem variability.
For each segmented tree, cylinder fitting was performed starting from the lower 5% height of the stem and progressing upwards. The Maximum Likelihood Estimation Sample Consensus (MLESAC) algorithm was utilized to robustly fit cylindrical models to the point cloud data. MLESAC enhances the reliability of cylinder fitting by maximizing the likelihood of inliers while minimizing the influence of outliers, distinguishing the stem from branches and other structural elements [78]. The inlier distance threshold was set at 0.1 m to balance sensitivity and specificity. The cylinder fitting objective is formulated as:
$$\Theta^{*} = \arg\min_{\Theta} \sum_{i=1}^{N} \rho\big(d_i(\Theta)\big), \qquad \rho(d) = \begin{cases} \dfrac{d^2}{2\sigma^2}, & |d| \le T \\[4pt] \dfrac{T^2}{2\sigma^2}, & |d| > T \end{cases}$$
where $\Theta$ represents the cylinder parameters (axis direction, radius, and position), $d_i$ is the distance of point $i$ from the cylinder surface, $T$ is the nominal distance threshold, and $\sigma$ is the standard deviation.
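The point-to-surface distance used to score a candidate cylinder, together with the 0.1 m inlier test, can be sketched as follows. The cylinder parameterization (a point on the axis, an axis direction, and a radius) and the function names are assumptions made for illustration; the sampling loop of MLESAC itself is not shown.

```python
import numpy as np

def cylinder_distances(points, axis_point, axis_dir, radius):
    """Absolute distance of each point from an infinite cylinder surface."""
    rel = np.asarray(points, dtype=float) - np.asarray(axis_point, dtype=float)
    u = np.asarray(axis_dir, dtype=float)
    u = u / np.linalg.norm(u)
    along = rel @ u                                   # component along the axis
    radial = np.linalg.norm(rel - np.outer(along, u), axis=1)
    return np.abs(radial - radius)

def inlier_mask(distances, threshold=0.1):
    """Points whose surface distance is within the threshold (metres)."""
    return np.asarray(distances) <= threshold
```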
The iterative nature of this process allowed for the sequential refinement of the stem model, starting from the base and moving upwards, ensuring that each segment of the stem was accurately captured [79].
Each fitted cylinder was validated by comparing the radius of the current segment with that of the preceding one. A decrease in radius typically indicates normal stem tapering, whereas an increase suggests the presence of branches erroneously identified as stem segments. In cases where an increase in radius was detected, the algorithm discarded the upper segment to prevent misclassification. This validation step was crucial for maintaining the integrity of the stem model, ensuring that only true stem points were retained.
This iterative process continued until the entire stem height was accurately modeled, resulting in a precise 3D representation of each tree’s stem.
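The radius-based validation can be sketched as a single pass over the fitted segment radii, ordered from the base upwards. The example assumes that a rejected segment is skipped and that later segments are compared against the last accepted one; the text does not specify whether fitting continues or stops after a rejection, so this is one possible reading. The function name `validate_taper` and the tolerance `tol` are hypothetical.

```python
def validate_taper(radii, tol=0.0):
    """Indices of segments kept when radii must not increase with height."""
    kept = []
    for i, r in enumerate(radii):
        # Reject a segment whose radius exceeds the last accepted radius.
        if kept and r > radii[kept[-1]] + tol:
            continue
        kept.append(i)
    return kept
```

A small positive `tol` would tolerate measurement noise in the fitted radii without flagging every minor fluctuation as a branch.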
3.4.4. Feature Extraction and Results Validation
The final phase of the methodology focused on extracting quantitative features from the accurately modeled tree stems and validating these features against ground-truth measurements to ensure accuracy and reliability.
From the 3D stem models, the diameter at breast height (DBH) was calculated at 1.3 m above ground level. This measurement was derived from the fitted cylindrical geometry of the stem, providing a standardized metric for comparing tree sizes. Additionally, tree height was determined by identifying the highest point within each stem segment, characterizing each tree’s vertical structure. These parameters are fundamental for estimating commercial timber volume and value, which are critical components in assessing forest health and productivity.
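As a minimal illustration, DBH can be read from the stack of fitted cylinders by interpolating the segment radii at 1.3 m. Linear interpolation between segment center heights is an assumption made for this sketch, and the function name `dbh_from_segments` is hypothetical.

```python
import numpy as np

def dbh_from_segments(center_heights, radii, breast_height=1.3):
    """Diameter at breast height from per-segment radii (heights ascending)."""
    radius = np.interp(breast_height, center_heights, radii)
    return 2.0 * radius
```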
Each stem segment’s center point and radius were meticulously recorded, enabling a detailed analysis of stem tapering and curvature. By examining the variation in radii across different segments of the stem, it is possible to assess the structural integrity of the tree. Consistent tapering indicates healthy growth patterns, while irregular variations may signal structural anomalies such as excessive bending or the presence of defects. This granular level of detail allows for the identification of subtle changes in tree morphology, which are essential for monitoring tree health and growth dynamics.
To further evaluate the structural integrity of the tree stems, deviation values were calculated for each segment, quantifying the stem’s curvature in both the $x$ and $y$ directions. Specifically, the deviations were defined as the angles between consecutive stem segments projected onto the respective axes. For each segment $i$, the $x$ deviation angle ($\theta_{x,i}$) and $y$ deviation angle ($\theta_{y,i}$) were calculated using the following formulas:
$$\theta_{x,i} = \arctan\!\left(\frac{\Delta x_i}{d_i}\right), \qquad \theta_{y,i} = \arctan\!\left(\frac{\Delta y_i}{d_i}\right)$$
where $\Delta x_i$ is the change in the $x$-coordinate between segment $i$ and segment $i-1$, $\Delta y_i$ is the change in the $y$-coordinate between segment $i$ and segment $i-1$, and $d_i$ is the horizontal distance between segments $i-1$ and $i$, calculated as $d_i = \sqrt{\Delta x_i^2 + \Delta y_i^2}$.
By analyzing these deviation angles, the methodology assesses the degree of stem curvature, providing insights into the tree’s stability and resilience. High deviation values in either the x or y direction may indicate significant bending or irregular growth patterns, which could be symptomatic of environmental stressors or genetic factors affecting the tree’s development.
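The deviation angles can be computed from the sequence of segment center points. The sketch assumes each angle is the arctangent of the coordinate change divided by the horizontal distance between consecutive centers; this functional form, the output in degrees, and the function name `deviation_angles` are assumptions.

```python
import numpy as np

def deviation_angles(centers):
    """x and y deviation angles (degrees) between consecutive segment centers."""
    c = np.asarray(centers, dtype=float)
    dx = np.diff(c[:, 0])
    dy = np.diff(c[:, 1])
    d = np.hypot(dx, dy)                 # horizontal distance between centers
    safe = np.where(d > 0, d, 1.0)       # avoid division by zero
    theta_x = np.degrees(np.arctan(np.where(d > 0, dx / safe, 0.0)))
    theta_y = np.degrees(np.arctan(np.where(d > 0, dy / safe, 0.0)))
    return theta_x, theta_y
```

Segments whose centers coincide horizontally are assigned a deviation of zero in both directions.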