Article

Research on Hyperspectral Inversion of Soil Organic Carbon in Agricultural Fields of the Southern Shaanxi Mountain Area

1 Shaanxi Key Laboratory of Land Consolidation, School of Land Engineering, Chang’an University, Xi’an 710054, China
2 Command Center of Natural Resources Comprehensive Survey, China Geological Survey, Beijing 100055, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2025, 17(4), 600; https://doi.org/10.3390/rs17040600
Submission received: 13 January 2025 / Revised: 1 February 2025 / Accepted: 7 February 2025 / Published: 10 February 2025

Abstract

Rapidly obtaining information on the content and spatial distribution of soil organic carbon (SOC) in farmland is crucial for evaluating regional soil quality, land degradation, and crop yield. This study focuses on mountain soils in various crop cultivation areas in Shangzhou District, Shangluo City, Southern Shaanxi, utilizing ZY1-02D hyperspectral satellite imagery, field-measured hyperspectral data, and field sampling data to achieve precise inversion and spatial mapping of the SOC content. First, to address spectral bias caused by environmental factors, the Spectral Space Transformation (SST) algorithm was employed to establish a transfer relationship between measured and satellite image spectra, enabling systematic correction of the image spectra. Subsequently, multiple spectral transformation methods, including continuous wavelet transform (CWT), reciprocal, first-order derivative, second-order derivative, and continuum removal, were applied to the corrected spectral data to enhance their spectral response characteristics. For feature band selection, three methods were utilized: Variable Importance in Projection (VIP), Competitive Adaptive Reweighted Sampling (CARS), and the Successive Projections Algorithm (SPA). SOC content prediction was conducted using three models: partial least squares regression (PLSR), stepwise multiple linear regression (Step-MLR), and random forest (RF). Finally, leave-one-out cross-validation was employed to optimize the L4-CARS-RF model, which was selected for SOC spatial distribution mapping. The model achieved a coefficient of determination (R²) of 0.81, a root mean square error of prediction (RMSEP) of 1.54 g kg⁻¹, and a mean absolute error (MAE) of 1.37 g kg⁻¹.
The results indicate that (1) the Spectral Space Transformation (SST) algorithm effectively eliminates environmental interference on image spectra, enhancing SOC prediction accuracy; (2) continuous wavelet transform significantly reduces data noise compared to other spectral processing methods, further improving SOC prediction accuracy; and (3) among feature band selection methods, the CARS algorithm demonstrated the best performance, achieving the highest SOC prediction accuracy when combined with the random forest model. These findings provide scientific methods and technical support for SOC monitoring and management in mountainous areas and offer valuable insights for assessing the long-term impacts of different crops on soil ecosystems.

1. Introduction

Soil organic carbon (SOC), a key indicator of soil health and productivity, is essential for sustainable land use. Rapid and accurate monitoring of the SOC content and its spatial distribution is valuable for understanding the impact of land use changes due to crop cultivation on soil organic carbon dynamics [1]. SOC is complex in composition, containing diverse functional groups. Traditional monitoring methods primarily rely on laboratory chemical analysis techniques, such as Atomic Absorption Spectroscopy (AAS) and Inductively Coupled Plasma Mass Spectrometry (ICP-MS) [2]. These methods can offer high precision but are often costly, limited in spatial coverage, and pose environmental risks due to processes like acid digestion [3]. Thus, there is an urgent need for a more efficient and environmentally friendly approach to monitor soil organic carbon.
Remote sensing technology, as a non-destructive and efficient method for soil organic carbon (SOC) estimation, has gained wide recognition in practical applications. This approach leverages the unique electromagnetic reflection characteristics of land features, allowing spectral data to reflect different soil properties and making SOC estimation feasible [4]. Most current studies rely on hyperspectral data from field spectrometers to analyze soil characteristics in laboratory settings and explore SOC’s spectral properties and quantitative estimation. For instance, Xueying Li et al. researched SOC prediction and feature extraction based on VNIR, enhancing SOC estimation accuracy [5]. While these methods offer high predictive accuracy, they generally cover only small areas, limiting their utility for large-scale SOC mapping, especially in mountainous farmland where high spatial heterogeneity complicates small-scale SOC monitoring [6]. Other studies have leveraged multispectral satellite data from sensors like Landsat 8 OLI and Sentinel-2 MSI to predict SOC, though multispectral sensors require supplementary environmental variables due to limited spectral bands. Ren Biwu et al., for example, incorporated terrain, remote sensing vegetation indices, and climatic data in an RF model for SOC mapping in complex subtropical landscapes, achieving R² = 0.89 [7].
Recent advancements in hyperspectral remote sensing, with the availability of airborne and satellite hyperspectral imagery, have introduced powerful tools for large-scale SOC monitoring, offering high-resolution, spatially continuous soil spectral data as well as the spatial insights that laboratory techniques cannot provide [8]. Such technology is particularly suitable for large-scale soil property estimation, as demonstrated by Haixia Jin et al., who successfully predicted SOC spatial distribution characteristics using GF-5 satellite hyperspectral imagery in the Yuncheng Basin (R² = 0.69) [9]. Xiangtian Meng et al. further refined GF-5 hyperspectral data through discrete wavelet transformation denoising and remote sensing indices to improve SOC mapping accuracy (R² = 0.83) [10]. Besides existing satellite sensors, drone-based hyperspectral technology has been utilized to map soil properties; for example, Song Qi et al. used UAV hyperspectral data to estimate SOC in the Huangshui River Basin’s farmlands (R² = 0.88) [11]. However, airborne sensors are expensive and sensitive to atmospheric conditions, limiting their practicality for larger-scale SOC assessment [12]. Overall, previous studies have predominantly utilized multispectral data (e.g., Landsat, Sentinel-2) or airborne hyperspectral imagery; however, these data sources either lack the spectral resolution required to distinguish subtle SOC features or face limitations in terms of spatial coverage and cost-effectiveness.
Satellite hyperspectral sensors offer significant advantages for monitoring soil properties on a large scale, with spectral resolutions of 20 nm or less, enabling the acquisition of spectral information over vast areas in a short amount of time. The main technical specifications of the various hyperspectral satellite systems are listed in Table 1 [10]. In particular, the ZY1-02D hyperspectral satellite, a recent platform in China’s hyperspectral technology, offers substantial advantages for SOC estimation: it covers a spectral range of 0.40–2.50 μm with 166 bands and has a spatial resolution of 30 m, a spectral resolution of 10 nm in the visible-near-infrared and 20 nm in the shortwave infrared, and a swath width of 60 km [13,14]. These features make the ZY1-02D AHSI sensor particularly suitable for capturing subtle spectral variations in SOC across large areas, with significant potential for use in mountainous regions characterized by steep terrain, diverse soil types, and varying microclimatic conditions. However, there is a lack of studies utilizing ZY1-02D AHSI imagery for SOC content estimation, especially in mountainous areas.
Despite the potential of hyperspectral imagery, SOC estimation accuracy is often lower in mountainous areas with complex terrain and diverse soil types, impacted by natural and human interferences [7]. Discrepancies between lab and field conditions, such as sample preprocessing (e.g., moisture removal, grinding) and environmental simulation (e.g., radiation, light, temperature), also contribute to the lower accuracy of hyperspectral imagery compared to laboratory spectrometers [15]. Addressing these systematic discrepancies is key. Linear transformations have been used to adjust spectral differences due to instrumentation, measurement conditions, or environmental changes. For instance, Bo Zhang et al. used a Direct Standardization (DS) algorithm to calibrate field soil spectra with laboratory measurements for high-precision SOC prediction [16], while Wang Juxiang et al. applied Piecewise Direct Standardization (PDS) to instrument compatibility in near-infrared spectral models [17]. The DS algorithm is limited in handling complex nonlinear spectral changes [18], but the Spectral Space Transformation (SST) method offers more comprehensive statistical alignment for different spectral data [19].
This study used soil from agricultural areas in Shangzhou District, Shangluo, Shaanxi Province to develop an SOC prediction model and create a spatial distribution map by integrating ZY1-02D hyperspectral satellite imagery, field-measured hyperspectral data, and on-site sampling. Specifically, this article covers the following: (1) utilizing SST to establish a transfer relationship between measured and ZY1-02D image spectra for spectral calibration to reduce environmental impacts; (2) applying various spectral processing methods such as continuous wavelet transform, reciprocal transformations, first- and second-order derivatives, and continuum removal to enhance spectral response characteristics; (3) selecting feature bands using VIP, CARS, and SPA to assess their influence on prediction accuracy; and (4) employing PLSR, Step-MLR, and RF models to determine the optimal SOC prediction model for mapping the spatial distribution of SOC in the study area.

2. Materials and Methods

2.1. Study Area

The study area is located in Shangzhou District, Shangluo City, Shaanxi Province, between 33°38′ and 34°11′N and between 109°30′ and 110°14′E. Situated in southeastern Shaanxi, it lies on the southern slopes of the eastern section of the Qinling Mountains and in the upper reaches of the Danjiang River, covering a total area of 2672 square kilometers. The terrain is higher in the northwest and slopes down towards the southeast, with elevations ranging from 543 to 1544 m. Shangzhou District has a temperate monsoon climate, with an annual average temperature of 7.8–13.9 °C, an average annual rainfall of 699.7–969.7 mm, and an average annual sunshine duration of 2123.8 h. The region’s mountainous topography features significant elevation differences, and the soil distribution is primarily vertical, dominated by brown soil, reddish-brown soil, and cinnamon soil, with brown soil covering the largest area [20]. The local geographical environment provides favorable external conditions for the growth of various crops, with the area rich in both staple and cash crops. The sampling distribution is shown in Figure 1.

2.2. Data and Preprocessing

2.2.1. Sampling and Laboratory Analysis

The workflow of this study is shown in Figure 2. The sampling point distribution combined semi-random and grid methods, covering the main agricultural land uses (arable land, orchard land, forest land) and soil types (spodosols, alfisols, entisols) within the study area for an even spatial distribution. Field sampling was conducted over three days, from 13 to 15 June 2023. A total of 60 soil samples were collected at a depth of 10 cm, with precise coordinates for each sample recorded using GNSS. Sampling was conducted in 30 m × 30 m square plots in areas without vegetation cover, with samples collected from the four corners and the center of each plot. These five 0–10 cm topsoil samples were then mixed evenly and sealed as a single soil sample, labeled with a unique sample number and its coordinates.
In the laboratory, soil samples were air-dried, ground, and passed through a 0.25 mm (10-mesh) nylon sieve to remove large fragments, rocks, and plant material. Each sample was split into two portions: one for organic carbon content analysis and the other for spectral reflectance measurement.

2.2.2. Laboratory Spectral Measurement

The pretreated soil samples were measured in a controlled darkroom using an ASD FieldSpec 4 (Analytical Spectral Devices, Inc., Boulder, CO, USA) portable spectroradiometer, which offers a spectral resolution of 1 nm across the range of 350–2500 nm. Each sample was placed in a glass dish with a depth of 1.5 cm and a radius of 5 cm. Prior to measurement, the spectroradiometer was calibrated using a 99% reflectance white reference panel, and the device was preheated for 30 min to minimize any deviations. A 50 W halogen lamp, positioned at a 15° zenith angle and 30 cm above the surface of the soil sample, served as the light source. The sensor, with an 8° field of view, was positioned at a 90° angle, 15 cm from the sample. After the instrument stabilized, each sample was scanned 10 times to minimize errors caused by stray light. The final spectral reflectance for each sample was calculated by averaging the 10 individual scans.

2.2.3. Preprocessing ZY1-02D Hyperspectral Imagery

The ZY1-02D AHSI imagery used in this study was obtained from the Land Satellite Remote Sensing Application Center of the Ministry of Natural Resources. This imagery has a spatial resolution of 30 m, a swath width of 60 km, and consists of 166 spectral bands, including 76 visible near-infrared (VNIR) bands and 90 short-wave infrared (SWIR) bands, with spectral resolutions of 10 nm and 20 nm, respectively. Considering field sampling times and cloud cover, we selected images with cloud cover below 5% during the winter bare soil period in the Shangzhou District for November and December 2023. The L1A-level products were preprocessed with geometric and orthorectification corrections, leaving only radiometric calibration and atmospheric correction required for further preprocessing of the two scenes [21]. Bands affected by water vapor absorption, spectral channel overlap, or a low signal-to-noise ratio were removed. The retained spectral ranges were 430–996 nm, 1040–1341 nm, 1425–1812 nm, and 1963–2484 nm.
In hyperspectral remote sensing, differences in acquisition conditions, sensor characteristics, and environmental factors often introduce systematic biases in the spectral signals of soil samples between satellite pixels and laboratory measurement conditions. The Spectral Space Transformation (SST) algorithm is designed to establish a linear mapping from the satellite spectral space to the reference spectral space. By transforming satellite pixel spectra, which are influenced by environmental and instrumental factors, into forms that more closely resemble reference spectra obtained under laboratory conditions, SST reduces the influence of non-target factors on spectral feature analysis and quantitative inversion [22].
Mathematically, SST utilizes a linear transformation. The key steps involve combining the data to be corrected with the reference data into a single composite matrix, performing singular value decomposition (SVD) on the composite matrix, and constructing a transformation matrix to map the satellite spectra to the reference spectral space. This process is fundamentally linear, relying on linear algebra tools such as SVD and the Moore–Penrose pseudoinverse [23]. The specific execution steps are as follows.
Assume matrices $X_1$ and $X_2$ represent the measured spectral matrix and pixel spectral matrix, respectively, which are combined into an enhanced matrix $X_{\mathrm{comb}}$ as shown in Equation (1):
$$X_{\mathrm{comb}} = [X_1, X_2] \tag{1}$$
The SST procedure then consists of two key steps: performing singular value decomposition (SVD) on the combined matrix (Equation (2)) [24], and using the correction formula to transform the spectra into a unified space (Equations (3) and (4)).
$$X_{\mathrm{comb}} = [U_s, U_n] \begin{bmatrix} \Sigma_s & 0 \\ 0 & \Sigma_n \end{bmatrix} [V_s, V_n]^{T} + E \tag{2}$$
In this context, $U_s$ and $U_n$ represent the singular vector matrices for the spectral signal component and noise component, respectively; $\Sigma_s$ and $\Sigma_n$ are the singular value matrices that indicate the strength of the spectral signal and noise; $V_s$ and $V_n$ are the right singular matrices, representing the spectral feature vectors; and $E$ is the residual matrix, representing the fitting error.
$$T_s P_s^{T} = T_s [P_1^{T}, P_2^{T}] \tag{3}$$
$$X_{\mathrm{trans}} = X_2 (P_2^{T})^{+} P_1^{T} + X_2 - X_2 (P_2^{T})^{+} P_2^{T} \tag{4}$$
In this context, $T_s$ is the core matrix obtained after singular value decomposition, containing the primary spectral information; $P_1^{T}$ is the loading matrix for the measured spectrum; $P_2^{T}$ is the loading matrix for the pixel spectrum; and $(P_2^{T})^{+}$ is the Moore–Penrose pseudoinverse of $P_2^{T}$, used to calibrate the pixel spectrum to the measured spectral space.
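As an illustration of Equations (1)–(4), the following is a minimal NumPy sketch of the SST correction, not the authors' implementation. It assumes both spectral matrices are arranged as samples × bands on an identical band grid; the function names `sst_fit` and `sst_apply` and the `n_components` argument (the number of retained signal components) are hypothetical choices made for this example.

```python
import numpy as np

def sst_fit(X1, X2, n_components):
    """Fit an SST transfer matrix F so that X2 @ F approximates X1.

    X1: reference (measured) spectra, X2: spectra to correct (pixel spectra).
    Both are (n_samples, n_bands) arrays from the same transfer samples.
    """
    n_bands = X1.shape[1]
    X_comb = np.hstack([X1, X2])                        # Eq. (1)
    # SVD of the combined matrix; the leading rows of Vt are the loadings P_s^T
    _, _, Vt = np.linalg.svd(X_comb, full_matrices=False)
    Ps_T = Vt[:n_components]                            # signal part of Eq. (2)
    P1_T, P2_T = Ps_T[:, :n_bands], Ps_T[:, n_bands:]   # Eq. (3)
    P2_T_pinv = np.linalg.pinv(P2_T)                    # Moore-Penrose pseudoinverse
    # Eq. (4): X_trans = X2 (P2^T)^+ P1^T + X2 - X2 (P2^T)^+ P2^T
    return np.eye(n_bands) + P2_T_pinv @ (P1_T - P2_T)

def sst_apply(X2_new, F):
    """Map new pixel spectra into the reference spectral space."""
    return X2_new @ F
```

Because the correction is linear in the pixel spectra, it collapses to a single matrix that can be fitted once on the transfer samples and then applied to every image pixel.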
In spectral calibration, the representativeness and quantity of transfer samples are critical factors influencing calibration quality. This study utilized the Kennard–Stone (KS) algorithm to select transfer sample sets of varying sizes (t = 10, 20, 30, 40, 50, 60) from field-measured spectral data. To identify the optimal transfer sample set, the calibration quality of each scheme was assessed by computing the spectral angle (spectral angle mapping, SAM) between the spectra calibrated with each transfer set size and the corresponding field-measured spectra, as described in Equation (5) [16].
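$$\theta = \arccos\left( \frac{\sum_{i=1}^{n} X_{M,i} X_{T,i}}{\sqrt{\sum_{i=1}^{n} X_{M,i}^{2}} \sqrt{\sum_{i=1}^{n} X_{T,i}^{2}}} \right) \tag{5}$$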
In the equation, $X_{M,i}$ and $X_{T,i}$ represent the reflectance at the i-th wavelength for the measured and calibrated spectra, respectively; i is the wavelength index; and n is the total number of wavelengths (n = 140).
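The transfer-sample selection and the quality check can be sketched as follows. This is an illustrative NumPy version only: the Kennard–Stone variant shown uses Euclidean distance and starts from the two most distant samples (a common formulation, assumed here rather than taken from the paper), and the function names are hypothetical.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Indices of n_select (>= 2) rows of X chosen by Kennard-Stone."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Seed with the two mutually most distant samples
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    while len(selected) < n_select:
        # Distance from every sample to its nearest already-selected sample
        nearest = dist[:, selected].min(axis=1)
        nearest[selected] = -1.0          # never pick a sample twice
        selected.append(int(np.argmax(nearest)))
    return selected

def spectral_angle(x_m, x_t):
    """Spectral angle (radians) between a measured and a calibrated spectrum."""
    cos = np.dot(x_m, x_t) / (np.linalg.norm(x_m) * np.linalg.norm(x_t))
    # Clip guards against rounding slightly outside [-1, 1]
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))
```

A smaller angle means the two spectra have more similar shapes; the angle ignores overall brightness, since scaling either spectrum leaves it unchanged.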

2.3. Methods

2.3.1. Spectral Transformation

To achieve data smoothing, the Savitzky–Golay algorithm (using a polynomial order of 3 and 11 points) was applied to both pixel and measured spectra to smooth and reduce noise in the spectral curves. To further reduce noise interference and enhance the spectral response, five mathematical transformations were applied to the spectral data: reciprocal, logarithm of the reciprocal, first derivative, second derivative, and continuum removal (Table 2) [25].
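The smoothing and some of the transformations can be sketched with NumPy alone. This is an illustration under stated assumptions: Table 2 is not reproduced here, so the base-10 logarithm and the finite-difference derivatives are assumed forms; continuum removal is omitted; and edge points are left unsmoothed for brevity. The function names are hypothetical.

```python
import numpy as np

def savgol_smooth(y, window=11, order=3):
    """Savitzky-Golay smoothing via local least-squares polynomial fits.

    Assumes len(y) > window. The first and last window // 2 points are
    returned unsmoothed.
    """
    y = np.asarray(y, dtype=float)
    half = window // 2
    t = np.arange(-half, half + 1)
    A = np.vander(t, order + 1, increasing=True)   # design matrix of the local fit
    weights = np.linalg.pinv(A)[0]                 # weights giving the fitted value at t = 0
    out = y.copy()
    out[half:-half] = np.convolve(y, weights[::-1], mode="valid")
    return out

def spectral_transforms(reflectance, wavelengths):
    """Reciprocal, log-reciprocal and derivative transforms of one spectrum."""
    r = np.asarray(reflectance, dtype=float)
    wl = np.asarray(wavelengths, dtype=float)
    first = np.gradient(r, wl)
    return {
        "reciprocal": 1.0 / r,
        "log_reciprocal": np.log10(1.0 / r),       # base 10 is an assumption
        "first_derivative": first,
        "second_derivative": np.gradient(first, wl),
    }
```

With a third-order polynomial and an 11-point window, any locally cubic trend passes through the filter unchanged, while higher-frequency noise is attenuated.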
A continuous wavelet transform (CWT) was used to decompose the spectral data with wavelet basis functions, analyzing the correlation with organic carbon content at various decomposition scales [26]. This transformation employs a series of wavelet functions obtained through scaling and translation to analyze the signal, with the calculation expressed as follows:
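$$\mathrm{CWT}(s, \tau) = \frac{1}{\sqrt{s}} \int_{-\infty}^{+\infty} R(\lambda)\, \psi^{*}\!\left( \frac{\lambda - \tau}{s} \right) d\lambda \tag{6}$$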
In the equation, $\mathrm{CWT}(s, \tau)$ represents the continuous wavelet transform result at scale $s$ and wavelength position $\tau$, and $1/\sqrt{s}$ is a normalization factor to ensure comparability of transformations across different scales. $R(\lambda)$ is the soil hyperspectral reflectance, where $\lambda$ denotes the wavelength (430–2484 nm); $\psi$ is the wavelet basis function; $s$ is the scale factor; and $\tau$ is the translation factor.
Since soil spectral absorption characteristics approximate a Gaussian function, the Gaussian4 function with decomposition scales of $2^1, 2^2, \ldots, 2^{10}$ was selected as the mother wavelet function. This function decomposes the one-dimensional soil hyperspectral data into two-dimensional wavelet coefficients, capturing information across bands and varying decomposition scales [27].
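A discretised version of the transform above might look like the sketch below. It is not the authors' code: the wavelet is taken as the unnormalised fourth derivative of exp(−t²/2), whose scaling may differ from a given wavelet library; band spacing is treated as one unit per band index; and the kernel is truncated so that it never exceeds the spectrum length, which matters at the largest scales. The function names are hypothetical.

```python
import numpy as np

def gaus4(t):
    """Fourth derivative of exp(-t^2 / 2), left unnormalised (assumed form)."""
    return (t**4 - 6.0 * t**2 + 3.0) * np.exp(-t**2 / 2.0)

def cwt_gaus4(reflectance, scales=tuple(2**k for k in range(1, 11))):
    """Wavelet coefficients, shape (n_scales, n_bands), for dyadic scales."""
    r = np.asarray(reflectance, dtype=float)
    out = np.empty((len(scales), r.size))
    for row, s in enumerate(scales):
        # Support of about +/- 5 scale units, capped by the spectrum length
        half = int(min(5 * s, (r.size - 1) // 2))
        t = np.arange(-half, half + 1) / s
        kernel = gaus4(t) / np.sqrt(s)             # 1/sqrt(s) normalisation
        # The kernel is real and symmetric, so reversing it leaves the
        # result unchanged; it is kept to make the correlation explicit.
        out[row] = np.convolve(r, kernel[::-1], mode="same")
    return out
```

Each row of the output corresponds to one decomposition scale, so a single 140-band spectrum becomes a 10 × 140 coefficient array that can be correlated with SOC scale by scale.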

2.3.2. Feature Band Selection

Given the complexity of soil spectral features, where each functional group exhibits a broad spectrum that may sometimes overlap, the entire spectral range often contains noise and redundant information. This can lead to inaccurate model results [28]. Selecting bands that provide the most relevant information can reduce model complexity while improving prediction accuracy and robustness. Effectively extracting feature information from soil hyperspectral data aids in enhancing the estimation precision of soil organic carbon. There are various methods for feature band selection, each with its pros and cons. To derive the optimal subset of variables, three selection methods were compared: CARS (Competitive Adaptive Reweighted Sampling), VIP (Variable Importance in Projection), and SPA (Successive Projections Algorithm).
  • CARS
The CARS algorithm, developed by Liang’s team, combines partial least squares (PLS) regression with Monte Carlo sampling to gradually filter out the feature wavelengths that contribute the most to the model’s predictive performance. This algorithm introduces an exponential decay function to progressively reduce the number of wavelengths, and it employs ten-fold cross-validation to compute the cross-validated root mean square error (RMSECV). The subset of bands with the minimum RMSECV value is selected as the feature bands. CARS effectively reduces model complexity while significantly improving the accuracy and generalization ability of the predictive model [5,26].
  • VIP
The VIP algorithm is a feature selection method based on the PLS model, quantifying the contribution of each band to the model’s predictive performance by calculating their weighted importance in the PLS components. The VIP value reflects the role of the feature bands in explaining the relationship between independent and dependent variables. When spectral inputs have similar explanatory power for SOC content, their values approach 1 [9]. In this study, a threshold of 1 was used to select bands that contribute significantly to the model. This algorithm effectively eliminates multicollinearity issues among modeling variables, particularly in cases of strong correlations and small sample sizes, thereby reducing model complexity and enhancing prediction accuracy [23].
  • SPA
The SPA selects feature variables with minimal multicollinearity from high-dimensional data through successive projections [29]. It begins with an initial variable and progressively selects bands that exhibit the least linear correlation with the variables already chosen, thereby constructing a subset of features that are both informative and mutually independent. The strength of SPA lies in its ability to reduce collinearity among spectral data variables, minimize redundancy in modeling, and improve the robustness and generalization capacity of the model [30]. However, during the feature variable selection process, SPA tends to favor variables with lower collinearity and no redundancy. These variables may not necessarily be the most effective, potentially leading to instability in the selected feature variables [31].
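Of the three selection methods, SPA is the simplest to show compactly. The sketch below illustrates only the successive-projection step described above; it assumes the starting band and the number of bands are supplied by the caller (in practice both are usually chosen by validation), does not center the data, and uses a hypothetical function name.

```python
import numpy as np

def spa_select(X, n_bands, start=0):
    """Pick n_bands column indices of X with minimal collinearity (SPA core loop).

    X: (n_samples, n_variables) spectra; start: index of the initial band.
    """
    Xp = np.asarray(X, dtype=float).copy()
    selected = [start]
    for _ in range(n_bands - 1):
        v = Xp[:, selected[-1]]
        # Project every column onto the subspace orthogonal to the last pick
        Xp = Xp - np.outer(v, v @ Xp) / (v @ v)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0            # exclude bands already chosen
        # The column with the largest residual norm is the least collinear
        selected.append(int(np.argmax(norms)))
    return selected
```

A band that is an exact multiple of an already selected band has zero residual after projection and is therefore never chosen, which is how the method suppresses redundancy.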

2.3.3. Inversion Models

  • Partial least squares regression model
Partial least squares regression (PLSR) is a multivariate statistical analysis method that establishes a regression relationship between independent and dependent variables. It is particularly effective for addressing multicollinearity among independent variables. The PLS method first projects the original independent and dependent variables into a new space, generating several mutually orthogonal components (latent variables). These components, which are linear combinations of the original variables, explain the maximum variance of the dependent variable [32]. Using these orthogonal components, the regression model is then constructed. Given potential collinearity among independent variables, the PLS regression approach is employed to establish the prediction model, effectively addressing multicollinearity while ensuring the model’s predictive capability and stability. Model construction and prediction are implemented using the PLS-Regression module in Python.
  • Stepwise multiple linear regression model
The stepwise multiple linear regression (Stepwise MLR) model optimizes the SOC prediction model using backward stepwise regression. The backward stepwise regression starts with a full model containing all candidate bands and iteratively assesses the contribution of each band to SOC prediction. It gradually eliminates bands that are statistically insignificant (based on p-values) [33]. During each iteration, the band with the highest and non-significant p-value is removed, and significance is recalculated for the remaining bands, ensuring that each band retained in the final model has a significant impact on SOC prediction. This approach effectively reduces model complexity, mitigates overfitting risks, and enhances model robustness and predictive power. However, while backward stepwise regression selects bands with significant predictive value for SOC, its dependency on significance levels can result in potentially important bands being excluded in practical applications [34]. In this study, the p-value threshold for including variables was set at 0.05, while the exclusion threshold was set at 0.10. Additionally, the maximum number of iterations was limited to 100. These settings were designed to retain variables with significant contributions, reduce model complexity, and minimize the risk of overfitting. The appropriateness of these parameter settings has been validated by previous research conducted by other scholars [35]. Model construction and prediction are implemented using Python’s statsmodels module.
  • Random forest model
The random forest (RF) algorithm, proposed by Breiman, is an ensemble learning method built on classification and regression trees (CART). RF uses bagging to generate multiple decision trees and aggregates their outputs, by majority voting for classification or by averaging for regression, resulting in high prediction accuracy and robustness. One of RF’s key advantages is its ability to evaluate the importance of each feature without bias and handle large amounts of missing data effectively [36]. Unlike traditional models, RF can process large-scale data without requiring feature dimensionality reduction, making it advantageous in many applications.
In constructing the random forest (RF) model, two critical parameters are n_estimators (the number of decision trees) and max_depth (the maximum depth of each tree). The n_estimators parameter controls model complexity and stability, while max_depth influences the model’s ability to fit the data. To optimize the performance of the RF model, a grid search (Grid Search CV) is employed. For the study’s small sample dataset, smaller values for n_estimators and moderate values for max_depth were selected to avoid overfitting. Combinations of n_estimators values (100, 200, 300, 400, and 500) and max_depth values (10, 11, 12, 13, 14, and 25) were tested [37,38]. After systematically evaluating parameter combinations, n_estimators = 200 and max_depth = 12 were identified as the optimal settings for soil organic carbon (SOC) prediction. Model construction and prediction were carried out using Python’s Random Forest Regressor module.
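To make the latent-variable idea behind the PLSR model concrete, here is a small NIPALS-style PLS1 sketch in NumPy. It is a teaching example with a hypothetical function name, not the module used in the study, and it assumes a single response variable (SOC) and a number of components no larger than the rank of the centered spectra.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit single-response PLS and return (coefficients, intercept)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                     # weight: direction of maximum covariance with y
        w = w / np.linalg.norm(w)
        t = Xk @ w                        # scores of the new latent variable
        tt = t @ t
        p = Xk.T @ t / tt                 # X loadings
        c = yk @ t / tt                   # y loading
        Xk = Xk - np.outer(t, p)          # deflate X so the next scores are orthogonal
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, y_mean - x_mean @ coef
```

Predictions are then `X_new @ coef + intercept`. With fewer components than bands, the model regresses on a handful of orthogonal score vectors instead of the collinear bands themselves.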

2.3.4. Model Accuracy Evaluation

To evaluate the accuracy of the soil organic carbon (SOC) inversion model, we applied leave-one-out cross-validation (LOOCV) and used three metrics: the coefficient of determination (R²), root mean square error of prediction (RMSEP), and mean absolute error (MAE). R² indicates how well the model explains SOC variations, with values close to 1 suggesting a better fit. RMSEP measures the average deviation between predicted and actual values, with smaller values indicating higher accuracy. MAE shows the average absolute deviation, with values close to zero indicating better predictive accuracy. These metrics, combined with LOOCV validation, provide a comprehensive assessment of the model’s robustness and generalization ability.
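The evaluation scheme can be sketched as below. The metric definitions are the conventional ones and are assumed to match those used in the study (R² is computed as 1 − SS_res/SS_tot); the ordinary-least-squares helpers are only a stand-in model so that the example is self-contained, and all function names are hypothetical.

```python
import numpy as np

def loocv_predictions(X, y, fit, predict):
    """Predict each sample from a model trained on all the other samples."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i              # boolean mask leaving sample i out
        model = fit(X[train], y[train])
        preds[i] = predict(model, X[i:i + 1])[0]
    return preds

def r2_score(y, p):
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    return float(1.0 - np.sum((y - p) ** 2) / np.sum((y - y.mean()) ** 2))

def rmsep(y, p):
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    return float(np.sqrt(np.mean((y - p) ** 2)))

def mae(y, p):
    y, p = np.asarray(y, dtype=float), np.asarray(p, dtype=float)
    return float(np.mean(np.abs(y - p)))

# Stand-in model (ordinary least squares with an intercept)
def ols_fit(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def ols_predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta
```

With 60 samples, LOOCV trains 60 models on 59 samples each, so every SOC measurement is predicted exactly once by a model that never saw it.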

3. Results

3.1. Descriptive Statistics of SOC

The statistical characteristics of the soil organic carbon content (g kg⁻¹) measured from the samples are shown in Figure 3. The SOC content ranges from 9.54 to 30.98 g kg⁻¹, with an average content of 15.84 g kg⁻¹. The coefficient of variation (CV) is 26.68%, indicating a moderate level of variability and reflecting clear differences in organic carbon content among the soil samples in the study area.

3.2. Spectral Feature Analysis

The measured spectra of the soil samples (Figure 4a) and the pixel spectra from the ZY1-02D satellite (Figure 4b) exhibit similar spectral shapes, particularly in specific wavelength ranges that are crucial for analyzing slope characteristics and peak positions. However, some differences are observed, primarily due to the presence of crop residues in the soil resulting from the return of crop straw to the field [39]. Satellite spectra collected under field conditions are inevitably influenced by the natural environment. Despite applying SG smoothing to the satellite pixel spectra, the measured spectral curves remain smoother with smaller standard deviations compared to the satellite pixel spectra. Additionally, the higher moisture content in field soils, as compared to the dry laboratory samples, results in increased absorption of spectral energy by water, leading to lower field soil spectra from ZY1-02D compared to the corresponding laboratory-measured spectra.
For both the measured and pixel spectral data, reflectance increases rapidly in the range of 400–1000 nm due to the presence of iron ions and organic matter. In the measured spectra, three distinct absorption valleys appear at 1400, 1900, and 2200 nm, likely corresponding to water molecules and hydroxyl absorption bands in clay minerals. In the pixel spectra, noise generated by the connection of the two sensors is evident in the range of 900–1040 nm. Significant absorption peaks occur in the ranges of 1340–1425 nm and 1812–1963 nm due to atmospheric water vapor absorption. Within the wavelength range of 2100 to 2484 nm, the reflectance initially increases and then decreases. The differences in reflectance between the measured and pixel spectral data at various wavelengths indicate spectral discrepancies under different measurement conditions, potentially impacting the accuracy of SOC predictions [40].
Using the SST algorithm, we established transfer sample sets of various sizes (t = 10, 20, 30, 40, 50, 60) to quantify the relationship between the image spectra and their corresponding field-measured spectra. Figure 5 illustrates the calibration quality assessment results, showing that the spectral angle decreases as the number of transfer samples increases; specifically, when 60 transfer samples are used, the cosine similarity between the calibrated and measured spectra is the highest, suggesting the optimal calibration quality. Therefore, we employed the SST algorithm fitted with 60 transfer samples to calibrate the ZY1-02D AHSI image. Compared with the original spectra, the calibrated spectra show an increase in the mean reflectance range, from 0.03–0.18 to 0.1–0.51, with a smaller standard deviation that signifies a reduction in spectral variability, suggesting a decrease in noise levels. Moreover, the calibrated spectra more closely match the corresponding field-measured spectra (Figure 4a), clearly showing three distinct absorption troughs at 1400, 1900, and 2200 nm. This finding indicates that the SST algorithm can effectively reduce environmental influences such as soil moisture, soil particle size, and meteorological conditions.

3.3. Correlation Analysis of Spectral Transformation

Given that the Gaussian function approximates the characteristics of soil spectral curves, it was chosen as the wavelet basis function. The decomposition scales were set to 2¹, 2², …, 2¹⁰, and the calibrated spectra were transformed using continuous wavelet transform (CWT) at these ten scales (L1–L10) [41]. As shown in Figure 5, CWT effectively eliminates spectral noise, resulting in smoother spectral curves.
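As a rough illustration of the multi-scale decomposition described above, the following sketch applies a continuous wavelet transform at the dyadic scales 2¹ to 2¹⁰ using only the Python standard library. The specific wavelet (the negative second derivative of a Gaussian, often called the Mexican hat), the kernel truncation at four scale widths, the edge-replication padding, and all function names are assumptions made for illustration; the study states only that a Gaussian function served as the wavelet basis.

```python
import math


def mexican_hat(t):
    """Negative second derivative of a Gaussian (unnormalized). Assumed wavelet."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def make_kernel(scale, width=4.0):
    """Sample the wavelet at integer band offsets for one scale."""
    half = max(1, int(math.ceil(width * scale)))
    taps = [mexican_hat(k / scale) / math.sqrt(scale) for k in range(-half, half + 1)]
    mean = sum(taps) / len(taps)
    # Remove the small residual mean so the truncated kernel sums to zero.
    return [w - mean for w in taps]


def cwt_row(signal, scale):
    """Wavelet coefficients of one spectrum at one scale."""
    kernel = make_kernel(scale)
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - half, 0), n - 1)  # replicate edge values
            acc += w * signal[idx]
        out.append(acc)
    return out


def cwt(signal, levels=10):
    """Rows L1..L{levels}, computed at scales 2**1 .. 2**levels."""
    return [cwt_row(signal, 2 ** level) for level in range(1, levels + 1)]


if __name__ == "__main__":
    # Synthetic spectrum with one absorption-like dip at band 30 (made-up data).
    spectrum = [0.4 - 0.1 * math.exp(-0.5 * ((i - 30) / 3.0) ** 2) for i in range(60)]
    coefficients = cwt(spectrum)
    print(len(coefficients), len(coefficients[0]))
```

Because the kernel sums to zero, a flat spectrum yields coefficients of zero, while an absorption dip produces a negative response centred on the dip at scales comparable to its width.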
With the increase in CWT decomposition scales, the spectral response is significantly enhanced. In stages L1–L5, high-frequency components are notably reduced, and some less obvious characteristic peaks are further highlighted, leading to a gradual increase in the spectral sensitivity to soil organic carbon (SOC). However, in stages L6–L10, significant removal of low-frequency components, which serve as spectral features, results in the loss of some spectral detail and leaves relatively sparse effective information.
Clearly, at higher decomposition scales, capturing useful characteristic spectral information becomes challenging. At low to intermediate scales, however, CWT emphasizes spectral feature bands and supports the extraction of valuable spectral characteristics more effectively than traditional transformations.
To analyze the relationship between the corrected spectra and soil organic carbon (SOC) content, Pearson correlation analysis was performed between the SOC content and each band of the corrected spectra as well as the five mathematical transformations (Figure 6). For the untransformed corrected spectra, the maximum correlation coefficient with SOC content was 0.49. The correlation significantly increased after applying the first derivative, continuum removal, and CWT at scales L3, L4, and L5, with improvements of 0.23, 0.21, 0.28, 0.32, and 0.27, respectively, reaching a maximum correlation coefficient of 0.81 at L4. This improvement is attributed to the effectiveness of these transformations in separating overlapping spectral features and reducing background noise interference.
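The band-wise correlation analysis can be sketched in a few lines. This is a minimal illustration, assuming the spectra are held as a list of per-sample reflectance lists; the function names and the toy data are hypothetical and do not come from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant band carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def band_correlations(spectra, soc):
    """Correlate every band (column) of the spectra with the SOC values."""
    n_bands = len(spectra[0])
    return [pearson_r([s[j] for s in spectra], soc) for j in range(n_bands)]


if __name__ == "__main__":
    # Three samples, three bands (made-up numbers).
    spectra = [[0.10, 0.30, 0.05], [0.20, 0.20, 0.05], [0.30, 0.10, 0.05]]
    soc = [10.0, 20.0, 30.0]
    r = band_correlations(spectra, soc)
    best_band = max(range(len(r)), key=lambda j: abs(r[j]))
    print(r, best_band)
```

Applying the same function to each transformed version of the spectra (derivatives, continuum removal, each CWT scale) and comparing the maximum absolute coefficient gives the kind of comparison reported above.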

3.4. Feature Band Selection

Hyperspectral data contain some bands that are unrelated to soil organic carbon (SOC), with a high overlap or collinearity among certain bands. Using spectral band variable extraction techniques can reduce the dimensionality of spectral data, enhancing the robustness and accuracy of hyperspectral SOC inversion models. VIP (Variable Importance in Projection), CARS (Competitive Adaptive Reweighted Sampling), and SPA (Successive Projections Algorithm) were further employed to select spectral band variables that best explained SOC [42,43].
Figure 7 shows the results of feature band selection using the VIP (Variable Importance in Projection) method. The higher the VIP score, the greater the importance of the associated band. By setting a score threshold, the optimal number of feature bands can be determined; in this study, the threshold was set at 1.0. When VIP > 1.0 (i.e., above the gray dashed line), the SOC characteristic bands of the corrected spectrum primarily lie around 430–900 nm, due to pigment absorption and molecular vibration modes of soil organic matter, particularly vibrations of bonds like C-H, N-H, and O-H. Some bands in the 2000–2450 nm range are attributed to overtone absorption of organic molecules and absorption caused by interactions between minerals and organic matter.
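For reference, the VIP score is commonly defined as follows for a PLS model with p predictor bands and A latent components. The symbols are introduced here for illustration and are not taken from the study: w_{ja} is the weight of band j on component a, and SS_a is the response variance explained by component a.

```latex
\mathrm{VIP}_j \;=\; \sqrt{\, p \cdot
  \frac{\sum_{a=1}^{A} SS_a \left( \dfrac{w_{ja}}{\lVert \mathbf{w}_a \rVert} \right)^{2}}
       {\sum_{a=1}^{A} SS_a} \,},
\qquad
SS_a = q_a^{2}\, \mathbf{t}_a^{\top} \mathbf{t}_a ,
\qquad
\frac{1}{p} \sum_{j=1}^{p} \mathrm{VIP}_j^{2} = 1 .
```

Because the mean of the squared scores equals 1 by construction, a threshold of 1.0 separates bands of above-average importance from the rest, which is consistent with the cutoff used here.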
Notably, the feature selection capacity differs across CWT scales. Low scales (L1–L2) mainly capture high-frequency components of the spectral signal, which are more sensitive to noise; VIP scores may therefore be lower, and the SOC-related local features that are captured may be less robust. In the mid-scale range (L3–L5), the wavelet transform captures mid-frequency information, which contains more meaningful features, potentially enhancing the selection ability of the VIP score and identifying more stable SOC-related features. At high scales (L7–L10), the information may become overly smooth, weakening selection capability. Averaged across all spectra, 56 feature bands were selected, covering approximately 40% of the total bands.
Figure 8 demonstrates the SOC feature band selection process using the CARS (Competitive Adaptive Reweighted Sampling) algorithm, with L4 as an example. Results indicate that as the number of iterations increases, the number of retained bands gradually decreases, with RMSECV (root mean square error of cross-validation) initially decreasing and then increasing with further iterations. At 62 iterations, RMSECV reaches its minimum, as bands weakly correlated with SOC content are progressively removed, reducing RMSECV. After 62 iterations, bands relevant to SOC may have been excluded, causing RMSECV to increase. For L4, the optimal bands for predicting SOC total 19, representing approximately 13.57% of the total bands. Notably, the CARS algorithm selects fewer bands, likely due to its emphasis on variable stability [44].
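The steady decline in retained bands follows from the exponentially decreasing function used in the standard CARS formulation. The sketch below reproduces only that retention schedule, not the adaptive reweighted sampling or the PLS cross-validation. The total of 140 bands is inferred from the statement that 19 bands are about 13.57% of the total; the number of sampling runs (100) and the function name are assumptions.

```python
import math


def cars_retention_schedule(n_bands, n_runs):
    """Number of bands kept after each CARS sampling run.

    Uses the ratio r_i = a * exp(-k * i), with a and k chosen so that all
    bands are kept in run 1 and only two bands remain in the final run.
    """
    a = (n_bands / 2.0) ** (1.0 / (n_runs - 1))
    k = math.log(n_bands / 2.0) / (n_runs - 1)
    return [
        max(2, round(n_bands * a * math.exp(-k * i)))
        for i in range(1, n_runs + 1)
    ]


if __name__ == "__main__":
    schedule = cars_retention_schedule(n_bands=140, n_runs=100)  # assumed settings
    print(schedule[0], schedule[49], schedule[-1])
```

In a full implementation, each run would fit a PLS model on the currently retained bands, record the RMSECV, and the subset with the lowest RMSECV would be kept, which corresponds to the minimum described above.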

3.5. Comparison of Modeling Results

The performance of the PLSR, SMLR, and RF models in predicting the SOC content was evaluated using R², MAE, and RMSEP values (Figure 9). In the PLSR model, the highest accuracy was achieved by the model built with the feature bands selected by CARS at the L5 scale (R²: 0.73). In the SMLR model, the highest accuracy was achieved by the model built with the feature bands selected by CARS at the L3 scale (R²: 0.69). In the RF model, the highest accuracy was achieved by the model built with the feature bands selected by CARS at the L4 scale (R²: 0.81). Compared with L5-CARS-PLSR and L3-CARS-SMLR, L4-CARS-RF increased R² by 0.08 and 0.12, respectively.
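The three evaluation metrics can be computed as in the sketch below. It assumes the conventional definitions (R² as one minus the ratio of residual to total sum of squares); the exact formulas used in the study are not stated in this section, and the function names and sample values are illustrative only.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error of prediction."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


if __name__ == "__main__":
    measured = [12.1, 15.4, 18.0, 22.3]   # made-up SOC values, g kg^-1
    predicted = [13.0, 14.8, 18.9, 21.0]
    print(r2_score(measured, predicted), rmse(measured, predicted), mae(measured, predicted))
```

Under leave-one-out cross-validation, each sample's prediction comes from a model trained on all other samples, and the metrics are then computed once over the pooled predictions.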
In addition, when comparing the combinations of different band selection algorithms and inversion models with spectral data processed by five mathematical transformations and continuous wavelet transform (CWT), it was found that the highest-accuracy inversion models were all based on spectral data at CWT scales L2, L3, L4, and L5, with R² values greater than 0.6. This indicates that SOC content prediction accuracy at scales L2 to L5 is superior to other transformations. Notably, in all inversion models, as the CWT scale increases, the accuracy gradually rises, peaking at L4 or L5, then decreasing. Furthermore, when comparing the effectiveness of different band selection algorithms on inversion accuracy within the same model, it was observed that the features selected using the CARS algorithm yielded the highest modeling accuracy, followed by SPA and VIP.
The results in Figure 10 show that the RF model demonstrates high accuracy in predicting SOC. Most of the predicted values are distributed close to the 1:1 line with the measured values, indicating that this inversion model effectively captures the spatial variability of SOC. Although there are slight overestimations or underestimations in a few sample points, no significant overfitting is observed overall, showcasing the strong predictive performance of the RF model. By incorporating a variable importance analysis strategy, the RF model can identify and effectively leverage key spectral features, further improving prediction accuracy. Overall, the RF model shows high practicality and reliability in estimating the SOC content based on spectral data, making it a suitable tool for this field.

3.6. SOC Content Mapping

Using ZY1-02D hyperspectral imagery, in combination with the optimal prediction model, we generated a spatial distribution map of the soil organic carbon (SOC) content (Figure 11). Overall, the SOC distribution demonstrated a highly fragmented spatial pattern. The RF model predicted a mean SOC content of 17.13 g kg⁻¹, ranging from 5.38 to 34.86 g kg⁻¹. Areas with medium–high SOC values (>28 g kg⁻¹) were mainly found in low-altitude regions such as hills and foothills, accounting for 18.65% of the total study area. Meanwhile, areas with medium–low SOC values (<15 g kg⁻¹) were primarily distributed in the central urbanized zone and high-altitude mountainous areas, constituting 32.42% of the overall region. In addition, the predicted standard deviation (SD) of the SOC content was 3.86 g kg⁻¹, and the coefficient of variation (CV) was 23.43%, which was notably lower than the observed values, in line with previous SOC prediction studies [43]. The predicted SOC range, mean, and standard deviation showed good consistency with the measured sample data, further indicating that this model maintains a relatively stable accuracy in simulation and in capturing spatial heterogeneity.
The pronounced vertical zonation of the study area significantly affects the spatial patterns of SOC. Low-altitude hills and foothills, characterized by warm and humid conditions conducive to forest vegetation, typically exhibit higher SOC contents. The abundant organic matter input in forests, coupled with relatively stable soil environments, promotes efficient SOC accumulation [44]. As altitude increases, vegetation types shift from low-altitude forests to high-altitude meadows, resulting in remarkable differences in SOC content. In contrast, in high-altitude mountainous areas with rugged terrain and steep slopes, the soil quality is relatively poor, and the cold climate further suppresses microbial activity, slowing organic matter decomposition and accumulation and leading to lower SOC contents. This vertical stratification intensifies the spatial heterogeneity of SOC within the study area, suggesting that the interplay of natural environmental conditions and altitude plays a critical role in SOC accumulation and distribution.
Moreover, given the complex terrain dominated by high mountains and hills, large contiguous tracts of farmland are exceedingly scarce, confining agricultural activities to a small range of localized areas. On the SOC distribution map, the high-SOC zones appear highly fragmented, underscoring that localized land-use practices have a profound impact on SOC accumulation.

4. Discussion

4.1. Application of SST Algorithm in Calibrating Pixel Spectra

When estimating soil organic carbon (SOC) using satellite hyperspectral data, spectral acquisition conditions—such as atmospheric interference, soil moisture, particle size distribution, and sensor characteristics—introduce systematic biases between satellite observations and laboratory measurements. Currently, the commonly used calibration algorithms are Direct Standardization (DS) and Piecewise Direct Standardization (PDS). The DS algorithm typically assumes that a global linear relationship exists across the entire spectral space, thereby enabling the calibration of satellite data to the measured spectra [16]. However, when varying measurement conditions and environmental variables cause spectral changes to exhibit non-uniformity and local characteristics, the global DS mapping may be overly simplistic, making it difficult to fine-tune complex local deviations. PDS addresses this issue by incorporating a segmented calibration approach, dividing the spectrum into sections and independently constructing linear mappings for each segment, thereby partially mitigating the problem [17]. In contrast, the core concept of the Spectral Space Transformation (SST) algorithm involves using a standard spectral library as an intermediary. Through linear algebraic projections and transformations, SST maps satellite pixel spectra affected by environmental and instrumental factors into the reference spectral space, thereby minimizing the interference of non-target factors in spectral feature extraction and quantitative inversion. Mathematically, SST achieves linear mapping from the satellite spectral domain to the reference spectral domain by constructing composite matrices, performing singular value decomposition (SVD), and utilizing the Moore–Penrose pseudoinverse, among other linear tools [23]. The representativeness and quantity of transfer samples significantly influence the accuracy and stability of SST calibration. 
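The linear-algebra steps named above can be written out as follows. This is a sketch of SST as it is commonly formulated in the calibration-transfer literature and may differ in detail from what was implemented in this study; the symbols are introduced here for illustration. Let X_m (n × p) hold the reference spectra of the n transfer samples, X_s (n × p) the satellite pixel spectra of the same samples, and k the number of retained components.

```latex
\begin{aligned}
\mathbf{X}_{\mathrm{comb}} &= \left[\, \mathbf{X}_m ,\; \mathbf{X}_s \,\right]
  = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\top}
  \;\approx\; \mathbf{T}_k \mathbf{P}_k^{\top},
  \qquad \mathbf{T}_k = \mathbf{U}_k \boldsymbol{\Sigma}_k,\quad \mathbf{P}_k = \mathbf{V}_k, \\[4pt]
\mathbf{P}_k^{\top} &= \left[\, \mathbf{P}_m^{\top} ,\; \mathbf{P}_s^{\top} \,\right],
  \qquad \mathbf{P}_m^{\top},\, \mathbf{P}_s^{\top} \in \mathbb{R}^{k \times p}, \\[4pt]
\hat{\mathbf{x}}_m &= \mathbf{x}_s
  + \mathbf{x}_s \left( \mathbf{P}_s^{\top} \right)^{+}
    \left( \mathbf{P}_m^{\top} - \mathbf{P}_s^{\top} \right).
\end{aligned}
```

Here x_s (1 × p) is any satellite pixel spectrum, the superscript + denotes the Moore–Penrose pseudoinverse, and the corrected spectrum is the original pixel spectrum plus an estimate of the difference between the two spectral domains. The number of transfer samples n bounds the rank of the combined matrix, which is one reason the size of the transfer set affects calibration quality.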
In this study, the Kennard–Stone (KS) algorithm was employed to select a set of spectrally diverse samples from the field-measured spectral dataset, forming multiple transfer sets of varying sizes (t = 10, 20, 30, 40, 50, 60) to investigate the impact of transfer sample quantity on SST calibration performance. To quantitatively evaluate the effectiveness of SST calibration under different t-value schemes, spectral angle mapping (SAM) was used as the performance metric. SAM measures the morphological similarity between the calibrated spectra and reference spectra by calculating the angle θ between spectral vectors; a smaller θ indicates that the calibrated spectra more closely resemble the ideal reference conditions in terms of absorption peak positions and spectral curve shapes.
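The spectral angle used as the calibration metric can be computed as in this minimal sketch. The function name and the example vectors are hypothetical; the angle is returned in radians.

```python
import math


def spectral_angle(spectrum_a, spectrum_b):
    """Angle (radians) between two spectra treated as vectors."""
    dot = sum(a * b for a, b in zip(spectrum_a, spectrum_b))
    norm_a = math.sqrt(sum(a * a for a in spectrum_a))
    norm_b = math.sqrt(sum(b * b for b in spectrum_b))
    cosine = dot / (norm_a * norm_b)
    # Guard against rounding slightly outside the valid acos domain.
    cosine = max(-1.0, min(1.0, cosine))
    return math.acos(cosine)


if __name__ == "__main__":
    reference = [0.12, 0.20, 0.31, 0.28]    # made-up reflectance values
    calibrated = [0.13, 0.21, 0.30, 0.27]
    print(spectral_angle(reference, calibrated))
```

Because the angle depends only on direction, it is insensitive to a uniform brightness scaling of either spectrum, so it compares curve shape rather than absolute reflectance.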

4.2. Feature Band Selection

In this study, three feature band selection algorithms demonstrated varying degrees of effectiveness in soil organic carbon (SOC) inversion. Table 3 presents the number of feature bands selected by each algorithm and their average correlation coefficients. The results indicate that the Competitive Adaptive Reweighted Sampling (CARS) algorithm has advantages in selecting strongly correlated bands and in the number of bands selected, which corresponds with the modeling accuracy results (Figure 9). Regarding the distribution of feature bands, the SOC feature bands identified in this study are primarily located in the ranges of 480~489 nm, 500~519 nm, 560~579 nm, 1150~1169 nm, 1210~1219 nm, 1240~1279 nm, 1380~1389 nm, 1910~1919 nm, 2180~2189 nm, 2250~2259 nm, and 2310~2319 nm, encompassing multiple spectral regions including visible, near-infrared, and mid-infrared wavelengths.
Previous studies have demonstrated that the CARS algorithm exhibits high stability in identifying feature bands related to the SOC content [45]. CARS employs Monte Carlo sampling to randomly select variable subsets multiple times, thereby avoiding a reliance on a single initial condition and local optimal solutions. This approach ensures a more comprehensive search of the hyperspectral feature space, reducing the likelihood of missing critical bands. Furthermore, CARS adaptively reweights and progressively eliminates weakly correlated or irrelevant bands, so that the final retained bands more accurately reflect the characteristic spectral signals of SOC while redundancy and noise are effectively reduced [46]. Additionally, the coupling of CARS with the partial least squares (PLS) model further enhances the comprehensive capability of feature selection, addressing issues such as multicollinearity, weak absorption features, and nonlinear relationships in hyperspectral data.

4.3. SOC Prediction and Spatial Distribution

The results presented in Figure 10 demonstrate that the random forest (RF) model exhibits high accuracy in predicting soil organic carbon (SOC). Most predicted values align closely with the 1:1 line relative to actual measurements, indicating that the inversion model effectively captures the spatial variability of SOC. However, some sample points exhibit overestimation or underestimation. Overall, the RF model demonstrates significant practicality and reliability in estimating the SOC content based on calibrated spectral data, and for those reasons it is a commonly used tool in this field. Utilizing ZY1-02D hyperspectral images, we generated spatial distribution maps of the SOC content (Figure 11) based on the optimal predictive model. The predicted SOC content ranged from 5.38 to 34.86 g·kg⁻¹, with an average of 17.13 g·kg⁻¹ and a standard deviation of 3.86 g·kg⁻¹. The relatively small standard deviation of the SOC distribution map produced by the model indicates a robust predictive performance across different regions, effectively reflecting the spatial distribution characteristics of SOC. Furthermore, the range, mean, and standard deviation of the predicted SOC content are consistent with the characteristics of measurement data from sampling points, further confirming the model’s stability in simulation accuracy and its ability to capture spatial heterogeneity.
The spatial distribution of SOC in the study area is characterized by high concentrations in the low mountainous hills and piedmont regions and lower concentrations in the high-altitude mountainous areas. This indicates that the pronounced vertical geographical differences within the region significantly influence the spatial distribution of SOC. In the low mountainous hills and piedmont areas, the relatively low altitude and warm, humid climate encourage the growth of forest vegetation, leading to a higher SOC content. The abundant organic inputs from forests, coupled with a relatively stable soil environment, foster the effective accumulation of SOC [47]. In contrast, SOC concentrations are low in the central river valley area. This may be due to the region’s dense urban development and extensive impervious surfaces, which significantly reduce soil aeration and water infiltration [48]. Additionally, the discharge of pollutants such as heavy metals and organic contaminants can inhibit soil microbial activity. These environmental conditions limit vegetation growth and organic matter inputs, thereby reducing the sources of soil organic carbon.

4.4. Study Limitations and Alternatives

This study establishes a quantitative relationship between in situ measurements and satellite spectral data by using a spectral correction algorithm, offering an effective approach for predicting the soil organic carbon content based on ZY1-02D AHSI imagery. However, any hyperspectral imaging-based method for estimating soil physicochemical parameters is invariably influenced by multiple factors, such as soil moisture, structure, shadowing, and surface roughness [41]. Therefore, further systematic analysis and validation are necessary to assess the effectiveness of the spectral correction algorithm under various environmental and soil conditions. Additionally, when agricultural regions are selected as the study area, the temporal mismatch between soil sampling and satellite image acquisition can undermine the reliability of soil organic carbon estimations. Agricultural activities typically follow seasonal or staged patterns, and processes such as tillage, fertilization, irrigation, and crop growth can substantially affect the dynamic changes in soil organic carbon. Discrepancies between sampling and satellite observation dates may lead to a temporal offset in the soil spectral data and the collected field samples with respect to organic carbon levels, negatively impacting the accuracy and representativeness of the inversion model. To alleviate this shortcoming, future research should endeavor to minimize the lag between sampling and satellite imaging or utilize multi-temporal images coupled with field monitoring data to capture soil organic carbon variations across different time periods more precisely, thereby enhancing model generalizability and result reliability.

5. Conclusions

In this study, we predicted the SOC content in the Shangzhou District of Shangluo City using a combination of ZY1-02D hyperspectral satellite imagery, measured hyperspectral data, and field sampling data, and a regional SOC distribution map was created. The main research findings are as follows:
  • The SST algorithm was used to establish the transfer relationship between the measured spectra of the samples and the pixel spectra, allowing for spectral calibration of the ZY1-02D hyperspectral satellite imagery. After applying the SST algorithm, the optimal inversion model achieved an R² of 0.81, indicating that the SST algorithm is a feasible and reliable method for calibrating ZY1-02D hyperspectral images.
  • Continuous wavelet transformation (CWT) was more effective than other spectral processing methods in removing noise from satellite hyperspectral data. However, as the decomposition scale increases, the spectral feature differences between different SOC contents gradually decrease, resulting in reduced SOC prediction accuracy at higher decomposition scales.
  • The three feature band selection methods can effectively preserve the integrity and physical significance of hyperspectral data, but their ability to improve SOC prediction accuracy varies. Among them, CARS proved to be the most effective in enhancing the accuracy of SOC prediction models.
  • The significant geographic vertical distribution differences and the fragmented topography in the study area jointly contribute to the spatial distribution patterns of soil organic carbon (SOC) in the region. The random forest (RF) model captures more spatial variability information and expresses spatial heterogeneity more accurately, making it an efficient method for predicting the SOC content in complex topographical areas.

Author Contributions

Conceptualization, F.Y.; formal analysis, F.Y. and Y.H.; funding acquisition, F.Y.; investigation, Y.H., J.Y. and F.Y.; methodology, Y.H. and F.Y.; project administration, F.Y.; resources, F.Y.; writing—original draft, Y.H.; writing—review and editing, B.W., F.Y., J.Y. and L.H.; Y.H. and J.Y. made equal contributions. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (42071258) and by the Key Research and Development Program of Shaanxi (2024SF-YBXM-570).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, Y.; Duan, X.; Li, Y.; Li, Y.; Zhang, L. Interactive Effects of Land Use and Soil Erosion on Soil Organic Carbon in the Dry-Hot Valley Region of Southern China. Catena 2021, 201, 105187.
  2. Hong, Y.; Chen, S.; Chen, Y.; Linderman, M.; Mouazen, A.M.; Liu, Y.; Guo, L.; Yu, L.; Liu, Y.; Cheng, H.; et al. Comparing Laboratory and Airborne Hyperspectral Data for the Estimation and Mapping of Topsoil Organic Carbon: Feature Selection Coupled with Random Forest. Soil Tillage Res. 2020, 199, 104589.
  3. Shi, T.; Guo, L.; Chen, Y.; Wang, W.; Shi, Z.; Li, Q.; Wu, G. Proximal and Remote Sensing Techniques for Mapping of Soil Contamination with Heavy Metals. Appl. Spectrosc. Rev. 2018, 53, 783–805.
  4. Imani, M.; Ghassemian, H. An Overview on Spectral and Spatial Information Fusion for Hyperspectral Image Classification: Current Trends and Challenges. Inf. Fusion 2020, 59, 59–83.
  5. Li, X.; Qiu, H.; Fan, P. A Review of Spectral Feature Extraction and Multi-Feature Fusion Methods in Predicting Soil Organic Carbon. Appl. Spectrosc. Rev. 2024, 60, 1–24.
  6. Zheng, M.; Wang, X.; Li, S.; Zhang, L.; Song, K. Remote sensing inversion of soil organic matter and total nitrogen in black soil region. Sci. Geogr. Sin. 2022, 42, 1336–1347.
  7. Ren, B.; Chen, H.; Zhang, L.; Nie, X.; Xing, S.; Fan, X. Comparison of machine learning for predicting and mapping soil organic carbon in cultivated land in a subtropical complex geomorphic region. Chin. J. Eco-Agric. 2021, 29, 1042–1050.
  8. Zhao, W.; Ma, H.; Zhou, C.; Zhou, C.; Li, Z. Soil Salinity Inversion Model Based on BPNN Optimization Algorithm for UAV Multispectral Remote Sensing. IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens. 2023, 16, 6038–6047.
  9. Jin, H.; Peng, J.; Bi, R.; Tian, H.; Zhu, H.; Ding, H. Comparing Laboratory and Satellite Hyperspectral Predictions of Soil Organic Carbon in Farmland. Agronomy 2024, 14, 175.
  10. Meng, X.; Bao, Y.; Liu, J.; Liu, H.; Zhang, X.; Zhang, Y.; Wang, P.; Tang, H.; Kong, F. Regional Soil Organic Carbon Prediction Model Based on a Discrete Wavelet Analysis of Hyperspectral Satellite Data. Int. J. Appl. Earth Obs. Geoinf. 2020, 89, 102111.
  11. Song, Q.; Gao, X.; Song, Y.; Li, Q.; Chen, Z.; Li, R.; Zhang, H.; Cai, S. Estimation of soil organic carbon content in farmland based on UAV hyperspectral images: A case study of farmland in the Huangshui River basin. Remote Sens. Nat. Resour. 2024, 36, 160–172.
  12. Song, Q.; Gao, X.; Song, Y.; Li, Q.; Chen, Z.; Li, R.; Zhang, H.; Cai, S. Estimation and Mapping of Soil Texture Content Based on Unmanned Aerial Vehicle Hyperspectral Imaging. Sci. Rep. 2023, 13, 14097.
  13. Han, Y.; Ke, Y.; Wang, Z.; Liang, D.; Zhou, D. Classification of the Yellow River Delta wetland landscape based on ZY-1 02D hyperspectral imagery. Natl. Remote Sens. Bull. 2023, 27, 1387–1399.
  14. Huang, Y.; Tian, Y.; Zhang, Q.; Tao, J.; Zhang, Y.; Yang, Y.; Lin, J. Estimation of Aboveground Biomass of Mangroves in Maowei Sea of Beibu Gulf Based on ZY-1-02D Satellite Hyperspectral Data. Spectrosc. Spectr. Anal. 2023, 43, 3906–3915.
  15. Liu, L.; Ji, M.; Buchroithner, M. Transfer Learning for Soil Spectroscopy Based on Convolutional Neural Networks and Its Application in Soil Clay Content Mapping Using Hyperspectral Imagery. Sensors 2018, 18, 3169.
  16. Zhang, B.; Guo, B.; Zou, B.; Wei, W.; Lei, Y.; Li, T. Retrieving Soil Heavy Metals Concentrations Based on GaoFen-5 Hyperspectral Satellite Image at an Opencast Coal Mine, Inner Mongolia, China. Environ. Pollut. 2022, 300, 118981.
  17. Wang, J.; Meng, F.; Liu, L.; Cui, W. Application of Sample Selection and PDS-PLS Algorithms in Near Infrared Spectra Analysis Model Transfer. Acta Armamentarii 2016, 37, 91–96.
  18. Liu, L.; Ji, M.; Buchroithner, M. Combining Partial Least Squares and the Gradient-Boosting Method for Soil Property Retrieval Using Visible Near-Infrared Shortwave Infrared Spectra. Remote Sens. 2017, 9, 1299.
  19. Liu, Z.; Xu, L.; Chen, X. Near Infrared Spectroscopy Transfer Based on Deep Autoencoder. Spectrosc. Spectr. Anal. 2020, 40, 2313–2318.
  20. Li, C.; Xie, W.; Wang, Q.; Yu, F.; Hao, Z.; Yuan, Z. Spatial-temporal variations of coupling relationship between ecosystem services and human well-being in Shangluo City. Chin. J. Ecol. 2024, 43, 2694–2701.
  21. Ye, B.; Tian, S.; Cheng, Q.; Ge, Y. Application of Lithological Mapping Based on Advanced Hyperspectral Imager (AHSI) Imagery Onboard Gaofen-5 (GF-5) Satellite. Remote Sens. 2020, 12, 3990.
  22. Yang, N.; Li, L.; Han, L.; Gao, K.; Qu, S.; Li, J. Retrieving Heavy Metal Concentrations in Urban Soil Using Satellite Hyperspectral Imagery. Int. J. Appl. Earth Obs. Geoinf. 2024, 132, 104079.
  23. Li, X.; Fan, P.; Hou, G.; Qiu, H.; Lv, H. A Review of Calibration Transfer Based on Spectral Technology. Spectrosc. Spectr. Anal. 2021, 41, 1114–1118.
  24. Xu, Z.; Fan, S.; Liu, J.; Liu, B.; Tao, L.; Wu, J.; Hu, S.; Zhao, L.; Wang, Q.; Wu, Y. A Calibration Transfer Optimized Single Kernel Near-Infrared Spectroscopic Method. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2019, 220, 117098.
  25. Guo, Z.; Li, Y.; Wang, X.; Gong, X.; Chen, Y.; Cao, W. Remote Sensing of Soil Organic Carbon at Regional Scale Based on Deep Learning: A Case Study of Agro-Pastoral Ecotone in Northern China. Remote Sens. 2023, 15, 3846.
  26. Ye, M.; Zhu, L.; Liu, X.; Huang, Y.; Chen, B.; Li, H. Hyperspectral Inversion of Soil Organic Matter Content Based on Continuous Wavelet Transform, SHAP, and XGBoost. Environ. Sci. 2024, 45, 2280–2291.
  27. Gou, Y.; Zhao, Y.; Li, Y.; Zhuo, Z.; Cao, M.; Huang, Y. Soil Organic Matter Content in Dryland Farmland in Northeast China with Hyperspectral Reflectance Based on CWT-sCARS. Trans. Chin. Soc. Agric. Mach. 2022, 53, 331–337.
  28. Guo, J.; Zhu, Q.; Zhao, X.; Guo, X.; Han, Y.; Xu, Z. Hyper-spectral inversion of soil organic carbon content under different land use types. Chin. J. Appl. Ecol. 2020, 31, 863–871.
  29. Xu, Z.; Chen, L.; Xiang, S.; Deng, X.; Li, Y.; Yu, H.; He, A.; Li, Z.; Guo, X. Band Selection and Its Construction for the Normalized Shadow Vegetation Index (NSVI) of ZY1-02D AHSI Image. Spectrosc. Spectr. Anal. 2024, 44, 2626–2637.
  30. Li, Z.; Gao, S.; Wang, C.; Liu, G.; Hu, D. Study on Space-based Hyperspectral Detection Parameters for Quantitative Retrieval of Organic Carbon in Multiple Types of Soil. Soils 2024, 56, 639–645.
  31. Li, T.; Xia, A.; Mclaren, T.I.; Pandey, R.; Xu, Z.; Liu, H.; Manning, S.; Madgett, O.; Duncan, S.; Rasmussen, P.; et al. Preliminary Results in Innovative Solutions for Soil Carbon Estimation: Integrating Remote Sensing, Machine Learning, and Proximal Sensing Spectroscopy. Remote Sens. 2023, 15, 5571.
  32. Ding, S.; Zhang, X.; Shang, K.; Li, R.; Sun, W. Estimating soil heavy metal from hyperspectral remote sensing images base on fractional order derivative. Natl. Remote Sens. Bull. 2023, 27, 2191–2205.
  33. Yuan, K.; Zhang, C.; Zhao, J.; Wang, Z.; Yang, J.; Xu, Z. Comparative Analysis on Models for Predicting the Spatial Distribution of Soil Organic Carbon Density with Limited Samples. Res. Soil Water Conserv. 2024, 31, 173–181+191.
  34. Jing, C.; Zhou, W.; Qian, Y. A new approach to mapping tree diversity based on remote sensing imagery. Acta Ecol. Sin. 2019, 39, 8383–8391.
  35. Chang, R.; Chen, Z.; Wang, D.; Guo, K. Hyperspectral Remote Sensing Inversion and Monitoring of Organic Matter in Black Soil Based on Dynamic Fitness Inertia Weight Particle Swarm Optimization Neural Network. Remote Sens. 2022, 14, 4316.
  36. Zou, Z.; Wang, Q.; Wu, Q.; Li, M.; Zhen, J.; Yuan, D.; Zhou, M.; Xu, C.; Wang, Y.; Zhao, Y.; et al. Inversion of Heavy Metal Content in Soil Using Hyperspectral Characteristic Bands-Based Machine Learning Method. J. Environ. Manag. 2024, 355, 120503.
  37. Li, D.; Wang, X.; Li, K.; Guo, Y. Estimation of Soil Organic Carbon Content in the Bohu Basin Based on Synthetic Images and Multi-variables. Environ. Sci. 2024, 45, 1–15.
  38. Wang, W.; Peng, J.; Zhu, W.; Yang, B.; Liu, Z.; Gong, H.; Wang, J.; Yang, T.; Lou, J.; Sun, Z. Study on Retrieval Method of Soil Organic Matter in Salinity Soil Using Unmanned Aerial Vehicle Remote Sensing. J. Geo-Inf. Sci. 2024, 26, 736–752.
  39. Zhang, S.; Lu, M.; Wen, C.; Song, Y.; Kang, L.; Shen, J.; Yang, M. Study of soil salinity remote sensing inversion method integrating crop type. Bull. Surv. Mapp. 2024, 32, 1–7.
  40. Laamrani, A.; Berg, A.A.; Voroney, P.; Feilhauer, H.; Blackburn, L.; March, M.; Dao, P.D.; He, Y.; Martin, R.C. Ensemble Identification of Spectral Bands Related to Soil Organic Carbon Levels over an Agricultural Field in Southern Ontario, Canada. Remote Sens. 2019, 11, 1298.
  41. Gu, X.; Wang, Y.; Sun, Q.; Yang, G.; Zhang, C. Hyperspectral Inversion of Soil Organic Matter Content in Cultivated Land Based on Wavelet Transform. Comput. Electron. Agric. 2019, 167, 105053.
  42. Tian, Y.; Wang, Z.; Xie, P. Quantitative Hyperspectral Inversion of Soil Heavy Metals based on Feature Screening Combined with PSO-BPNN and GA-BPNN Algorithms. Remote Sens. Technol. Appl. 2024, 39, 259–268.
  43. Tang, X.; Li, H.; Cui, L.; Zhao, X.; Zhai, X.; Lei, Y.; Li, J.; Wang, J.; Li, W. Inversion of Wetland Plant Species Diversity Using UAV Hyperspectral Data. J. Geo-Inf. Sci. 2024, 26, 1954–1974.
  44. Wang, Y.; Zou, B.; Li, S.; Tian, R.; Zhang, B.; Feng, H.; Tang, Y. A Hierarchical Residual Correction-Based Hyperspectral Inversion Method for Soil Heavy Metals Considering Spatial Heterogeneity. J. Hazard. Mater. 2024, 479, 135699.
  45. Guo, H.; Zhang, R.; Dai, W.; Zhou, X.; Zhang, D.; Yang, Y.; Cui, J. Mapping Soil Organic Matter Content Based on Feature Band Selection with ZY1-02D Hyperspectral Satellite Data in the Agricultural Region. Agronomy 2022, 12, 2111.
  46. Zhang, W.; Zhang, X.; Wu, W.; Liu, H. The Spatial Variability of Temporal Changes in Soil Organic Carbon and Its Drivers in a Mountainous Agricultural Region of China. Catena 2024, 246, 108402.
  47. Guo, H.; Du, E.; Terrer, C.; Jackson, R.B. Global Distribution of Surface Soil Organic Carbon in Urban Greenspaces. Nat. Commun. 2024, 15, 806.
  48. Calvo de Anta, R.; Luís, E.; Febrero-Bande, M.; Galiñanes, J.; Macías, F.; Ortíz, R.; Casás, F. Soil Organic Carbon in Peninsular Spain: Influence of Environmental Factors and Spatial Distribution. Geoderma 2020, 370, 114365.
Figure 1. Geographic location of Shangzhou District and distribution of soil sampling sites. (a–c) Field conditions of the sample area.
Figure 2. Technical route for hyperspectral inversion of soil organic carbon in farmland in the mountainous area of southern Shaanxi, China.
Figure 3. Statistical characterization of SOC content.
Figure 4. Spectral reflectance curves of soil samples and ZY1-02D AHSI image pixel spectra before and after SST algorithm correction. (a) Laboratory spectra measured by ASD; (b) field spectra retrieved from the ZY1-02D AHSI image; (c) field spectra calibrated by the SST algorithm using laboratory spectra; (d) spectral angle mapper (θ) used to evaluate the performance of the SST algorithm.
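Panel (d) of Figure 4 evaluates the correction with a spectral angle θ. The caption does not give the formula, so the following is only a minimal sketch that assumes the standard spectral angle definition, θ = arccos(a·b / (‖a‖‖b‖)). The function name `spectral_angle` and the example spectra are hypothetical and not taken from the article.

```python
import numpy as np


def spectral_angle(a, b):
    """Return the angle (in radians) between two spectra treated as vectors.

    Assumes the standard spectral angle definition; a smaller angle means
    the two spectra have a more similar shape, regardless of overall brightness.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip to guard against rounding slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))


# Hypothetical example: a laboratory spectrum and a uniformly scaled image spectrum.
lab_spectrum = np.array([0.10, 0.20, 0.30])
image_spectrum = 2.0 * lab_spectrum
theta = spectral_angle(lab_spectrum, image_spectrum)
```

Because the angle ignores uniform scaling, it is suited to comparing spectral shape between laboratory and image spectra even when their absolute reflectance levels differ.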
Figure 5. Differences in soil reflectance curves at different CWT decomposition scales.
Figure 6. Results of correlation analysis between different spectral transformations and SOC contents.
Figure 7. Feature bands screened by VIP. The gray dashed line marks the VIP = 1 threshold; bands with VIP > 1 are marked with a red “×” and bands with VIP < 1 with a blue “×”.
Figure 8. Screening process and characteristic band distribution by the CARS method. (a) Process for selecting the optimal number of bands through CARS; (b) distribution of characteristic bands at the L4 scale; the red-highlighted boxes indicate the characteristic bands selected by CARS from the corrected data.
Figure 9. Comparison of the accuracies of different models.
Figure 10. Scatterplot of the best model for the calibrated spectral data.
Figure 11. Spatial distribution map of SOC in the study area.
Table 1. Technical specifications of several hyperspectral satellite systems.

| Imager | Spectral Bands | Spectral Range (nm) | Spatial Resolution | Spectral Resolution | Time Resolution | SNR | SNR Condition |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Hyperion | 220 | 400–2500 | 30 m | 10 nm | 16 d | 161 @ 550 nm; 147 @ 700 nm; 110 @ 1125 nm; 40 @ 2125 nm | Nadir looking, 60° sun-zenith angle, 0.3 Earth albedo |
| CHRIS | 37 | 400–1050 | 17/34 m | 6–33 nm | 3 d | 1.3 nm @ 410 nm; 12 nm @ 1050 nm | Sun-synchronous orbit, altitude 615 km, inclination 97.89° |
| HJ-1 | 128 | 459–956 | 100 m | 5 nm | 4 d | 2–3 nm @ 460 nm; 3–5 nm @ 552 nm; 5–7 nm @ 716 nm; 7–8 nm @ 848 nm | ±30° side swaying for global repetitive observations |
| MODIS | 36 | 400–1400 | 250 m (bands 1–2); 500 m (bands 3–7); 1000 m (bands 8–36) | 5–10 nm | 1–2 d | 128 @ 620–720 nm; 201 @ 841–876 nm | Descending-orbit overpass at 10:30 a.m., ascending-orbit overpass at 1:30 p.m.; sun-synchronous, near-polar circular orbit |
| GaoFen-5 | 330 | 400–2500 | 30 m | 5 nm (400–1000 nm); 10 nm (1000–2500 nm) | 7 d | Maximum SNR over 450–2500 nm is 500 | Sun-synchronous orbit with an inclination of 98.2°; orbital height approximately 705 km |
| ZY1E-02D | 166 | 400–2500 | 30 m | 10 nm (400–1000 nm); 20 nm (1000–2500 nm) | 3 d | 654 @ 500 nm; 379 @ 900 nm; 511 @ 1200 nm; 447 @ 1800 nm; 285 @ 2400 nm | Sun-synchronous orbit with an inclination of 98.5°; orbital height approximately 778 km |
Note: The SNR column reports signal-to-noise ratio values together with the wavelengths at which they are calculated; the symbol @ precedes the wavelength to which each value applies.
Table 2. Spectral transformation methods.

| Spectral Math Transformation | Formula |
| --- | --- |
| 1/R | $1/R_i$ |
| lg(1/R) | $\lg(1/R_i)$ |
| FDR | $R'_i = (R_{i+\Delta i} - R_i)/\Delta i$ |
| SDR | $R''_i = (R'_{i+\Delta i} - R'_i)/\Delta i$ |
| CR | $S_i / C_i$ |

Description: $i$ is the wavelength; $\Delta i$ is the wavelength spacing; $R_i$ is the spectral reflectance at wavelength $i$; $R_{i+\Delta i}$ is the spectral reflectance at wavelength $i + \Delta i$; $R'_i$ is the first-order differential spectrum at wavelength $i$; $R'_{i+\Delta i}$ is the first-order differential spectrum at wavelength $i + \Delta i$; $S_i$ is the original spectral profile; $C_i$ is the continuum profile.
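The transformations in Table 2 can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, the derivatives use simple forward differences, and the continuum is assumed to be the upper convex hull of the spectrum, which is a common convention but is not stated in the table. The CWT scales (L3 to L6) are not covered here.

```python
import numpy as np


def reciprocal(r):
    """1/R transform."""
    return 1.0 / r


def log_reciprocal(r):
    """lg(1/R): base-10 logarithm of the reciprocal reflectance."""
    return np.log10(1.0 / r)


def first_derivative(r, wavelengths):
    """FDR: forward difference (R[i+1] - R[i]) / (w[i+1] - w[i])."""
    return np.diff(r) / np.diff(wavelengths)


def second_derivative(r, wavelengths):
    """SDR: forward difference of the first-derivative spectrum."""
    fdr = first_derivative(r, wavelengths)
    return np.diff(fdr) / np.diff(wavelengths[:-1])


def continuum_removed(r, wavelengths):
    """CR: spectrum S divided by its continuum C (assumed upper convex hull)."""
    hull = []  # indices of the points on the upper hull, left to right
    for i in range(len(r)):
        # Drop the last hull point while it lies on or below the line
        # joining its predecessor to the new point.
        while len(hull) >= 2:
            o, a = hull[-2], hull[-1]
            cross = ((wavelengths[a] - wavelengths[o]) * (r[i] - r[o])
                     - (r[a] - r[o]) * (wavelengths[i] - wavelengths[o]))
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    continuum = np.interp(wavelengths, wavelengths[hull], r[hull])
    return r / continuum


# Hypothetical four-band spectrum with 10 nm spacing.
wavelengths = np.array([400.0, 410.0, 420.0, 430.0])
reflectance = np.array([0.1, 0.2, 0.4, 0.5])
fdr = first_derivative(reflectance, wavelengths)
sdr = second_derivative(reflectance, wavelengths)
cr = continuum_removed(reflectance, wavelengths)
```

Note that each differencing step shortens the spectrum by one band, so FDR and SDR outputs have one and two fewer values than the input, respectively.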
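Table 3. Feature band statistics.

| Selection Algorithm | Spectral Transformation | Number of Feature Bands | Average Correlation Coefficient |
| --- | --- | --- | --- |
| VIP | 1/R | 49 | 0.32 |
| VIP | lg(1/R) | 54 | 0.36 |
| VIP | FDR | 48 | 0.41 |
| VIP | SDR | 46 | 0.38 |
| VIP | CR | 43 | 0.40 |
| VIP | L3 | 45 | 0.46 |
| VIP | L4 | 43 | 0.49 |
| VIP | L5 | 46 | 0.43 |
| VIP | L6 | 48 | 0.45 |
| SPA | 1/R | 30 | 0.38 |
| SPA | lg(1/R) | 34 | 0.42 |
| SPA | FDR | 26 | 0.44 |
| SPA | SDR | 28 | 0.43 |
| SPA | CR | 36 | 0.41 |
| SPA | L3 | 28 | 0.52 |
| SPA | L4 | 32 | 0.46 |
| SPA | L5 | 25 | 0.54 |
| SPA | L6 | 33 | 0.47 |
| CARS | 1/R | 10 | 0.42 |
| CARS | lg(1/R) | 13 | 0.40 |
| CARS | FDR | 16 | 0.54 |
| CARS | SDR | 15 | 0.48 |
| CARS | CR | 20 | 0.44 |
| CARS | L3 | 13 | 0.52 |
| CARS | L4 | 19 | 0.56 |
| CARS | L5 | 17 | 0.45 |
| CARS | L6 | 13 | 0.42 |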
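Table 3 does not state how the average correlation coefficient is computed. One plausible reading, shown here purely as an assumption, is the mean absolute Pearson correlation between each selected band and the SOC content. The function name `mean_abs_correlation` and the toy data are hypothetical.

```python
import numpy as np


def mean_abs_correlation(spectra, soc, band_idx):
    """Mean |Pearson r| between the selected bands and SOC content.

    spectra  : (n_samples, n_bands) array of transformed reflectance
    soc      : (n_samples,) array of SOC content
    band_idx : indices of the bands kept by a selection algorithm
    Assumes no selected band is constant across samples.
    """
    x = np.asarray(spectra, dtype=float)[:, band_idx]
    y = np.asarray(soc, dtype=float)
    xc = x - x.mean(axis=0)  # center each band
    yc = y - y.mean()        # center the target
    r = (xc.T @ yc) / (np.linalg.norm(xc, axis=0) * np.linalg.norm(yc))
    return float(np.mean(np.abs(r)))


# Toy data: three samples, three bands (not values from the study).
toy_spectra = np.array([[1.0, 3.0, 0.0],
                        [2.0, 2.0, 1.0],
                        [3.0, 1.0, 0.0]])
toy_soc = np.array([1.0, 2.0, 3.0])
score = mean_abs_correlation(toy_spectra, toy_soc, [0, 1])
```

Under this reading, taking the absolute value before averaging keeps strongly negative and strongly positive bands from cancelling each other out.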
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Han, Y.; Wang, B.; Yang, J.; Yin, F.; He, L. Research on Hyperspectral Inversion of Soil Organic Carbon in Agricultural Fields of the Southern Shaanxi Mountain Area. Remote Sens. 2025, 17, 600. https://doi.org/10.3390/rs17040600
