Technical Note

Estimation of Soil Organic Carbon Density on the Qinghai–Tibet Plateau Using a Machine Learning Model Driven by Multisource Remote Sensing

1
College of Resources and Environment, Shandong Agricultural University, Taian 271018, China
2
Chongqing Jinfo Mountain Karst Ecosystem National Observation and Research Station, School of Geographical Sciences, Southwest University, Chongqing 400715, China
3
Key Laboratory of Agricultural Remote Sensing, Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing 100081, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(16), 3006; https://doi.org/10.3390/rs16163006
Submission received: 20 June 2024 / Revised: 31 July 2024 / Accepted: 14 August 2024 / Published: 16 August 2024

Abstract

Soil organic carbon (SOC) plays a vital role in the global carbon cycle and soil quality assessment. The Qinghai–Tibet Plateau is one of the largest plateaus in the world, and SOC density and its spatial distribution in this region are highly sensitive to climate change and human intervention. Given the insufficient understanding of the spatial distribution of SOC density on the Qinghai–Tibet Plateau, this study used machine learning (ML) algorithms to estimate topsoil SOC density and its distribution pattern in the region. We first collected multisource data, including optical remote sensing data, synthetic aperture radar (SAR) data, and other environmental variables covering socioeconomic factors, topographic factors, climate factors, and soil properties. We then used three ML algorithms, namely random forest (RF), extreme gradient boosting (XGBoost), and light gradient boosting machine (LightGBM), to estimate topsoil SOC density and its spatial distribution, and to investigate the driving factors. The results are as follows: (1) The average SOC density is 5.30 kg/m2. (2) Among the three ML algorithms used, LightGBM showed the highest validation accuracy (R2 = 0.7537, RMSE = 2.4928 kgC/m2, MAE = 1.7195). (3) The normalized difference vegetation index (NDVI), valley depth (VD), and temperature are crucial in predicting the spatial distribution of topsoil SOC density. Feature importance analyses conducted using the three ML models all ranked these factors among the top three, with contribution rates of 14.08%, 12.29%, and 14.06%; 17.32%, 20.73%, and 24.62%; and 16.72%, 11.96%, and 20.03%. (4) Spatially, the southeastern part of the Qinghai–Tibet Plateau has the highest topsoil SOC density, with recorded values ranging from 8.41 kg/m2 to 13.2 kg/m2, while the northwestern part has the lowest density, with recorded values ranging from 0.85 kg/m2 to 2.88 kg/m2. 
Different land cover types showed varying SOC density values, with forests and grasslands having higher SOC densities compared to urban and bare land areas. The findings of this study provide a scientific basis for future soil resource management and improved carbon sequestration accounting in the Qinghai–Tibet Plateau.

1. Introduction

Soil organic carbon (SOC) is the largest terrestrial carbon sink on Earth; as such, it plays a critical role in ecosystem processes and in climate change [1,2,3]. The global SOC reserves are between 1395 and 2200 Pg (1 Pg = 10^15 g) [4,5]. Even small fluctuations in soil carbon pools can significantly affect global levels of greenhouse gases, with consequent impacts on climate change [6]. SOC also impacts the quality, function, and health of soil, with implications for plant health and global food security [7]. Physical properties of soil such as hydrostructural stability, specific bulk volume, and water retention can all affect SOC [8]. In addition, the content and stability of SOC may impact soil fertility, plant growth, soil structure, erosion control, and vegetation productivity [9]. Topsoil SOC is also important for SOC storage. In China, for example, topsoil SOC storage accounts for 40% of total SOC storage [10]. Topsoil (0–20 cm) SOC plays a pivotal role in soil fertility, productivity, and water retention, and forms an integral part of the atmospheric–terrestrial carbon exchange cycle mediated via photosynthesis [11]. In the context of global change, topsoil SOC is directly affected by changes in temperature and precipitation, and also by human interventions.
In recent years, with the rapid development of optical, microwave, and hyperspectral remote sensing technologies, multisource remote sensing data have provided rich data support for SOC modeling. In particular, hyperspectral remote sensing can provide direct spectral information for SOC estimation in small-scale bare soil areas [12,13]. This method achieves accurate evaluation of SOC density by directly analyzing the spectral characteristics of soil. In a wider range of applications, visible, near-infrared, and short-wave infrared (VNIR-SWIR, 400–2500 nm) remote sensing has become an important means of estimating key indicators in soil monitoring [14]. Hyperspectral sensors carried on small unmanned aerial vehicles have also proven effective in quickly and accurately obtaining high-resolution SOC spatial information [15]. However, despite the enormous potential of synthetic aperture radar (SAR) data, such as those provided by Sentinel-1, in mapping soil characteristics [16], their application in predicting SOC density of large-scale ecosystems on the Qinghai–Tibet Plateau is still limited. SAR data are distinctive in their ability to penetrate cloud cover and vegetation cover, providing ground reflectance information, which may bring new opportunities for predicting the spatial distribution of soil characteristics, especially in complex terrain or high-humidity areas that are difficult to observe with traditional optical remote sensing [17,18,19]. In addition, the application of multisource remote sensing data combined with ML models in SOC simulation has become increasingly widespread. For example, some studies have used Landsat Thematic Mapper (TM) images combined with boosted regression tree (BRT) models to predict the organic carbon content of topsoil in an alpine environment [20]. 
There are also studies that use Sentinel-2 data combined with ML algorithms to map the distribution of organic carbon in farmland soil [21]. In estimation models, ML models typically have higher accuracy than linear regression models [14,22]. Common ML models include random forest (RF), support vector machine (SVM), and extreme gradient boost (XGBoost) [23]. Compared with traditional gradient boosting machine (GBM) algorithms, LightGBM exhibits more efficient performance and higher accuracy, and has been widely applied in various fields [24,25,26]. However, there is relatively little research using LightGBM to estimate the SOC of alpine ecosystems. In addition to remote sensing data, environmental covariates such as terrain, climate factors, and soil characteristics are also widely used in SOC mapping at the national and regional levels [27,28]. These covariates provide additional information that helps to correct and refine the SOC estimation results of remote sensing data. For example, terrain features can reveal the impact of hydrological processes and erosion on SOC distribution [29], while climate parameters affect the speed and efficiency of biogeochemical cycles [30]. There is a good correlation between soil characteristics, such as soil bulk density, organic matter content, and total phosphorus content, and environmental factors [31]. A study of soil landscape quantitative models shows a significant correlation between soil properties and terrain factors, which further proves the necessity of considering soil characteristics in SOC prediction models [32]. Therefore, integrating these covariates in SOC prediction models is crucial as they can help us better understand the spatial variability and dynamic change patterns of SOC and improve the accuracy of SOC spatial prediction [33]. 
Although multisource remote sensing technology and ML models have made progress in SOC prediction, the following urgent issues still exist: (1) Although SAR data have unique advantages in penetrating clouds and vegetation cover, how to effectively utilize SAR data to predict SOC density in the large-scale ecosystems of the Qinghai–Tibet Plateau remains a challenge. (2) A further challenge lies in how to effectively integrate multisource remote sensing data (such as optical and SAR data) to improve the accuracy of SOC prediction, especially in complex terrain conditions. (3) It is yet to be determined which environmental covariates can most improve the accuracy of SOC prediction models and the ensemble method of these covariates is yet to be optimized. (4) Although LightGBM performs well in many applications, further research is needed to select the most suitable ML algorithm for specific SOC prediction needs. (5) How to use remote sensing and ML technology to capture the spatiotemporal variation patterns of SOC, especially in the context of climate change, requires further attention.
The Qinghai–Tibet Plateau is a region sensitive to global change, and its SOC storage plays a crucial role in the carbon cycling process of the ecosystem [34]. The region is highly sensitive to climate change, and global warming has led to the continuous expansion of the permafrost thawing area, exacerbating the instability of soil organic carbon in the permafrost wetlands [35]. Therefore, the estimation of SOC on the Qinghai–Tibet Plateau has attracted widespread attention from Chinese scholars [36,37,38]. Their research indicates that the SOC storage in the grasslands of the Qinghai–Tibet Plateau can reach 33.52 Pg C, accounting for 2.5% of the global soil carbon pool [39]. With continuous global warming and human intervention, in-depth research on SOC density and its change patterns on the Qinghai–Tibet Plateau will provide a scientific basis for improving carbon sequestration and ecosystem protection in high-altitude ecosystems. To compensate for the shortcomings of a single data source and to conduct effective data collection and analysis under various weather conditions, this study combined SAR and optical remote sensing data, as well as multiple environmental variables, and used three ML models, namely RF, XGBoost, and LightGBM, to predict the SOC density of the Qinghai–Tibet Plateau. Of these models, LightGBM has rarely been used in previous studies to predict topsoil organic carbon density in the Qinghai–Tibet Plateau region. The purposes of this study are as follows: (1) to calculate the SOC density level for 2015; (2) to explore the application of SAR polarization data in SOC density estimation; and (3) to identify the main driving factors affecting the spatial variation of SOC density on the Qinghai–Tibet Plateau. These efforts will provide a new scientific basis for future soil resource management and carbon sink accounting on the Qinghai–Tibet Plateau, and provide substantive guidance for the protection and management of alpine ecosystems.

2. Materials and Methods

2.1. Study Area

The Qinghai–Tibet Plateau is located in southwestern China, with geographical coordinates of 26°00′~39°47′N and 73°19′~104°47′E (Figure 1). It covers a total area of around 2.5 million km2, of which approximately 1.5 million km2 is covered by grassland. Average elevation is more than 4000 m. The terrain is complex and diverse, and includes high mountains, plateaus, valleys, basins, and other landforms. The region is characterized by both a cold arid climate and a plateau monsoon climate, with large variations in precipitation and temperature through the year. With respect to vegetation, the Qinghai–Tibet Plateau region is dominated by alpine grasslands, including alpine meadows, alpine steppes, and alpine desert. The region is also characterized by rich soil types, including alpine meadow soil, grass felt soil, and mountain forest soil. Due to the combined influence of climate change and human activities, 75% of the Qinghai–Tibet Plateau is now classified as an ecologically fragile region, with moderate-to-severe degradation [40].

2.2. Data Resources

The satellite data used in this study were sourced from 2015, primarily due to the availability of comprehensive livestock data for that year, which is a critical factor in the inversion of SOC density on the Qinghai–Tibet Plateau. The 2015 data were chosen for their high quality and consistency, the temporal stability of key variables, and to align with the ground data collection period (2012–2016), ensuring a direct and temporally consistent comparison between the satellite data and the ground observations.
Detailed information about the data used in this study can be found in Table 1.

2.2.1. Soil Sampling Data

The topsoil (0–20 cm) SOC density data used in this study area were obtained from the Soil Series of China published in 2020. Soil samples were taken from a total of 325 sites in the Qinghai–Tibet Plateau region (Figure 1). Samples were mainly taken between 2012 and 2016. SOC density (kg C/m2) was calculated by combining the bulk density (kg/m3) and percentage of coarse fractions (%) [41], as in Equation (1):
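$$\mathrm{SOCD} = \frac{\mathrm{SOC} \times \mathrm{BD} \times \mathrm{SD}}{100} \times \left(1 - \frac{\mathrm{CF}}{100}\right) \quad (1)$$
where SOC, BD, SD, and CF represent soil organic carbon content (g/kg), bulk density, soil depth (cm), and coarse fractions (%), respectively, in the specific soil horizon.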
The SOC density values of the 0–20 cm topsoil used in this study were obtained using the weighted average method.
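The per-horizon calculation in Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (the study was carried out in R). The function name and the example values are hypothetical, and bulk density is assumed to be in g/cm3, the unit for which the division by 100 yields kg C/m2. The aggregation of horizons into the 0–20 cm layer is not shown.

```python
def soc_density(soc_g_per_kg, bd_g_per_cm3, depth_cm, coarse_pct):
    """SOC density (kg C/m^2) of one soil horizon, following Equation (1)."""
    if not 0.0 <= coarse_pct <= 100.0:
        raise ValueError("coarse fraction must be a percentage in [0, 100]")
    # SOC [g/kg] * BD [g/cm^3] * SD [cm] / 100 gives kg C/m^2 (assumed units).
    fine_earth_carbon = soc_g_per_kg * bd_g_per_cm3 * depth_cm / 100.0
    # Discount the share of the horizon occupied by coarse fragments.
    return fine_earth_carbon * (1.0 - coarse_pct / 100.0)


# Hypothetical horizon, not taken from the study's 325 sampling sites.
example = soc_density(soc_g_per_kg=20.0, bd_g_per_cm3=1.2, depth_cm=20.0, coarse_pct=10.0)
print(f"{example:.2f} kg C/m^2")
```

For the hypothetical horizon above, the function returns 4.32 kg C/m2: 20 × 1.2 × 20 / 100 = 4.8, reduced by 10% for coarse fragments.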

2.2.2. Remote Sensing Data

Sentinel-1A VV and VH polarization data, as well as NDVI data, were obtained from the Google Earth Engine (GEE) platform (https://earthengine.google.com/) accessed on 2015. The NDVI data were obtained from the MODIS satellite through the GEE platform’s code. The coordinate system was WGS_1984 and the spatial resolution was 500 m. For NDVI data, we used annual averages to reduce the impact of seasonal fluctuations and obtain stable vegetation activity indicators. To ensure the quality of the data, we implemented cloud masking processing to remove pixels obscured by clouds. The SAR data processing of Sentinel-1A involves radiometric correction and geometric correction to eliminate terrain effects and ensure data consistency. In the preprocessing stage of the SAR data, we also performed terrain correction to compensate for changes in radar echo intensity caused by terrain undulations. We used annual averages of SAR data.
The climatic data for the study area included values for monthly precipitation, monthly average temperature, and annual land surface temperature (LST), with the spatial resolution of precipitation and temperature being 500 m. These data were obtained from the WorldClim database (http://www.worldclim.org/) accessed on 2015. Annual LST data were downloaded from the GEE platform (https://earthengine.google.com/) accessed on 2015. These LST data were obtained from the MODIS satellite through the GEE platform’s code, again with spatial resolution of 500 m.

2.2.3. Other Data

The digital elevation model (DEM) data for the terrain elements of the study area were obtained from ASTER GDEM (http://www.gscloud.cn/) accessed on 2015. These had a spatial resolution of 500 m. Terrain-derived indexes were calculated based on DEM; these included elevation, slope, aspect, VD, a terrain ruggedness index (TRI), and a topographic wetness index (TWI). These parameters provided a comprehensive reflection of topographic relief, water distribution, and vegetation growth, thereby enhancing any explanation for the spatial distribution of topsoil SOC content on the Qinghai–Tibet Plateau.
Soil property data included soil texture and soil moisture. Considering the important role of clay particles in soil moisture retention and organic carbon adsorption under the unique environmental conditions of the Qinghai–Tibet Plateau, clay particle content information was selected to represent soil texture data. The clay particle information was obtained from the Soil Sub-Center, National Earth System Science Data Center, and National Science and Technology Infrastructure of China (http://soil.geodata.cn) accessed on 2015 [42,43], with a resolution of 90 m, which was resampled to 500 m to match the resolution of other datasets. Soil moisture data were obtained from NASA’s Soil Moisture Active Passive (SMAP) satellite project which offers global observation data of soil moisture at a resolution of 500 m.
Conventional research often uses socioeconomic indicators such as GDP and population density to analyze the impact of human activities on SOC dynamics. However, given the uniqueness of the Qinghai–Tibet Plateau, we turned to a more targeted indicator: the number of livestock, especially large livestock such as cows, buffalo, goats, sheep, and horses, which play a central role in the region’s livestock industry and significantly affect soil carbon cycling and properties. We obtained the distribution data of these livestock in 2015 from the Gridded Livestock of the World (GLW v4) database, using the high-resolution data from the dasymetric method (DA), and resampled them to 500 m to match the resolution used in the study.
Data on land cover types were obtained from the Big Earth Data Platform for Three Poles (https://poles.tpdc.ac.cn/) accessed on 2015 [44]. The initial resolution of the dataset was 300 m. Subsequently, the data were resampled to a coarser resolution of 500 m to meet our specific analytical requirements.

2.3. Calculation of Importance of Driving Factors

In this study, we used the R programming language and compared the importance of topsoil SOC feature variables using RF, XGBoost, and LightGBM models. The calculation methods for feature importance were as follows:
RF: Uses two indicators, “mean decrease in accuracy” (MDA) and “mean decrease in Gini” (MDG), to evaluate feature importance. MDA measures the decrease in model accuracy when the values of a feature are randomly permuted, while MDG measures the total reduction in node impurity contributed by splits on that feature.
XGBoost: Evaluates feature importance through three metrics: “gain”, “cover”, and “frequency”. Gain measures the contribution of a feature to improving model performance, cover measures the relative number of observations affected by splits on the feature, and frequency is how often the feature is used for splitting.
LightGBM: Similar to XGBoost, LightGBM also uses three importance metrics: “gain”, “split”, and “cover”. Gain represents the contribution of a feature to model improvement, split is the frequency with which the feature is used for splitting in the decision trees, and cover represents the number of samples covered by splits on the feature.
In the analysis process, we calculated the above indicators for all features in each model and compared the importance of SOC features in the different models to reveal their relative contributions to the prediction.
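The permutation idea behind MDA, and the conversion of raw importance scores into percentage contribution rates, can be illustrated with a small pure-Python sketch. It is a generic, model-agnostic version and an assumption on our part: the R randomForest implementation computes the measure on out-of-bag samples for each tree, and the text does not state how the contribution rates were normalized. The toy model and data are hypothetical.

```python
import random


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def to_contribution_rates(importances):
    """Scale non-negative importances so that they sum to 100%."""
    positive = [max(v, 0.0) for v in importances]
    total = sum(positive)
    return [100.0 * v / total if total > 0 else 0.0 for v in positive]


# Hypothetical toy model: feature 0 matters most, feature 2 is ignored.
def toy_predict(row):
    return 3.0 * row[0] + 0.5 * row[1]


rng = random.Random(42)
X = [[rng.uniform(0, 1) for _ in range(3)] for _ in range(50)]
y = [toy_predict(row) for row in X]

importances = permutation_importance(toy_predict, X, y)
rates = to_contribution_rates(importances)
print([round(r, 1) for r in rates])
```

Shuffling a feature that the model ignores leaves the error unchanged, so its importance is zero, while features with larger effects produce larger error increases.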

2.4. Topsoil SOC Density Estimation Model

In this research, we predicted SOC density using three ML algorithms: RF, XGBoost, and LightGBM, all implemented in R. For RF, the “randomForest” and “caret” packages were used, with “mtry” optimized via grid search. XGBoost and LightGBM were executed using their respective libraries, with parameters tuned through “caret” functions. A test set was reserved for unbiased performance evaluation post-cross-validation. Hyperparameters were selected within standard ranges: “mtry” for RF, various boosting parameters for XGBoost, and tree complexity parameters for LightGBM. Model performance was evaluated by RMSE, R2, and MAE, with the best model chosen based on these metrics after k-fold cross-validation. Random segmentation in R was used to split the dataset into training and test sets at an 80:20 ratio, ensuring a fair assessment of the models’ ability to generalize to unseen data. The “createDataPartition” function from the “caret” package facilitated this random split, maintaining the integrity of the model evaluation process.
Although we adopted an 8:2 data split, k-fold cross-validation was still used to further ensure the accuracy and stability of the model evaluation. By repeatedly dividing the data into training and validation folds, this method reduces the randomness and variance of the performance estimates. Retaining an independent test set, in turn, prevents the model from overfitting to the cross-validation process, providing an objective performance benchmark and ensuring the reliability of model selection. The combination of these two methods allowed us to better evaluate and select the best model.
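The combination of a held-out test set and k-fold cross-validation on the remaining data can be sketched as follows. This is a minimal pure-Python illustration with hypothetical function names. It uses a plain random shuffle, which is a simplification: caret's createDataPartition samples within groups of the outcome variable rather than shuffling all rows uniformly.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Randomly split row indices 0..n-1 into training and test sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


# 325 sampling sites, 80:20 split, then 5 folds within the training portion.
train_idx, test_idx = train_test_split_indices(325)
splits = list(k_fold_indices(train_idx, k=5))
print(len(train_idx), len(test_idx), [len(v) for _, v in splits])
```

With 325 sites, this leaves 260 sites for cross-validation and 65 for the final test. The test indices are never seen during tuning, which is what makes the final evaluation independent.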

2.5. Model Accuracy Comparison

To assess the overall accuracy and stability of the three models (RF, XGBoost, and LightGBM), we used three key metrics: coefficient of determination (R2); root mean square error (RMSE); and mean absolute error (MAE). The closer the value of R2 is to 1, the greater the proportion of variance in the target variable explained by the model, and the better the model fit. The smaller the values of RMSE and MAE, the smaller the prediction error and the better the model fit. Calculation formulas were as follows:
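$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - M_i\right)^2}$$
$$R^2 = \left(\frac{\sum_{i=1}^{n}\left(M_i - \bar{M}\right)\left(P_i - \bar{P}\right)}{\sqrt{\sum_{i=1}^{n}\left(M_i - \bar{M}\right)^2 \sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2}}\right)^2$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|M_i - P_i\right|$$
where $n$ represents the total number of observations in the dataset, $P_i$ is the actual value for observation $i$, $M_i$ is the predicted value for observation $i$, $\bar{M}$ is the mean of predictions, and $\bar{P}$ is the mean of actual values.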
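The three metrics can be computed directly from paired actual and predicted values, as in the following pure-Python sketch. The function names and the example values are hypothetical. R2 is computed here as the squared Pearson correlation between actual and predicted values, which is our reading of the formula above; this differs in general from the alternative definition 1 − SSE/SST.

```python
import math


def rmse(actual, predicted):
    n = len(actual)
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(actual, predicted)) / n)


def mae(actual, predicted):
    n = len(actual)
    return sum(abs(m - p) for p, m in zip(actual, predicted)) / n


def r_squared(actual, predicted):
    """Squared Pearson correlation between actual and predicted values."""
    n = len(actual)
    mean_p = sum(actual) / n
    mean_m = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for p, m in zip(actual, predicted))
    var_m = sum((m - mean_m) ** 2 for m in predicted)
    var_p = sum((p - mean_p) ** 2 for p in actual)
    return cov ** 2 / (var_m * var_p)


# Hypothetical values, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 2.5, 4.5]
print(rmse(actual, predicted), mae(actual, predicted), r_squared(actual, predicted))
```

In this example every prediction is off by 0.5, so RMSE and MAE are both 0.5.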

2.6. Mapping Spatial Distribution of SOC

In order to identify the key factors affecting SOC density and avoid model overfitting, we used the Boruta feature selection technique. Boruta is an iterative method for evaluating feature importance, which can effectively identify the variables that are crucial for SOC density prediction. In this study, we used Boruta to select the most relevant covariates from the environmental and remote sensing variables (Figure 2; Table 1). In Figure 2, green box plots represent the important features confirmed in the model, yellow represents tentative features that can be selectively added, and red represents features that are not important and should be excluded. The blue shadowMax and shadowMin are shadow features used as noise benchmarks to evaluate the importance of the original features; the black shadowMean represents the average importance of shadow features as a criterion for judgment. The circles in the figure represent outliers, reflecting the abnormal performance of certain features in individual iterations. Overall, Figure 2 not only helps identify the truly important features in the model, but also provides a reference benchmark for evaluating the importance of the original features through shadow features. Meanwhile, we experimented with various cross-validation fold settings with the aim of finding the optimal configuration that would ensure both model stability and high prediction accuracy (Table 2). After determining the optimal feature combination and number of folds, we tuned and trained the selected ML model, optimizing its hyperparameters to achieve the best performance.
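The shadow-feature comparison at the core of Boruta can be illustrated with a deliberately simplified sketch. It is not the Boruta algorithm itself: Boruta uses random forest importance scores and a statistical test over iterations, whereas this example substitutes absolute Pearson correlation as a stand-in importance measure and simply counts how often each real feature beats the best shadow feature. All names and data are hypothetical.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def shadow_hits(columns, target, n_iter=50, seed=0):
    """Count iterations in which each feature beats the best shuffled copy."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_iter):
        shadow_scores = []
        for values in columns.values():
            shadow = values[:]
            rng.shuffle(shadow)  # a shadow feature keeps the values, loses the signal
            shadow_scores.append(abs_corr(shadow, target))
        threshold = max(shadow_scores)  # analogous to shadowMax
        for name, values in columns.items():
            if abs_corr(values, target) > threshold:
                hits[name] += 1
    return hits


# Hypothetical data: one informative covariate and one pure-noise covariate.
rng = random.Random(1)
signal = [rng.gauss(0, 1) for _ in range(200)]
noise = [rng.gauss(0, 1) for _ in range(200)]
target = [2.0 * s + rng.gauss(0, 0.5) for s in signal]

hits = shadow_hits({"signal": signal, "noise": noise}, target, n_iter=50)
print(hits)
```

A feature that beats the best shadow in nearly every iteration corresponds to a confirmed (green) feature, while one that rarely does corresponds to a rejected (red) feature.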
Finally, we applied this optimized model to the entire study area to produce a spatial distribution map of SOC density. This map illustrates the continuous variation in SOC density, providing intuitive and detailed support for understanding the spatial patterns of soil carbon storage and presenting complex information in a form that is easier to interpret and apply.

3. Results

3.1. Statistical Analysis of Topsoil SOC Density

Based on field measurements at 325 soil sampling points on the Qinghai–Tibet Plateau, we calculated the average, skewness, kurtosis, coefficient of variation, and standard deviation of SOC density to gain a deeper understanding of the spatial distribution characteristics and variability of SOC (Table 3). The average SOC density was 5.30 kgC/m2. Values for kurtosis and skewness provide insights into the distribution pattern of SOC. The skewness of SOC density was 1.25, indicating a right-skewed distribution in which small values are more common than large ones, although the asymmetry is not severe. The coefficient of variation (CV) was 70.59%, and the standard deviation (SD) was 3.74 kg/m2, i.e., more than half the average value of the SOC density. These results indicate a moderate spatial variation in SOC, possibly as a result of the extensive study area and the large range in altitude of the soil sampling sites.
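The descriptive statistics reported above can be reproduced for any list of SOC density values with a short sketch such as the following. The sample values are hypothetical, and the moment-based definitions of skewness and excess kurtosis are an assumption, since the text does not state which estimators were used. The last lines recompute the CV from the rounded mean and SD quoted in the text.

```python
import statistics


def describe(values):
    """Mean, sample SD, CV (%), and moment-based skewness and excess kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "sd": sd,
        "cv_pct": 100.0 * sd / mean,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical SOC density values (kg C/m^2), not the study's measurements.
sample = [1.2, 2.5, 3.1, 4.8, 5.0, 6.3, 9.7, 14.1]
stats = describe(sample)
print({k: round(v, 2) for k, v in stats.items()})

# CV recomputed from the rounded values quoted in the text (SD 3.74, mean 5.30).
reported_cv = 100.0 * 3.74 / 5.30
print(round(reported_cv, 2))
```

The recomputed CV is about 70.57%, which agrees with the reported 70.59% to within the rounding of the mean and SD.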

3.2. Comparison of Accuracy of Different Models in Estimating Topsoil SOC Density

Comparative analysis of the RF, XGBoost, and LightGBM models shows that LightGBM is the best model for predicting SOC density. To achieve optimal performance, LightGBM was fine-tuned through a rigorous hyperparameter tuning process, and 5-fold cross-validation was selected. The best-performing LightGBM model used the following hyperparameters: learning rate of 0.1, maximum tree depth of 6, minimum child weight of 1, subsampling rate of 0.8, etc. The tuning process involved multiple model runs to ensure the robustness of the final model configuration. As shown in Table 4, LightGBM had the lowest prediction error, with an RMSE of 2.5914 kgC/m2, an MAE of 1.7195, and the highest R2 value of 0.7537, outperforming RF and XGBoost. This finding is consistent with earlier research [45,46,47] and underlines the advantages of LightGBM in managing large-scale datasets and capturing complex nonlinear relationships with high computational efficiency [48,49,50].
Although RF has a confirmed level of proficiency in predicting SOC levels due to its robust nature and reduced risk of overfitting [39,51,52,53,54], LightGBM provided more benefits than RF in this study by optimizing feature selection and utilizing a broad dataset. The similarity of the mean, median, standard deviation, and coefficient of variation predicted by RF, XGBoost, and LightGBM (Table 5) highlights the consistency of the models, but the overall advantage of LightGBM underlines its effectiveness and reliability in SOC density prediction.
The observed differences in RMSE, MAE, and R2 values between the models can be attributed to their different architectures and learning mechanisms. The ability of each model to handle noise, overfitting, and feature selection varies, resulting in differences in prediction accuracy [55,56]. The variability of feature importance also contributes to the performance of models, as certain predictive factors may be more prominent in one model than in another, thereby affecting the model’s generalization ability from training data to unseen data [57,58].

4. Discussion

4.1. Spatial Distribution Analysis of Topsoil SOC Density

We used the best-performing LightGBM to predict topsoil SOC density across the Qinghai–Tibet Plateau. We identified significant regional differences in the spatial distribution of topsoil SOC density on the Qinghai–Tibet Plateau (Figure 3a). The western and northern areas of the plateau yielded lower SOC density values. These lower values may be explained with reference to climatic conditions [59] and vegetation cover. Because these areas are mainly covered by alpine steppe and desert, they typically experience extreme climatic conditions, including low temperatures and scarce rainfall, which restrict the accumulation and preservation of organic carbon [60].
In contrast, a higher SOC density was recorded for the eastern part of the Qinghai–Tibet Plateau, in line with the findings of a previous study [61]. This is an area of alpine meadows, swamp meadows, and shrublands. The eastern region is also the source of several major rivers, such as the Yangtze, Yellow, Lancang, and Yarlung Tsangpo. The eastern region of the Qinghai–Tibet Plateau is known for its favorable natural conditions, which collectively promote the accumulation of SOC. Firstly, the region has good vegetation coverage, mainly due to its lower altitude and mild climate conditions. Because the vertical zonation of the climate is indistinct, the vegetation species in this area are abundant, widely distributed, and have high coverage, providing a stable carbon sink for the ecosystem [62]. The climate of this region is typical of a high-altitude plateau, with long, dry, cold winters and cool, pleasant summers. Rainfall and heat occur in the same season, the frost-free period is short, and the sunshine is strong, creating favorable conditions for plant growth. The average annual precipitation is approximately 412 mm, mainly concentrated from June to September. This concentrated precipitation pattern is beneficial for plant growth and organic matter accumulation [63]. In terms of soil types, the eastern region of the Qinghai–Tibet Plateau contains soils of the leached soil and primary cultivated soil orders, with leached soil mainly distributed in the southeast and eastern regions [64]. These soil types not only have high fertility but also suitable physical properties, which contribute to the accumulation and preservation of organic matter. Hydrological conditions are also an important factor affecting SOC accumulation. The precipitation in this region is concentrated in summer, providing sufficient water for vegetation and promoting its growth and the accumulation of organic matter [63]. 
In addition, the region has abundant groundwater resources, which help maintain soil moisture and further promote plant growth [65]. Good land use and management practices have also contributed to the accumulation of SOC, especially in the river source regions, where research has found that SOC density can reach up to 13.2 kg/m2. These comprehensive factors work together to make the eastern region of the Qinghai–Tibet Plateau an important carbon storage area. High levels of topsoil SOC were also recorded in lower-altitude areas of the eastern part of the Qinghai–Tibet Plateau, including areas of wetland, forest, and grassland, again in line with the findings of previous studies [66]. These results may be related to precipitation levels and vegetation types, with moist climatic conditions and abundant vegetation cover contributing to the accumulation of organic carbon.
In the present study, we also found that the spatial distribution of SOC on the Qinghai–Tibet Plateau is influenced by land cover types, as shown in Figure 3b. Levels of topsoil SOC in forests, grasslands, and wetlands were significantly higher than in arable land, urban land, glaciers, and bare land. The highest SOC density (13.2 kg/m2) was recorded at a forest site. However, very low SOC density values were recorded in the northern area of the plateau, consistent with the findings of previous research [59], and mainly due to the bare land cover type in this area. The authors of [67] previously found that land cover types and land use management practices are the dominant factors influencing SOC accumulation at low altitudes. Further analysis shows significant negative correlations between land use and cover change (LUCC) and both NDVI and precipitation, with correlation coefficients of −0.45 and −0.35, respectively (Figure 3c). These findings emphasize the importance of NDVI and precipitation as the main driving factors of LUCC. Meanwhile, the correlations of the other variables were weak or not significant. To verify these correlations, we conducted a comprehensive analysis using SOC density data and LUCC data from the 325 sampling points (Figure 3d). The results are consistent with our predictions: the average SOC density of forest and grassland is highest, both exceeding 7 kg/m2. In contrast, the SOC density of glacial areas and urban land is extremely low, close to 0 kg/m2. In addition, feature importance analysis using the three ML models further confirmed the impact of NDVI on SOC density distribution. In all models, the feature importance of NDVI ranks among the top three (Figure 4), indicating that the strong correlation between SOC density distribution and LUCC distribution can be largely attributed to the influence of NDVI.

4.2. Variable Importance Analysis

Figure 4 shows the relative importance of the feature variables that explain the spatial variation of SOC density on the Qinghai–Tibet Plateau for the three models. The model results indicate that climate and terrain factors have significant statistical importance in the distribution of SOC density. The distribution of SOC density is significantly correlated with climate factors such as temperature, precipitation, and LST. In the three ML models, the combined importance contribution rates of these climate variables are 31.4%, 36%, and 30.7%, respectively. Temperature and precipitation are directly related to vegetation growth and the decomposition of soil organic matter, while LST, as a reflection of soil thermal characteristics, regulates the stability of SOC levels and their decomposition processes. Global warming, especially through the thawing of permafrost, is associated with a decrease in SOC content [68].
Terrain factors, including VD, elevation, TWI, TRI, and slope, also show considerable relative importance in the models [69]. The combined importance contribution rates of these terrain variables in the three ML models are 33.7%, 32.6%, and 29.2%, respectively. Terrain factors are significantly correlated with the distribution and accumulation of topsoil SOC [70], as they affect the distribution of soil moisture [71,72], the activity of soil microorganisms, and the rate of organic matter decomposition.
Although NDVI, VD, and temperature are ranked differently as feature variables in the three ML models, they are among the top three in all of them, with contribution rates of 14.08%, 12.29%, and 14.06%; 17.32%, 20.73%, and 24.62%; and 16.72%, 11.96%, and 20.03%. These results indicate that vegetation cover, terrain factors, and temperature have significant statistical importance for the target variable. NDVI reflects vegetation growth status, while VD and temperature influence the decomposition and accumulation of organic matter in soil by altering moisture and thermal conditions. Previous studies on the Qinghai–Tibet Plateau [36] have found that NDVI has significant statistical importance for SOC content at different spatial resolution scales. Another study [73] showed that SOC content is mainly related to soil moisture content and vegetation communities in the plateau, while the statistical importance of active layer thickness, vegetation coverage, and terrain factors is relatively low. The differences in these conclusions may be due to the selection of study areas and the varying soil depths investigated; this study mainly focused on the SOC density of 0–20 cm topsoil, while previous studies have considered the 0–150 cm layer.
Soil moisture also shows some statistical importance for SOC density levels, but its contribution is smaller than that of the above factors (6.11%, 4.46%, and 9.86%). By contrast, previous studies [74,75] have shown that SOC in the Qinghai–Tibet Plateau is mainly influenced by soil moisture. The differences in these results may be due to the selection of study areas and the varying soil depths investigated. In the previously cited research, only a small portion of the region was considered, to a soil depth of one meter.
The statistical importance of human activities on SOC density cannot be ignored (4.53%, 4.22%, and 6.69%). The ecosystem of the Qinghai–Tibet Plateau is extremely fragile [76], and human activities, such as overgrazing and large-scale cultivation, may lead to vegetation degradation and water loss, thereby affecting soil carbon sequestration and ecosystem functions [77,78,79].
SAR technology can provide important information about soil properties in certain situations, such as surface moisture or soil moisture [80,81], but its statistical importance was not significant in this study (10.2%, 5.39%, 6.81%). This is mainly because SAR data are primarily used to measure other soil properties and have certain limitations when dealing with complex terrain and vegetation cover [82]. Therefore, further development and improvement of SAR technology is needed to enhance its effectiveness in SOC estimation.
Overall, all three ML models demonstrated the statistical importance of terrain and climate factors in SOC density distribution, with NDVI, VD, and temperature being the most important individual features.
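The contribution rates discussed in this section can be read as raw importance scores rescaled to percentages and then summed by variable category. The following is a minimal sketch of that bookkeeping, assuming each model exposes one non-negative importance score per variable. The function names and the numeric scores are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def contribution_rates(raw_importance):
    """Rescale raw importance scores so that they sum to 100 (percent)."""
    total = sum(raw_importance.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")
    return {name: 100.0 * score / total for name, score in raw_importance.items()}


def rates_by_category(rates, category_of):
    """Sum per-variable contribution rates within each variable category."""
    grouped = defaultdict(float)
    for name, rate in rates.items():
        grouped[category_of[name]] += rate
    return dict(grouped)


def top_features(rates, n=3):
    """Return the n variable names with the largest contribution rates."""
    return sorted(rates, key=rates.get, reverse=True)[:n]


# Hypothetical raw scores for illustration only; NOT values from the study.
raw = {"NDVI": 3.0, "VD": 2.0, "Temperature": 4.0, "Slope": 1.0}
categories = {
    "NDVI": "vegetation",
    "VD": "terrain",
    "Temperature": "climate",
    "Slope": "terrain",
}

rates = contribution_rates(raw)
print(top_features(rates))
print(rates_by_category(rates, categories))
```

Because the rates are normalized within each model, they are comparable as shares within a model, but a given percentage does not necessarily mean the same thing across RF, XGBoost, and LightGBM, whose raw importance measures may be defined differently.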

4.3. Comparison with Other Soil Products

In this study, our LightGBM-based map, produced at a resolution of 500 m, was compared with the established global digital soil mapping (DSM) products SoilGrids1km and SoilGrids250m [83] (https://www.soilgrids.org/, accessed in 2016). Unlike the validation strategy detailed in Table 4, which uses a specified validation subset, this comparison used the entire soil sampling dataset, providing an overall perspective of model performance.
Validation against the full soil sampling dataset showed that both SoilGrids250m and SoilGrids1km tended to underestimate SOC density. The validation results of our study (Figure 5a) show strong performance, with an R2 value of 0.81 and an RMSE of 0.01 kgC/m2, markedly better than SoilGrids250m (Figure 5b, R2 = 0.21, RMSE = 0.83 kgC/m2) and SoilGrids1km (Figure 5c, R2 = 0.42, RMSE = 1.24 kgC/m2).
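For readers who want to reproduce this kind of comparison, the sketch below computes R2, RMSE, and MAE from paired observed and predicted SOC density values. It assumes R2 is the coefficient of determination (one minus the ratio of residual to total sum of squares); the text does not state which definition was used, and the function name and sample values are hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, RMSE, and MAE for paired observed/predicted values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("observed values are constant; R2 is undefined")
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


# Hypothetical SOC density values in kg/m2, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(regression_metrics(observed, predicted))
```

Applying the same function to the same set of sampling points for each product keeps the comparison like for like, since all three metrics depend on which points are included.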
It is worth noting that our model’s prediction is highly consistent with the spatial distribution pattern of SOC density observed in SoilGrids1km (Figure 6), and the most obvious divergence from the SoilGrids250m prediction occurs in the northeast of the Qinghai–Tibet Plateau. Although SoilGrids250m and SoilGrids1km are widely used at both global and national scales, their accuracy has been found to be relatively low when applied to the Qinghai–Tibet Plateau, with insufficient expression of spatial heterogeneity.
The superior performance of our model compared to SoilGrids1km and SoilGrids250m can be attributed to several factors. Firstly, our LightGBM 500 m model was calibrated specifically for the Qinghai–Tibet Plateau, taking into account the unique environmental and climatic conditions of the region. This specialization enabled our model to capture complex spatial patterns of SOC density more accurately than global DSM products that lack this regional specificity. Secondly, the selected resolution of our model (500 m) strikes a balance between capturing fine-scale variations and maintaining computational feasibility. This intermediate resolution may help the model's predictions correspond more closely to the observed spatial distribution of SOC density than those of SoilGrids1km and SoilGrids250m. Finally, the training data for our model included soil samples from local sources, carefully selected to represent the unique soil characteristics of the Qinghai–Tibet Plateau. This localized data input helped our model to address the specific challenges posed by the soil heterogeneity in the region.
These findings are consistent with a previous study conducted in the karst region of southwestern China [84], which drew comparable conclusions about the limitations of global DSM products in accurately estimating SOC.

4.4. Uncertainty

Although this study provides robust methods and novel insights, there are several sources of uncertainty worth considering. Firstly, it must be acknowledged that there are inherent sampling errors in on-site measurements. Soil sampling is influenced by natural changes and may not fully represent the true spatial distribution of soil characteristics, especially considering the heterogeneity of the study area.
Secondly, uncertainties related to depth and proximity to bedrock may affect the accuracy of the predictions, especially considering the limited number of samples available for deep soil. The point values measured on the surface may not fully represent deeper characteristics, especially compared to the larger support of 500 m covariate pixels. This difference may introduce errors in predicting deeper soil layers.
Thirdly, although selecting the best-performing model is common practice, it is worth considering whether ensemble modeling or model fusion could improve prediction accuracy. Ensemble methods combine the predictions of multiple models and have been shown to reduce prediction errors and improve the stability of environmental modeling [53]. Therefore, exploring ensemble methods may improve the reliability of our soil property predictions.
Finally, the sample size of 325 points may be insufficient for a region as large and diverse as the Qinghai–Tibet Plateau, highlighting the need for additional sampling in future studies to better capture the spatial variability of soil properties.
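One simple form of the model fusion mentioned above is a weighted average of the per-pixel predictions from several models. The sketch below illustrates that idea only. It is not the procedure used in this study, and the function name, model keys, weights, and values are hypothetical.

```python
def ensemble_mean(predictions_by_model, weights=None):
    """Combine per-sample predictions from several models by weighted averaging."""
    names = list(predictions_by_model)
    lengths = {len(predictions_by_model[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("need at least one model and equal-length predictions")
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weights by default
    total_weight = sum(weights[name] for name in names)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    n_samples = lengths.pop()
    return [
        sum(weights[name] * predictions_by_model[name][i] for name in names) / total_weight
        for i in range(n_samples)
    ]


# Hypothetical SOC density predictions (kg/m2) for two locations.
predictions = {
    "rf": [4.0, 6.0],
    "xgboost": [5.0, 7.0],
    "lightgbm": [6.0, 8.0],
}
print(ensemble_mean(predictions))
print(ensemble_mean(predictions, {"rf": 1.0, "xgboost": 1.0, "lightgbm": 2.0}))
```

Weights could, for example, be chosen from cross-validated accuracy, but they would need to be set without using the data on which the ensemble is later evaluated.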

5. Conclusions

In this study, field sample points and multisource environmental variables (Sentinel-1A SAR data, MODIS NDVI, and climatic, topographic, socioeconomic, and soil property variables) were selected as inputs for a SOC estimation model. RF, XGBoost, and LightGBM machine learning models were used to build SOC density estimation models for the Qinghai–Tibet Plateau. The results showed the following: (1) The mean value of SOC density was 5.30 kg/m2. (2) Among the three ML algorithms used, LightGBM showed the highest validation accuracy (R2 = 0.7537, RMSE = 2.4928 kgC/m2, MAE = 1.7195). (3) NDVI, VD, and temperature are crucial in predicting the spatial distribution of topsoil SOC density. Feature importance analyses conducted using the three ML models all showed these factors to be among the top three in importance, with contribution rates of 14.08%, 12.29%, and 14.06%; 17.32%, 20.73%, and 24.62%; and 16.72%, 11.96%, and 20.03%. (4) SOC density in the southeastern region of the Qinghai–Tibet Plateau ranged from 8.41 kg/m2 to 13.2 kg/m2; for the northwestern region, the range was from 0.85 kg/m2 to 2.88 kg/m2. SOC density also varied between different land cover types, with forest and grassland areas having higher SOC density values than urban and bare land areas. In conclusion, the results of this study provide a scientific basis for future soil resource management and improved carbon sink accounting on the Qinghai–Tibet Plateau, as well as guidance in pursuit of these objectives.

Author Contributions

Conceptualization, W.S.; methodology, W.Z.; software, Q.C.; validation, Q.C. and W.Z.; formal analysis, W.Z.; writing—original draft preparation, Q.C.; writing—review and editing, W.S. and W.Z.; visualization, W.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Natural Science Foundation of China (41501575, 42330707, 42171338) and was supported through open bidding for selecting the best candidates of Fuzhou City (2022JDA07), the Project of Chongqing Science and Technology Bureau (cstc2021jcyj-msxmX0384), and the Sichuan Science and Technology Program (2023NSFSC1916).

Data Availability Statement

The data that support the findings of this study are openly available in the respective repositories and can be accessed using the provided URLs.

Acknowledgments

We acknowledge data support from the Soil Sub-Center, National Earth System Science Data Center, and National Science and Technology Infrastructure of China (http://soil.geodata.cn). We thank the reviewers for their valuable feedback on the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Caddeo, A.; Marras, S.; Sallustio, L.; Spano, D.; Sirca, C. Soil organic carbon in Italian forests and agroecosystems: Estimating current stock and future changes with a spatial modelling approach. Agric. For. Meteorol. 2019, 278, 107654.
  2. Domke, G.M.; Perry, C.H.; Walters, B.F.; Nave, L.E.; Woodall, C.W.; Swanston, C.W. Toward inventory-based estimates of soil organic carbon in forests of the United States. Ecol. Appl. 2017, 27, 1223–1235.
  3. Kumar, A.; Sharma, M. Estimation of soil organic carbon in the forest catchment of two hydroelectric reservoirs in Uttarakhand, India. Hum. Ecol. Risk Assess. Int. J. 2016, 22, 991–1001.
  4. Houghton, R.A. Land-use change and the carbon cycle. Glob. Change Biol. 1995, 1, 275–287.
  5. Post, W.M.; Emanuel, W.R.; Zinke, P.J.; Stangenberger, A.G. Soil carbon pools and world life zones. Nature 1982, 298, 156–159.
  6. Johnston, C.A.; Groffman, P.; Breshears, D.D.; Cardon, Z.G.; Currie, W.; Emanuel, W.; Gaudinski, J.; Jackson, R.B.; Lajtha, K.; et al. Carbon cycling in soil. Front. Ecol. Environ. 2004, 2, 522–528.
  7. Lal, R. Soil health and carbon management. Food Energy Secur. 2016, 5, 212–222.
  8. Boivin, P.; Schäffer, B.; Sturny, W. Quantifying the relationship between soil organic carbon and soil physical properties using shrinkage modelling. Eur. J. Soil Sci. 2009, 60, 265–275.
  9. Zhang, S.; Huang, Y.; Shen, C.; Ye, H.; Du, Y. Spatial prediction of soil organic matter using terrain indices and categorical variables as auxiliary information. Geoderma 2012, 171, 35–43.
  10. Genxing, P. Study on Carbon Reservoir in Soils of China. Sci. Technol. Bull. 1999, 15, 330–332.
  11. Wielopolski, L.; Hendrey, G.; Johnsen, K.H.; Mitra, S.; Prior, S.A.; Rogers, H.H.; Torbert, H.A. Nondestructive system for analyzing carbon in the soil. Soil Sci. Soc. Am. J. 2008, 72, 1269–1277.
  12. Ward, K.J.; Chabrillat, S.; Brell, M.; Castaldi, F.; Spengler, D.; Foerster, S. Mapping Soil Organic Carbon for Airborne and Simulated EnMAP Imagery Using the LUCAS Soil Database and a Local PLSR. Remote Sens. 2020, 12, 3451.
  13. Gholizadeh, A.; Žižala, D.; Saberioon, M.M.; Borůvka, L. Soil organic carbon and texture retrieving and mapping using proximal, airborne and Sentinel-2 spectral imaging. Remote Sens. Environ. 2018, 218, 89–103.
  14. Angelopoulou, T.; Tziolas, N.; Balafoutis, A.; Zalidis, G.; Bochtis, D. Remote Sensing Techniques for Soil Organic Carbon Estimation: A Review. Remote Sens. 2019, 11, 676.
  15. Zhu, Y.; Wang, D.; Zhang, H.; Shi, P. Soil organic carbon content retrieved by UAV-borne high-resolution spectrometer. Trans. Chin. Soc. Agric. Eng. (Trans. CSAE) 2021, 37, 66–72.
  16. Poggio, L.; Gimona, A. Assimilation of optical and radar remote sensing data in 3D mapping of soil properties over large areas. Sci. Total Environ. 2017, 579, 1094–1110.
  17. Bousbih, S.; Zribi, M.; Lili-Chabaane, Z.; Baghdadi, N.; El Hajj, M.; Gao, Q.; Mougenot, B. Potential of Sentinel-1 Radar Data for the Assessment of Soil and Cereal Cover Parameters. Sensors 2017, 17, 2617.
  18. Mengen, D.; Montzka, C.; Jagdhuber, T.; Fluhrer, A.; Brogi, C.; Baum, S.; Schüttemeyer, D.; Bayat, B.; Bogena, H.; Coccia, A.; et al. The SARSense Campaign: Air- and Space-Borne C- and L-Band SAR for the Analysis of Soil and Plant Parameters in Agriculture. Remote Sens. 2021, 13, 825.
  19. Gorrab, A.; Zribi, M.; Baghdadi, N.; Mougenot, B.; Chabaane, Z.L. Potential of X-Band TerraSAR-X and COSMO-SkyMed SAR Data for the Assessment of Physical Soil Parameters. Remote Sens. 2015, 7, 747–766.
  20. Yang, R.; Rossiter, D.G.; Liu, F.; Lu, Y.; Yang, F.; Yang, F.; Zhao, Y.; Li, D.; Zhang, G. Predictive Mapping of Topsoil Organic Carbon in an Alpine Environment Aided by Landsat TM. PLoS ONE 2015, 10, e0139042.
  21. Nie, X.; Chen, H.; Niu, Z.; Zhang, L.; Liu, W.; Xing, S.; Fan, X.; Li, J. Digital SOC Mapping in Croplands Using Agricultural Activity Factors Derived from Time-Series Data in Western Fujian. Geo-Inf. Sci. 2022, 24, 1835–1852.
  22. Tziachris, P.; Aschonitis, V.; Chatzistathis, T.; Papadopoulou, M. Assessment of spatial hybrid methods for predicting soil organic matter using DEM derivatives and soil parameters. Catena 2019, 174, 206–216.
  23. Ho, W.K.O.; Tang, B.-S.; Wong, S.W. Predicting property prices with machine learning algorithms. J. Prop. Res. 2021, 38, 48–70.
  24. Zhang, J.; Mucs, D.; Norinder, U.; Svensson, F. LightGBM: An effective and scalable algorithm for prediction of chemical toxicity–application to the Tox21 and mutagenicity data sets. J. Chem. Inf. Model. 2019, 59, 4150–4158.
  25. Sun, X.; Liu, M.; Sima, Z. A novel cryptocurrency price trend forecasting model based on LightGBM. Financ. Res. Lett. 2020, 32, 101084.
  26. Rufo, D.D.; Debelee, T.G.; Ibenthal, A.; Negera, W.G. Diagnosis of diabetes mellitus using gradient boosting machine (LightGBM). Diagnostics 2021, 11, 1714.
  27. Wang, L.; Zeng, H.; Zhang, Y.J.; Zhao, G.; Chen, N.; Li, J.X. A review of research on soil carbon storage and its influencing factors in the Tibetan Plateau. Chin. J. Ecol. 2019, 38, 3506.
  28. Northeast Institute of Geography Has Made Important Progress in the Stability Mechanism of Soil Organic Carbon in the Yarlung Zangbo River Basin on the Tibetan Plateau--Northeast Institute of Geography and Agroecology. Chinese Academy of Sciences. Available online: https://www.cas.cn/syky/202308/t20230823_4965292.shtml (accessed on 22 August 2023).
  29. Zhang, P.; Li, L.; Wang, J.; Zhang, S.; Zhu, Z. Effects of Hydraulic Erosion on the Spatial Redistribution Characteristics of Soil Aggregates and SOC on Pisha Sandstone Slope. Sustainability 2023, 15, 13276.
  30. Buke, B.; Peng, Z.; Zhang, X.; Zhou, J.; Li, D.; Wang, Z. The Biogeochemical Cycling Model DNDC and Its Applications. Chin. J. Soil Sci. 2007, 38, 1208–1212.
  31. Lian, G.; Guo, X.; Fu, B.; Hu, C. Prediction of the spatial distribution of soil properties based on environmental correlation and geostatistics. Trans. Chin. Soc. Agric. Eng. (Trans. CSAE) 2009, 25, 237–242.
  32. Sun, X.; Zhao, Y.; Zhao, L.; Li, D.; Zhang, G. Prediction and mapping of spatial distribution of soil attributes by using soil-landscape models. Soils 2008, 40, 837–842.
  33. Zhou, Q.; Zhao, X.; Guo, X.; Zhou, Y. Prediction of Spatial Distribution of Soil Organic Carbon in Cultivated Land Based on Phenology and Extreme Climate Information. Acta Pedol. Sin. 2024, 61, 648–661.
  34. Yu, W.; Zhou, W.; Wang, T.; Xiao, J.; Peng, Y.; Li, H.; Li, Y. Significant Improvement in Soil Organic Carbon Estimation Using Data-Driven Machine Learning Based on Habitat Patches. Remote Sens. 2024, 16, 688.
  35. Hengl, T.; Mendes de Jesus, J.; Heuvelink, G.B.M.; Ruiperez Gonzalez, M.; Kilibarda, M.; Blagotić, A.; Shangguan, W.; Wright, M.N.; Geng, X.; Bauer-Marschallinger, B.; et al. SoilGrids250m: Global gridded soil information based on machine learning. PLoS ONE 2017, 12, e0169748.
  36. Zhou, Y.; Webster, R.; Viscarra Rossel, R.A.; Shi, Z.; Chen, S. Baseline map of soil organic carbon in Tibet and its uncertainty in the 1980s. Geoderma 2019, 334, 124–133.
  37. Yang, R.M.; Huang, L.M.; Liu, F. Evaluation and mapping soil organic carbon in seasonally frozen ground on the Tibetan Plateau. Catena 2024, 235, 107631.
  38. Yang, Y.; Fang, J.; Tang, Y.; Ji, C.; Zheng, C.; He, J.; Zhu, B. Storage, patterns and controls of soil organic carbon in the Tibetan grasslands. Glob. Change Biol. 2008, 14, 1592–1599.
  39. Wang, G.; Qian, J.; Cheng, G.; Lai, Y. Soil organic carbon pool of grassland soils on the Qinghai-Tibetan Plateau and its global implication. Sci. Total Environ. 2002, 291, 207–217.
  40. Yu, B.H.; Lv, C.H. Assessment of ecological vulnerability on the Tibetan Plateau. Geogr. Res. 2011, 30, 2289–2295.
  41. Poggio, L.; de Sousa, L.M.; Batjes, N.H.; Heuvelink, G.B.M.; Kempen, B.; Ribeiro, E.; Rossiter, D. SoilGrids 2.0: Producing soil information for the globe with quantified spatial uncertainty. Soil 2021, 7, 217–240.
  42. Liu, F.; Wu, H.; Zhao, Y.; Li, D.; Yang, J.; Song, X.; Shi, Z.; Zhu, A.; Zhang, G. Mapping high resolution National Soil Information Grids of China. Sci. Bull. 2021, 67, 328–340.
  43. Liu, F.; Zhang, G.; Song, X.; Li, D.; Zhao, Y.; Yang, J.; Wu, H.; Yang, F. High-resolution and three-dimensional mapping of soil texture of China. Geoderma 2020, 361, 114061.
  44. Xu, E. Land Use of the Tibet Plateau in 2015 (Version 1.0); A Big Earth Data Platform for Three Poles; Northwest Institute of Eco-Environment and Resources: Lanzhou, China, 2019.
  45. Ye, Z.; Sheng, Z.; Liu, X.; Ma, Y.; Wang, R.; Ding, S.; Liu, M.; Li, Z.; Wang, Q. Using Machine Learning Algorithms Based on GF-6 and Google Earth Engine to Predict and Map the Spatial Distribution of Soil Organic Matter Content. Sustainability 2021, 13, 14055.
  46. Wang, H.; Zhang, X.; Wu, W.; Liu, H. Prediction of Soil Organic Carbon under Different Land Use Types Using Sentinel-1/-2 Data in a Small Watershed. Remote Sens. 2021, 13, 1229.
  47. Zhou, T.; Geng, Y.; Ji, C.; Xu, X.; Wang, H.; Pan, J.; Bumberger, J.; Haase, D.; Lausch, A. Prediction of soil organic carbon and the C:N ratio on a national scale using machine learning and satellite data: A comparison between Sentinel-2, Sentinel-3 and Landsat-8 images. Sci. Total Environ. 2021, 755, 142661.
  48. Wang, Z.; Tang, Z.; Zhou, P.; Lai, J.; Dai, Y.; Zhou, L.; Wang, Y.; Chen, G.; Jiang, Y.; Guo, X.; et al. Comparison of Four Machine Learning Models in Predicting Soil Organic Carbon Content in a Subtropical Hilly Watershed. Res. Agric. Mod. 2023, 44, 558–566.
  49. Chen, L.; Zhang, S.; Fu, B.; Peng, H. Correlation analysis on spatial pattern of land use and soil at catchment scale. Acta Ecol. Sin. 2003, 23, 2497–2505.
  50. Yan, J.; Xu, Y.; Cheng, Q.; Jiang, S.; Wang, Q.; Xiao, Y.; Ma, C.; Yan, J.; Wang, X. LightGBM: Accelerated genomically designed crop breeding through ensemble learning. Genome Biol. 2021, 22, 271.
  51. Wu, J.; Lu, J.; Li, L.; Min, X.; Luo, Y. Pollution, ecological-health risks, and sources of heavy metals in soil of the northeastern Qinghai-Tibet Plateau. Chemosphere 2018, 201, 234–242.
  52. Yang, J.; Fan, J.; Lan, Z.; Mu, X.; Wu, Y.; Xin, Z.; Miping, P.; Zhao, G. Improved Surface Soil Organic Carbon Mapping of SoilGrids250m Using Sentinel-2 Spectral Images in the Qinghai–Tibetan Plateau. Remote Sens. 2023, 15, 114.
  53. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  54. Matsuki, K.; Kuperman, V.; Van Dyke, J.A. The Random Forests statistical technique: An examination of its value for the study of reading. Sci. Stud. Read. 2016, 20, 20–33.
  55. Ledgerwood, A.; Shrout, P.E. The trade-off between accuracy and precision in latent variable models of mediation processes. J. Personal. Soc. Psychol. 2011, 101, 1174–1188.
  56. Kennedy, M.C.; O’Hagan, A. Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2001, 63, 425–464.
  57. Liu, B.; Udell, M. Impact of accuracy on model interpretations. arXiv 2020, arXiv:2011.09903.
  58. Zhao, Y.; Chen, J.; Oymak, S. On the role of dataset quality and heterogeneity in model confidence. arXiv 2020, arXiv:2002.09831.
  59. Li, C.; Cao, Z.; Chang, J.; Zhang, Y.; Zhu, G.; Zong, N.; He, Y.; Zhang, J.; He, N. Elevational gradient affect functional fractions of soil organic carbon and aggregates stability in a Tibetan alpine meadow. Catena 2017, 156, 139–148.
  60. Davidson, E.A.; Janssens, I.A. Temperature sensitivity of soil carbon decomposition and feedbacks to climate change. Nature 2006, 440, 165–173.
  61. Zhang, W.; Zhang, H. Distribution characteristics of soil organic carbon of alpine meadow in the Eastern Qinghai-Tibet Plateau. Wuhan Univ. J. Nat. Sci. 2009, 14, 274–280.
  62. Pan, H.; Huang, P.; Xu, J. The spatial and temporal pattern evolution of vegetation NPP and its driving forces in middle-lower areas of the Min river based on geographical detector analyses. Acta Ecol. Sin. 2019, 39, 7621–7631.
  63. Ping, X.; Li, C.; Li, C.; Tang, S.; Fang, H.; Cui, S.; Chen, J.; Wang, E.; He, Y.; Cai, P.; et al. The distribution, population and conservation status of Przewalski’s gazelle, Procapra przewalskii. Biodivers. Sci. 2018, 26, 177–184.
  64. Tian, Y.; Ouyang, H.; Xu, X.; Song, M.; Zhou, C. Distribution Characteristics of Soil Organic Carbon Storage and Density on the Qinghai-Tibet Plateau. Acta Pedol. Sin. 2008, 45, 933–942.
  65. Du, H.; Li, C. Influence of Hydrogeological Characteristics on Soil Groundwater Pollution Diffusion—A Case Study of an Agricultural Pharmaceutical Factory. Adv. Environ. Prot. 2023, 13, 302–311.
  66. Bangroo, S.; Najar, G.; Rasool, A. Effect of altitude and aspect on soil organic carbon and nitrogen stocks in the Himalayan Mawer Forest Range. Catena 2017, 158, 63–68.
  67. Choudhury, B.U.; Verma, B.C.; Ramesh, T.; Hazarika, S. Altitude regulates accumulation of organic carbon in soil: Case studies from the hilly ecosystem of northeastern region of India. Adv. Crop Environ. Interact. 2018, 137–149.
  68. Wang, G.; Li, Y.; Wang, Y.; Wu, Q. Effects of permafrost thawing on vegetation and soil carbon pool losses on the Qinghai–Tibet Plateau, China. Geoderma 2008, 143, 143–152.
  69. Li, X.; McCarty, G.W.; Karlen, D.L.; Cambardella, C.A. Topographic metric predictions of soil redistribution and organic carbon in Iowa cropland fields. Catena 2018, 160, 222–232.
  70. Lozano-García, B.; Parras-Alcántara, L.; Brevik, E.C. Impact of topographic aspect and vegetation (native and reforested areas) on soil organic carbon and nitrogen budgets in Mediterranean natural areas. Sci. Total Environ. 2016, 544, 963–970.
  71. Dyer, J.M. Assessing topographic patterns in moisture use and stress using a water balance approach. Landsc. Ecol. 2009, 24, 391–403.
  72. Silveira, C.T.; Oka-Fiori, C.; Santos, L.J.C.; Sirtoli, A.E.; Silva, C.R.; Botelho, M.F. Soil prediction using artificial neural networks and topographic attributes. Geoderma 2013, 195, 165–172.
  73. Wu, X.; Zhao, L.; Chen, M.; Fang, H.; Yue, G.; Chen, J.; Pang, Q.; Wang, Z.; Ding, Y. Soil organic carbon and its relationship to vegetation communities and soil properties in permafrost areas of the Central Western Qinghai-Tibet Plateau, China. Permafr. Periglac. Process. 2012, 23, 162–169.
  74. Liu, W.; Chen, S.; Qin, X.; Baumann, F. Storage, patterns, and control of soil organic carbon and nitrogen in the northeastern margin of the Qinghai–Tibetan Plateau. Environ. Res. Lett. 2012, 7, 035401.
  75. Baumann, F.; He, J.; Schmidt, K.; Kühn, P.; Scholten, T. Pedogenesis, permafrost, and soil moisture as controlling factors for soil nitrogen and carbon contents across the Tibetan Plateau. Glob. Change Biol. 2009, 15, 3001–3017.
  76. Sun, J.; Cheng, G.; Li, W. Meta-analysis of relationships between environmental factors and aboveground biomass in the alpine grassland on the Tibetan Plateau. Biogeosciences 2013, 10, 1707–1715.
  77. Alhassan, A.R.M.; Ma, W.; Li, G.; Jiang, Z.; Wu, J.; Chen, G. Response of soil organic carbon to vegetation degradation along a moisture gradient in a wet meadow on the Qinghai–Tibet Plateau. Ecol. Evol. 2018, 8, 11999–12010.
  78. Luo, R.; Fan, J.; Wang, W.; Luo, J.; Kuzyakov, Y.; He, J.-S.; Chu, H.; Ding, W. Nitrogen and phosphorus enrichment accelerates soil organic carbon loss in alpine grassland on the Qinghai-Tibetan Plateau. Sci. Total Environ. 2019, 650, 303–312.
  79. Li, H.; Li, T.; Sun, W.; Zhang, W.; Zhang, Q.; Yu, L.; Qin, Z.; Guo, B.; Liu, J.; Zha, X. Degradation of wetlands on the Qinghai-Tibetan Plateau causing a loss in soil organic carbon in 1966–2016. Plant Soil 2021, 467, 253–265.
  80. Mathieu, R.; Sbih, M.; Viau, A.A.; Anctil, F.; Parent, L.E.; Boisvert, J. Relationships between Radarsat SAR data and surface moisture content of agricultural organic soils. Int. J. Remote Sens. 2003, 24, 5265–5281.
  81. Kerr, Y.H.; Waldteufel, P.; Wigneron, J.P.; Martinuzzi, J.; Font, J.; Berger, M. Soil Moisture Retrieval from Space: The Soil Moisture and Ocean Salinity (SMOS) Mission. IEEE Trans. Geosci. Remote Sens. 2001, 39, 1729–1735.
  82. Monerris, A.; Benedicto, P.; Vall-llossera, M.; Camps, A.; Santanach, E.; Piles, M.; Prehn, R. Assessment of the topography impact on microwave radiometry at L-band. Solid Earth 2008, 113.
  83. Liu, F.; Zhang, G. Basic Soil Property Dataset of High-Resolution China Soil Information Grids (2010–2018); A Big Earth Data Platform for Three Poles; Northwest Institute of Eco-Environment and Resources: Lanzhou, China, 2021.
  84. Wang, T.; Zhou, W.; Xiao, J.; Li, H.; Yao, L.; Xie, L.; Wang, K. Soil Organic Carbon Prediction Using Sentinel-2 Data and Environmental Variables in a Karst Trough Valley Area of Southwest China. Remote Sens. 2023, 15, 2118.
Figure 1. The spatial location of the study area and the soil sampling points.
Figure 2. Ranking of key influencing factors of SOC density based on Boruta feature selection method.
Figure 3. SOC density mapping of Qinghai–Tibet Plateau based on (a) LightGBM model, (b) land cover types, (c) correlation matrix of LUCC and input variables, and (d) analysis of the relationship between SOC density and LUCC.
Figure 4. Analysis of Relative Importance of Variables using (a) RF Model (b) XGBoost Model and (c) LightGBM Model.
Figure 5. Validation of measured SOC using (a) this study’s LightGBM 500 m, (b) SoilGrids250m prediction, and (c) SoilGrids1km prediction.
Figure 6. Spatial distribution of topsoil SOC density using (a) this study’s LightGBM 500 m, (b) SoilGrids250m prediction, and (c) SoilGrids1km prediction.
Table 1. Input variables for SOC density estimation model.
| Variable Category | Index | Obtained Time (Year) | Spatial Resolution (m) |
|---|---|---|---|
| Radar remote sensing data | Vertical–vertical (VV) | 2015 | 500 |
| | Vertical–horizontal (VH) | 2015 | 500 |
| Vegetation index | Normalized difference vegetation index (NDVI) | 2015 | 500 |
| Human activity | Large livestock | 2015 | 500 |
| Topographical factor | Elevation | 2015 | 500 |
| | Topographic wetness index (TWI) | 2015 | 500 |
| | Slope | 2015 | 500 |
| | Terrain ruggedness index (TRI) | 2015 | 500 |
| | Valley depth (VD) | 2015 | 500 |
| Climate factor | Land surface temperature (LST) | 2015 | 500 |
| | Precipitation | 2015 | 500 |
| | Temperature | 2015 | 500 |
| Soil properties | Clay percentage | 2015 | 500 |
| | Soil moisture | 2015 | 500 |
Table 2. Performance metrics of ML models for SOC density prediction using different cross-validation strategies.
| Model | Cross-Validation | R² | RMSE | MAE |
|---|---|---|---|---|
| RF | 3-fold | 0.6971 | 2.647 | 1.940672 |
| | 5-fold | 0.7334 | 2.6238 | 1.951615 |
| | 10-fold | 0.7007 | 2.6542 | 1.937285 |
| XGBoost | 3-fold | 0.6758 | 2.7325 | 1.901836 |
| | 5-fold | 0.7093 | 2.5990 | 1.863551 |
| | 10-fold | 0.6907 | 2.7325 | 1.901836 |
| LightGBM | 3-fold | 0.7187 | 2.5998 | 1.832515 |
| | 5-fold | 0.7537 | 2.4928 | 1.719547 |
| | 10-fold | 0.7492 | 2.5914 | 1.820192 |
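The three metrics reported in Table 2, and the way a k-fold split partitions the samples, can be sketched in plain Python. This is a minimal illustration rather than the authors' code: the function names `regression_metrics` and `kfold_indices`, the unshuffled contiguous folds, and the toy values are all assumptions. In the study, the predictions would come from the RF, XGBoost, or LightGBM model fitted on each training fold.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired observed and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae

def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) for k contiguous, near-equal folds (no shuffling)."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more sample
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop

# Toy values (hypothetical, kg/m^2); not data from the study.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
r2, rmse, mae = regression_metrics(observed, predicted)

# Hypothetical split of 10 samples into 3 folds.
folds = list(kfold_indices(10, 3))
```

Each sample appears in exactly one test fold, so averaging the per-fold metrics gives one score per model and fold count, which is the layout Table 2 uses.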
Table 3. Topsoil SOC density statistics.
| Sample Size | Maximum (kg/m²) | Minimum (kg/m²) | Average (kg/m²) | Standard Deviation (kg/m²) | Kurtosis | Skewness | Coefficient of Variation (%) |
|---|---|---|---|---|---|---|---|
| 325 | 28.2 | 0.15 | 5.30 | 3.74 | 3.52 | 1.25 | 70.59 |
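The coefficient of variation in Table 3 is the standard deviation expressed as a percentage of the mean. A short check, assuming the rounded standard deviation (3.74 kg/m²) and average (5.30 kg/m²) from the table and a hypothetical helper name, gives about 70.57%, which agrees with the reported 70.59% to within the rounding of the inputs.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation in percent: 100 * SD / mean."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100.0 * std_dev / mean

# Rounded standard deviation and average from Table 3 (kg/m^2).
cv = coefficient_of_variation(3.74, 5.30)
print(f"CV = {cv:.2f}%")
```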
Table 4. Comparison of model performance in SOC estimation based on verification test.
| Model | RMSE (kg C/m²) | MAE | R² |
|---|---|---|---|
| RF | 2.6238 | 1.9516 | 0.7334 |
| XGBoost | 2.5990 | 1.8636 | 0.7093 |
| LightGBM | 2.4928 | 1.7195 | 0.7537 |
Table 5. Comparison of SOC density values based on estimation models and sampling sites.
| Model | Average (kg C/m²) | Median (kg C/m²) | Standard Deviation (kg C/m²) | Coefficient of Variation (%) |
|---|---|---|---|---|
| RF | 5.64 | 5.34 | 2.13 | 43.10 |
| XGBoost | 4.95 | 4.58 | 2.37 | 44.84 |
| LightGBM | 5.29 | 4.83 | 1.81 | 32.16 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


MDPI and ACS Style

Chen, Q.; Zhou, W.; Shi, W. Estimation of Soil Organic Carbon Density on the Qinghai–Tibet Plateau Using a Machine Learning Model Driven by Multisource Remote Sensing. Remote Sens. 2024, 16, 3006. https://doi.org/10.3390/rs16163006
