1. Introduction
A landslide is a natural phenomenon in which the soil or rock mass on a slope slides downward, as a whole or in fragments, along a weak surface or weak zone under the influence of gravity; it is affected by river scour, groundwater activity, rainwater infiltration, earthquakes, and artificial slope cutting [1]. Approximately 1000 deaths and billions of dollars in property damage are attributed to landslides annually [2]. Under the influence of the Qinghai–Tibet Plateau uplift, China’s geological structure and topography are extremely complex and unique, providing favorable geological conditions and abundant source material for landslides. In addition, mountainous regions account for approximately 67% of China’s total land area, and more than 56% of the population resides in them [3], making China one of the countries most severely affected by landslides in the world. For this reason, research into the prediction and evaluation of landslide susceptibility in complex mountainous areas is an effective means of preventing and controlling large-scale landslides. Prevention and control measures can be developed by understanding the spatial and temporal distribution of landslides and by analyzing the probability of landslide instability in a given geological environment.
Currently, landslide susceptibility evaluation methods used in China and abroad fall into three main categories: empirical models, statistical analysis models, and machine learning models [4]. (1) Empirical models. Professional and technical personnel evaluate the occurrence of landslide disasters based on a comprehensive understanding of the study area, drawing on expert knowledge, experience, and reasoned judgment. The primary methods are the expert knowledge-based approach [5,6,7] and the analytic hierarchy process [8,9,10]. These methods are highly subjective and overly dependent on the knowledge and experience of experts. (2) Statistical analysis models. These techniques analyze the relationship between landslide hazards and environmental factors such as slope, aspect, elevation (DEM), curvature, and lithology, and construct a landslide prediction model from which a susceptibility index is computed. Widely used models include the information value model [11,12], the logistic regression model [13,14,15], the frequency ratio model [16,17], the weights of evidence method [18,19,20], and the certainty factor method [21,22]. Their advantage is strong interpretability: they allow quantitative analysis of the relationship between landslides and their influencing factors, and of the relationships among these factors. However, because of the complexity of landslides and of the topographic, geomorphic, geological, and other factors involved, these relationships rarely follow the strictly linear form the models assume, which can lead to erroneous evaluation results. (3) Machine learning models. With the rapid advancement of computer technology, machine learning offers a novel approach to studying landslide susceptibility, and its strong nonlinear mapping capability improves evaluation accuracy. The principal models include support vector machines [23,24], random forests [25,26,27], decision tree algorithms [28,29], and artificial neural networks [30,31,32]. The effectiveness of these models in predicting and evaluating landslide susceptibility has been demonstrated. However, current evaluation models do not fully account for the nonlinear time-series characteristics of landslide displacement in complex mountainous regions with frequent earthquakes. In reality, the probability of landslide risk varies across time scales, and even when a coupled model is used, it is difficult to improve evaluation accuracy. Notably, model accuracy is even more difficult to control precisely when landslide susceptibility is evaluated over a large area [33].
Time-series InSAR technology has advanced rapidly in recent years, offering new technical support for the early detection, monitoring, warning, and risk assessment of landslide disasters. For surface deformation monitoring, time-series InSAR covers a larger area than other techniques and, as an active microwave system, can acquire images of the Earth’s surface day or night and is largely unaffected by weather [34]. This makes it a uniquely capable remote sensing method for monitoring surface deformation [35]. In complex mountainous regions with frequent landslide disasters, combining time-series SBAS-InSAR with high-resolution optical remote sensing images makes it possible to identify landslides early and to invert the slow deformation history of landslide points over time [36,37,38]. Random forest is one of the most popular and effective supervised learning algorithms, capable of both regression and classification, and it has been successfully applied to evaluating and forecasting the susceptibility of a variety of geological disasters. Compared with conventional gradient-descent optimization, the particle swarm optimization (PSO) algorithm offers greater robustness, good scalability, and a reduced tendency to become trapped in local optima. Compared with optimization methods based on natural evolution, such as evolutionary programming and genetic algorithms, the information-sharing mechanism of PSO accelerates convergence of the population toward the optimum [39]. The combination of SBAS-InSAR technology and the PSO-RF algorithm therefore provides a theoretical and practical foundation for predicting and evaluating landslide susceptibility.
To address these shortcomings of existing landslide susceptibility evaluation methods, this paper proposes a combined SBAS-InSAR and PSO-RF approach for evaluating the susceptibility of landslide disasters in complex mountainous regions. The specific research contents are as follows: (1) SBAS-InSAR technology was used to invert the deformation of existing and potential landslide points, and, guided by the deformation results, high-resolution optical remote sensing images were used to locate potential landslide points. (2) Using traditional geological, terrain, and environmental factors and human engineering activities, together with landslide time-series deformation and seismic factors, a PSO-RF model was constructed, and the landslide susceptibility index was obtained through training and testing. The approach improved the effectiveness and precision of landslide susceptibility evaluation, helping to reduce the loss of life and property caused by inaccurate evaluations.
3. Methodology
Figure 3 depicts the technical workflow of this study, which consists primarily of landslide deformation acquisition and identification, evaluation unit division, evaluation factor selection, and PSO-RF model development.
3.1. Landslide Deformation Acquisition and Identification
By December 2021, 122 existing landslide hazards in the study area had been recorded in the landslide cataloging data. However, some potential landslide hazards had not yet been identified and recorded. To obtain deformation data for each landslide hazard in the study area and to identify potential landslide hazards, 61 ascending-orbit and 58 descending-orbit Sentinel-1A radar datasets were downloaded from the European Space Agency (ESA) website (https://scihub.copernicus.eu/dhus/#/home accessed on 8 December 2021). The orbits were corrected using precise orbit determination (POD) data, which effectively eliminates the systematic error introduced by orbit inaccuracies. The image located near the center of both the temporal baseline and the Doppler centroid frequency distribution was chosen as the super master image. Throughout processing, the super master image served as the reference image, and all other images were co-registered to it. Interferometric processing was then conducted on all registered image pairs. To suppress speckle noise more effectively, the range and azimuth multilook factors were set to 4:1, and the Minimum Cost Flow and Goldstein methods were used for phase unwrapping and filtering, respectively. After unsatisfactory interferograms were removed, DEM data were used to eliminate the flat-earth phase and topographic effects in order to generate the time-series interferometric phase. The interferometric phase between the master and slave images [40,41,42] is expressed as follows:
$$\Delta\varphi = \varphi_{topo} + \varphi_{def} + \varphi_{atm} + \varphi_{flat} + \varphi_{noise}$$

where $\varphi_{topo}$ is the terrain phase, $\varphi_{def}$ is the deformation phase, $\varphi_{atm}$ is the atmospheric delay phase, $\varphi_{flat}$ is the flat-earth phase, and $\varphi_{noise}$ is the noise phase. The deformation signal was extracted through phase unwrapping, and the deformation rate was inverted using singular value decomposition (SVD). Finally, the time-series deformation was geocoded and projected onto the study area, and line-of-sight (LOS) deformation rates for the study area were obtained from both the ascending and descending data.
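The SVD inversion step can be illustrated with a small synthetic network. The sketch below assumes the standard SBAS formulation: the unknowns are the mean LOS velocities between consecutive acquisitions, and each unwrapped interferogram (already converted to LOS displacement in millimetres, an assumption here) equals the sum of velocity × interval length over the intervals it spans. `np.linalg.pinv` is computed via SVD and returns the minimum-norm least-squares solution, which is what lets SBAS cope with networks split into disconnected subsets. The function name, dates, and pairs are hypothetical.

```python
import numpy as np


def sbas_invert(t_days, pairs, d_ifg):
    """Estimate cumulative LOS displacement at each acquisition (SBAS-style).

    t_days : acquisition times in days, sorted ascending
    pairs  : (master_idx, slave_idx) tuples with master_idx < slave_idx
    d_ifg  : unwrapped LOS displacement of each interferogram (mm)
    """
    t = np.asarray(t_days, dtype=float)
    dt = np.diff(t)                       # lengths of the N-1 time intervals
    B = np.zeros((len(pairs), len(dt)))   # maps interval velocities to ifg displacement
    for k, (m, s) in enumerate(pairs):
        B[k, m:s] = dt[m:s]               # interferogram spans intervals m .. s-1
    # pinv is SVD-based: minimum-norm least-squares velocities,
    # still defined when the small-baseline network is rank-deficient
    v = np.linalg.pinv(B) @ np.asarray(d_ifg, dtype=float)
    # integrate velocities; the first acquisition is the zero reference
    return np.concatenate([[0.0], np.cumsum(v * dt)])


# synthetic example: 12-day sampling, steady subsidence of 0.05 mm/day
t = [0, 12, 24, 36, 48]
pairs = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
rate = -0.05
d = [rate * (t[s] - t[m]) for m, s in pairs]
print(np.round(sbas_invert(t, pairs, d), 3))
```

With consistent synthetic data the recovered time series is exactly the linear trend; with real interferograms the residuals of this fit are where atmospheric and noise terms show up.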
Because the LOS deformation rate is the projection of the surface deformation rate onto the radar line of sight, any surface deformation can be expressed by three components: north-south (N), east-west (E), and vertical (U) [43]. Based on the geometry of radar side-looking imaging and the relationship between LOS deformation and surface deformation observed by InSAR, vertical deformation contributes more than 90% of the LOS deformation for both ascending and descending orbits [44]. Therefore, the vertical deformation in this paper was calculated as follows:
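A minimal sketch of such a conversion, assuming the common vertical-only simplification in which horizontal motion is neglected so that $d_{LOS} = d_U \cos\theta$, i.e. $d_U = d_{LOS} / \cos\theta$ with $\theta$ the local incidence angle. This form, the sign convention (positive toward the satellite), the incidence angles, and the averaging of the two orbits are illustrative assumptions and not necessarily the exact formula used in this paper.

```python
import numpy as np


def los_to_vertical(d_los, incidence_deg):
    """Project LOS displacement (or velocity) to vertical.

    Assumes horizontal motion is negligible: d_LOS = d_U * cos(theta).
    d_los         : LOS value(s), positive toward the satellite (assumed)
    incidence_deg : local incidence angle(s) in degrees
    """
    theta = np.deg2rad(np.asarray(incidence_deg, dtype=float))
    return np.asarray(d_los, dtype=float) / np.cos(theta)


# hypothetical LOS rates (mm/yr) for one pixel seen from both orbits
v_up_asc = los_to_vertical(-12.0, 39.0)
v_up_desc = los_to_vertical(-10.5, 43.0)
# averaging the two vertical estimates is an illustrative choice only
print(round(float((v_up_asc + v_up_desc) / 2), 2))
```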
The study area is a complex mountainous region with frequent earthquakes, large differences in terrain height, dense vegetation, and deep river valleys, all of which cause severe decorrelation. This study therefore used both ascending and descending radar data to obtain accurate and comprehensive monitoring results while avoiding the geometric distortion that affects single-orbit data. Additionally, the normalized difference vegetation index (NDVI) was used to analyze vegetation cover in the study area in order to mitigate vegetation-induced decorrelation.
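NDVI is computed per pixel as (NIR − Red) / (NIR + Red). The sketch below assumes co-registered red and near-infrared reflectance arrays and uses a hypothetical threshold of 0.6 to flag densely vegetated pixels where decorrelation is likely; the source imagery and threshold are assumptions, not values taken from this study.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)  # pixels with NIR + Red == 0 stay NaN
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def dense_vegetation_mask(nir, red, threshold=0.6):
    """True where NDVI exceeds a (hypothetical) dense-vegetation threshold."""
    v = ndvi(nir, red)
    return np.nan_to_num(v, nan=-1.0) > threshold  # NaN pixels are not flagged


# hypothetical reflectance values for three pixels
nir_band = np.array([0.5, 0.3, 0.0])
red_band = np.array([0.1, 0.3, 0.0])
print(ndvi(nir_band, red_band), dense_vegetation_mask(nir_band, red_band))
```

Masked pixels could then be excluded from, or down-weighted in, the deformation interpretation.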
Deformation monitoring results alone cannot determine whether a potential landslide hazard exists in the study area. It is therefore essential to assess the deformation extent, slope, aspect, elevation, curvature, and other data together before classifying a deforming area as a potential landslide hazard, so as to avoid inaccurate assessments caused by over-reliance on surface deformation results. The ascending- and descending-orbit results were cross-validated to confirm the accuracy of identification and the precision of the SBAS-InSAR monitoring results, and field investigations verified the potential landslides identified.
3.2. Grid Cell Division and Evaluation Factors Selection
The evaluation cell, which can be regular or irregular, is the smallest spatial unit used in landslide susceptibility evaluation [45], and selecting an appropriate evaluation cell is essential. Commonly used evaluation cells fall into five categories: regional cells, slope cells, grid cells, terrain cells, and unique condition cells. A regional cell is based on geographical spatial division and regional policy. A slope cell is the basic unit in which landslide disasters develop and clearly reflects differences in regional geological environment conditions. A grid cell divides the study area into regular grids of a fixed size and is the most widely used cell for landslide susceptibility assessment. A terrain cell is the basic unit of land resource surveys; it is delineated according to the relationship between slope failure and the geomorphic environment and is suited to regional landslide susceptibility assessment of small regions at large scales. A unique condition cell is obtained by overlaying and analyzing all evaluation factors, producing irregular evaluation cells of different sizes, and is suited to large study areas. The best options for mountainous regions with complex terrain are grid cells and slope cells [4]. Slope cells preserve the original natural geographic information, such as the topography and natural slopes of the study area. However, delineating slope units is complicated, subjective, and can yield discontinuous units, which makes accuracy difficult to guarantee. Although grid cells cannot preserve the original surface morphology of the study area, most researchers favor them for their simple operation, fast computation, easy error correction, and effective visualization of results. This paper therefore divides evaluation cells using grid cells. Tang et al. [46] proposed an empirical formula for determining the basic size of grid cells:
In this formula, the output is the grid size and the input is the reciprocal of the basic data scale (i.e., the scale denominator). The calculated grid cell size for this study was 30 m × 30 m. Considering the landslide areas, mapping requirements, and other factors, a 60 m × 60 m evaluation grid cell was finally selected, and the study area was divided into 412,585 grid cells.
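As a simple consistency check on these figures, the cell count and cell size give the approximate area covered by the evaluation grid, and show that a 30 m grid would have required roughly four times as many cells:

$$412{,}585 \times (60\ \text{m})^2 = 412{,}585 \times 3600\ \text{m}^2 = 1{,}485{,}306{,}000\ \text{m}^2 \approx 1485\ \text{km}^2$$

$$N_{30\,\text{m}} \approx 4 \times 412{,}585 = 1{,}650{,}340$$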
The formation and development of landslides depend primarily on geological environment conditions and triggering factors, and the degree of landslide hazard is closely related to human engineering activities. Ludian County lies in an active fault zone with frequent earthquakes, making it extremely vulnerable to landslides. Based on previous research, 12 influencing factors covering geological, topographic, and environmental factors and human engineering activities were selected [47,48,49]. Four additional influencing factors were also used as evaluation factors: the time-series landslide deformation from the ascending and descending orbits, the seismic intensity of the Ludian earthquake of 3 August 2014, and the epicentral distance (with a 2 km radius). Using the ArcGIS Extract Multi Values to Points tool, the attribute values of the 16 evaluation factors were extracted for the grid cells of the landslide and non-landslide areas. The validity and reliability of the 16 evaluation factors were then assessed through Pearson correlation analysis and multicollinearity analysis in SPSSPRO.
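The two screening checks can be reproduced outside SPSSPRO. The sketch below computes the Pearson correlation matrix of the factor columns and each factor's variance inflation factor, VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing factor j on all other factors. The synthetic factors are hypothetical, and the commonly used rule of thumb that VIF > 10 signals problematic multicollinearity is an assumption here, not a threshold stated in this study.

```python
import numpy as np


def pearson_matrix(X):
    """Pearson correlation matrix between the columns (factors) of X."""
    return np.corrcoef(np.asarray(X, dtype=float), rowvar=False)


def vif(X):
    """Variance inflation factor of each column of X.

    Column j is regressed on all other columns (OLS with intercept);
    VIF_j = 1 / (1 - R_j^2).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


# hypothetical factor table: f3 is almost a copy of f1
rng = np.random.default_rng(0)
f1 = rng.normal(size=500)
f2 = rng.normal(size=500)
f3 = f1 + 0.05 * rng.normal(size=500)
X = np.column_stack([f1, f2, f3])
print(np.round(pearson_matrix(X), 2))
print(np.round(vif(X), 1))
```

The independent factor `f2` gets a VIF close to 1, while the near-duplicate pair `f1`/`f3` is flagged by both a correlation near 1 and very large VIFs.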
3.3. PSO-RF Model
Random forest (RF), proposed by Breiman in 2001 [50], is a parallel ensemble learning algorithm that combines the bagging method with classification and regression trees (CART). It draws multiple samples from the original data using Bootstrap resampling and builds a decision tree on each Bootstrap sample. The predictions of the individual trees are then combined, and the final prediction is determined by voting [51].
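The three ingredients described above (Bootstrap sampling, CART trees that consider a random subset of features at each split, and majority voting) can be sketched in a few lines. The class below is a hypothetical teaching implementation built on scikit-learn's `DecisionTreeClassifier`; it is not the implementation used in this study, and scikit-learn's own `RandomForestClassifier` combines trees by averaging class probabilities rather than by hard voting. The synthetic data stand in for landslide and non-landslide samples.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


class SimpleRandomForest:
    """Minimal random forest: bootstrap + CART with max_features + majority vote."""

    def __init__(self, n_estimators=50, max_features="sqrt", random_state=0):
        self.n_estimators = n_estimators
        self.max_features = max_features
        self.random_state = random_state

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        rng = np.random.default_rng(self.random_state)
        self.classes_ = np.unique(y)
        self.trees_ = []
        n = len(y)
        for _ in range(self.n_estimators):
            idx = rng.integers(0, n, size=n)  # bootstrap: n rows with replacement
            tree = DecisionTreeClassifier(
                max_features=self.max_features,  # random feature subset per split
                random_state=int(rng.integers(0, 2**31 - 1)),
            )
            self.trees_.append(tree.fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        votes = np.stack([t.predict(np.asarray(X)) for t in self.trees_])
        idx = np.searchsorted(self.classes_, votes)  # class labels -> indices
        counts = np.apply_along_axis(
            lambda col: np.bincount(col, minlength=len(self.classes_)), 0, idx
        )
        return self.classes_[np.argmax(counts, axis=0)]  # majority vote per sample


# synthetic stand-in for landslide (1) / non-landslide (0) samples
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           random_state=0)
model = SimpleRandomForest(n_estimators=30, max_features=3).fit(X[:300], y[:300])
acc = float(np.mean(model.predict(X[300:]) == y[300:]))
print(f"hold-out accuracy: {acc:.3f}")
```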
Two user-specified parameters of the random forest algorithm are considered here. The first is the number of features (max_features) used when growing each decision tree, which determines the classification strength of the trees in the forest: too few features may be insufficient to classify the data reliably and reduce the accuracy of each tree, whereas too many features make the trees more similar to one another and allow certain boundary values to distort the classification result. The second is the number of trees in the forest (n_estimators), which strongly affects performance: too few trees may cause underfitting, whereas too many trees increase computational cost with diminishing gains in accuracy. For this reason, this paper applies the particle swarm optimization algorithm to optimize the number of trees and the number of features, in order to find the best combination of these two parameters.
The Particle Swarm Optimization (PSO) algorithm [52] is inspired by the foraging behavior of bird flocks. Starting from a population of randomly initialized particles, PSO iteratively searches for the optimum of the objective function. In each iteration, every particle updates its velocity and position using two “extreme values”. The first is the best solution found so far by the particle itself, the individual extreme value pbest. The second is the best solution found so far by the entire swarm, the global extreme value gbest (gbest is the best of all pbest values).
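The pbest/gbest mechanism can be sketched as a basic global-best PSO. The update rules below are the standard textbook form, v ← w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x) followed by x ← x + v; the inertia weight and acceleration coefficients (w = 0.7, c1 = c2 = 1.5) are common defaults assumed here, not settings reported in this study. The test function is hypothetical.

```python
import numpy as np


def pso_minimize(f, lower, upper, n_particles=30, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best PSO minimizing f inside the box [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))  # random initial swarm
    v = np.zeros_like(x)
    pbest = x.copy()                                  # each particle's best position
    pbest_val = np.array([f(p) for p in x])
    g = np.argmin(pbest_val)
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]  # best of all pbest
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)              # keep particles in bounds
        vals = np.array([f(p) for p in x])
        improved = vals < pbest_val                   # update individual extremes
        pbest[improved] = x[improved]
        pbest_val[improved] = vals[improved]
        g = np.argmin(pbest_val)                      # update global extreme
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val


# hypothetical test function with its minimum at (1, -2)
best, best_val = pso_minimize(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2,
                              lower=[-5, -5], upper=[5, 5])
print(np.round(best, 4), best_val)
```

To maximize an objective such as classification accuracy, the same routine can be applied to its negative.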
The PSO-RF model is built as follows. The number of trees and the number of features of the random forest are taken as independent variables, and an evaluation index of the model’s classification result is taken as the dependent variable. Assuming a linear relationship among the three, linear regression is fitted to the random forest classification results. The regression function is then passed to the PSO algorithm to find its maximum, which yields the number of trees and the number of features at the maximum point. The random forest is rebuilt with these parameters, and the data set is classified again to obtain the final classification result, which is taken as the best classification result. The construction process of the PSO-RF model is shown in Figure 4.
Figure 4 shows the construction process of the PSO-RF model in detail; the process can be divided into two parts. In the first part, the RF model is built repeatedly while adjusting the number of trees and the number of features, producing a set of triples (number of trees, number of features, and the classification accuracy of the random forest with these parameters). In the second part, PSO is used for parameter optimization. The final classification result is obtained by substituting the optimal parameters into the random forest model and classifying the data again.
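The two-part process can be sketched end to end with scikit-learn. Part 1 builds the (trees, features, accuracy) triples using 3-fold cross-validated accuracy; part 2 fits a linear regression of accuracy on the two parameters, as the linearity assumption above describes, lets a compact PSO find the regression maximum, and rebuilds the forest. The parameter grids, synthetic data, CV scheme, and PSO settings are hypothetical choices, not the configuration of this study. Note that a linear surrogate always peaks at a corner of the search box, so PSO mainly illustrates the mechanics here; a common variant lets PSO evaluate cross-validated accuracy directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split


def build_triples(X, y, tree_grid, feature_grid, seed=0):
    """Part 1: (n_estimators, max_features, 3-fold CV accuracy) per grid point."""
    rows = []
    for n_trees in tree_grid:
        for n_feat in feature_grid:
            rf = RandomForestClassifier(n_estimators=n_trees, max_features=n_feat,
                                        random_state=seed)
            rows.append((n_trees, n_feat, cross_val_score(rf, X, y, cv=3).mean()))
    return np.array(rows)


def pso_maximize(f, lower, upper, n_particles=20, n_iter=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Compact global-best PSO that maximizes f within box bounds."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = rng.uniform(lower, upper, size=(n_particles, lower.size))
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
    gbest = pbest[np.argmax(pbest_val)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lower, upper)
        vals = np.array([f(p) for p in x])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = x[better], vals[better]
        gbest = pbest[np.argmax(pbest_val)].copy()
    return gbest


def pso_rf(X_train, y_train, X_test, y_test, tree_grid, feature_grid, seed=0):
    triples = build_triples(X_train, y_train, tree_grid, feature_grid, seed)
    # Part 2a: accuracy ~ n_estimators + max_features (linearity assumption)
    reg = LinearRegression().fit(triples[:, :2], triples[:, 2])

    def surrogate(p):
        return float(reg.predict(p.reshape(1, -1))[0])

    # Part 2b: PSO searches the regression function for its maximum
    best = pso_maximize(surrogate, [min(tree_grid), min(feature_grid)],
                        [max(tree_grid), max(feature_grid)], seed=seed)
    n_trees, n_feat = int(round(best[0])), int(round(best[1]))
    # rebuild the forest with the selected parameters and classify again
    rf = RandomForestClassifier(n_estimators=n_trees, max_features=n_feat,
                                random_state=seed).fit(X_train, y_train)
    return n_trees, n_feat, rf.score(X_test, y_test)


X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
n_trees, n_feat, test_acc = pso_rf(X_tr, y_tr, X_te, y_te,
                                   tree_grid=[10, 30, 50], feature_grid=[1, 2, 4, 8])
print(n_trees, n_feat, round(test_acc, 3))
```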
5. Discussion
5.1. Model Precision Analysis
The purpose of accuracy evaluation is to assess the predictive performance of a model by comparing its classification results with the actual results. Qualitatively, the landslide susceptibility predictions are shown in Figure 9: the spatial distribution patterns of landslide susceptibility predicted by the four models developed in this study are broadly consistent across Ludian County, supporting the applicability and reliability of machine learning models for landslide susceptibility prediction. The receiver operating characteristic (ROC) curve, the area under the curve (AUC), and the accuracy (ACC) are used for quantitative evaluation. The closer the ROC curve lies to the upper left corner, the better the model; the closer it lies to the lower right, the worse; and a curve below the diagonal reference line indicates that the model performs worse than random guessing. The AUC value ranges from 0 to 1, and a higher value indicates a more accurate model. Based on AUC, model accuracy can be categorized as 0.5 to 0.6 (poor), 0.6 to 0.7 (moderate), 0.7 to 0.8 (good), 0.8 to 0.9 (excellent), and 0.9 to 1.0 (near perfect) [53]. As shown in Figure 10, the BP, SVM, and RF models all achieve excellent or better performance for assessing landslide susceptibility in Ludian County, with the random forest model performing best, followed by the SVM model and then the BP model. To further quantify performance, the ACC value was computed from a confusion matrix, which records the relationship between the model’s predicted and actual results. Table 7 displays the results: among the single models, random forest performed best, with an ACC of 0.8531, which was 2.57 and 2.20 percentage points higher than BP and SVM, respectively. Using the fast global search capability of PSO, the number of decision trees (n_estimators) and the number of random forest features (max_features) were optimized to select the best random forest model. The AUC and ACC values of the PSO-RF model were 0.9567 and 0.8874, respectively, exceeding those of the random forest model by 2.74 and 3.43 percentage points for the same set of input features. These results indicate that the PSO-RF model achieves near-perfect prediction performance for landslide susceptibility in complex mountainous regions and is more suitable for this study area than the other three models.
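The three quantitative measures and the AUC grading scale can be computed with scikit-learn. In this sketch the 0.5 probability threshold used to turn predicted probabilities into classes for ACC is an assumption, and the four-sample example is hypothetical; the grading bands follow the scale quoted above, with each lower bound treated as inclusive.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score, roc_curve

# AUC levels from the text, lower bound inclusive
AUC_LEVELS = [(0.9, "near perfect"), (0.8, "excellent"), (0.7, "good"),
              (0.6, "moderate"), (0.5, "poor")]


def auc_grade(auc):
    """Map an AUC value to the qualitative level used in the text."""
    for lower, label in AUC_LEVELS:
        if auc >= lower:
            return label
    return "worse than random"


def evaluate(y_true, y_score, threshold=0.5):
    """ROC points, AUC, ACC, and confusion matrix for one model."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    fpr, tpr, _ = roc_curve(y_true, y_score)   # points along the ROC curve
    auc = roc_auc_score(y_true, y_score)
    return {
        "fpr": fpr, "tpr": tpr, "auc": auc,
        "acc": accuracy_score(y_true, y_pred),       # (TP + TN) / all samples
        "confusion": confusion_matrix(y_true, y_pred),  # rows: actual, cols: predicted
        "level": auc_grade(auc),
    }


# hypothetical labels (1 = landslide) and predicted landslide probabilities
result = evaluate([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(result["auc"], result["acc"], result["level"])
print(result["confusion"])
```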
5.2. Comparison with the Grading Evaluation Factor
Following the reviewed literature [47,54], we first graded the input variables (15 evaluation factors) and calculated the frequency ratio of each class of each factor. The grading and frequency ratio results are shown in Table 8. The frequency ratios of the evaluation factors were then input into the PSO-RF model constructed in this paper and into the other three machine learning models (BP, SVM, and RF) to predict the landslide probability of the 412,585 grid cells in the study area. The landslide susceptibility of the study area was classified from the probability of each grid cell using the natural breaks (Jenks) method in ArcGIS 10.2. During the experiments, the input variables were graded based on the literature [53]. For example, aspect values ranging from 0 to 360° were divided into eight directions (north, northeast, east, southeast, south, southwest, west, and northwest), with values close to 360° and 0° combined as north, and the frequency ratio was calculated for each direction. The model parameters were the same as those used with ungraded input variables. With graded input variables, the predictions of the BP, RF, and PSO-RF models followed the same trend as with ungraded variables. However, when graded variables were input into the SVM model, its predictions for the northeastern, eastern, and southeastern parts of the study area were very poor compared with those obtained from ungraded variables. To further describe prediction accuracy, the AUC and ACC values of the four methods were calculated, as shown in Table 9. Table 9 shows that, after grading the input variables, the AUC and ACC values were lower than those obtained with ungraded input variables. A possible reason is that the study area lies on the north bank of the Niulan River, with huge elevation differences, crisscrossing canyons, active fault zones, strong tectonic movement, and frequent earthquakes, all of which fracture the rock and soil masses in the region. Under these special geological conditions, together with rainfall and earthquakes, landslide occurrence is highly random. Grading the input variables may discard part of the effect of some environmental factors, reducing prediction accuracy. This paper therefore input the evaluation factors directly for landslide susceptibility evaluation.
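The grading step can be sketched for aspect. Aspect is binned into eight 45° sectors centred on the compass directions, so values near 0° and near 360° both fall in north, and each class receives the standard frequency ratio FR_i = (landslide cells in class i / all landslide cells) / (cells in class i / all cells). Each cell's raw value is then replaced by the FR of its class, which is the graded input described above. The cell values are hypothetical, and flat cells (often coded −1 in GIS aspect rasters) are assumed to have been removed beforehand.

```python
import numpy as np

ASPECT_CLASSES = np.array(["N", "NE", "E", "SE", "S", "SW", "W", "NW"])


def aspect_class(aspect_deg):
    """Map aspect in degrees to one of eight 45-degree sectors.

    Sectors are centred on each direction, so [337.5, 360) and [0, 22.5)
    both map to 'N'. Flat cells must be removed before calling this.
    """
    a = np.asarray(aspect_deg, dtype=float) % 360.0
    idx = np.floor((a + 22.5) / 45.0).astype(int) % 8
    return ASPECT_CLASSES[idx]


def frequency_ratio(classes, is_landslide):
    """FR per class: share of landslide cells / share of all cells."""
    classes = np.asarray(classes)
    ls = np.asarray(is_landslide, dtype=bool)
    fr = {}
    for c in np.unique(classes):
        in_c = classes == c
        share_ls = (in_c & ls).sum() / ls.sum()
        share_area = in_c.sum() / classes.size
        fr[str(c)] = share_ls / share_area
    return fr


# hypothetical grid cells: aspect value and landslide flag
aspect = np.array([10, 350, 40, 95, 180, 200, 300, 355])
landslide = np.array([1, 1, 0, 0, 1, 0, 0, 0])
cls = aspect_class(aspect)
fr = frequency_ratio(cls, landslide)
graded = np.array([fr[str(c)] for c in cls])  # each cell's aspect -> its class FR
print(fr)
```

An FR above 1 marks a class where landslides are over-represented relative to its area.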
5.3. Landslide Susceptibility Evaluation Model Analysis
Researchers in China and abroad have produced many successful landslide susceptibility evaluations. However, drawbacks remain, such as the inability to detect landslide activity, the lack of timely landslide data sources, and the need for many experts to take part in statistical analysis. To address the slow updating and poor timeliness of landslide data sources, this paper used SBAS-InSAR technology, Google satellite imagery with a resolution of 0.5 m, and other auxiliary data to identify landslide disasters in complex mountainous regions with frequent earthquakes, deep valleys, and high topographic elevation. The surface deformation rate was inverted from the phase variation of the ascending and descending orbit radar images. As a result, active landslides and potential landslide hazards could be identified more accurately, and incorporating the 0.5 m Google satellite imagery and auxiliary data further improved the accuracy of the InSAR identification results. To reduce the need for many experts to take part in statistical analysis, this paper proposed a PSO-RF model for predicting landslide susceptibility. During modeling, the susceptibility index of each grid cell in the study area was predicted by feeding that cell's evaluation factors into the trained model. This avoided extensive expert-driven statistics and reduced manual errors in the calculations, thereby improving the accuracy of the evaluation model. To address the problem that landslide activity cannot be detected, this paper integrated the SBAS-InSAR technique to obtain the surface deformation rate from both ascending and descending satellite orbits.
This method was used to identify existing and potential landslides in the study area, improving the timeliness of the landslide data source. Because the deformation evaluation factors carried relatively high weight, some stable landslide points without deformation were kept from being classified as extremely high or high susceptibility areas.
Compared with traditional landslide hazard survey techniques, the proposed method can quickly update landslide data sources, detect landslide activity, and avoid extensive expert-driven statistical calculation, so landslide susceptibility evaluation in complex mountainous areas can be carried out quickly. However, the selected evaluation factors have some shortcomings. For example, because detailed formation lithology data were lacking, lithology was simply divided into four categories: hard rocks, loose soil, soft rocks, and harder rocks. Different lithologies have different shear strengths and therefore different landslide probabilities. In future work, we will obtain more detailed evaluation factor data and explore the general applicability of the model.
6. Conclusions
Existing landslide hazard susceptibility evaluation models suffer from problems such as the poor timeliness and inaccuracy of landslide hazard data and the need for experts to take part in extensive statistics for weighting and classifying evaluation factors. To address these problems, this paper proposed a combined SBAS-InSAR and PSO-RF approach for evaluating the susceptibility of landslide disasters in complex mountainous regions. In the experiment, 61 ascending-orbit and 58 descending-orbit Sentinel-1A radar datasets were used to invert the time-series deformation of Ludian County from January 2020 to December 2021. Potential landslide hazards in the study area were then identified with the support of auxiliary data such as high-resolution optical remote sensing imagery, DEM, and slope, and the landslide cataloging data were updated after the identification results were verified in the field. Finally, a PSO-RF model was constructed to evaluate the landslide susceptibility of the study area. The following conclusions can be drawn:
- (1) Compared with traditional landslide disaster survey techniques (such as field investigation and GNSS monitoring), SBAS-InSAR can quickly determine the surface deformation of the study area. It identified 97 and 122 potential landslide hazards in the ascending and descending deformation rate fields, respectively, updating the existing landslide cataloging data to 329 landslide hazards.
- (2) Analysis and verification showed that the ascending and descending orbit deformation rates obtained by SBAS-InSAR can serve as significant factors in landslide susceptibility classification.
- (3) Using real landslide and non-landslide data, the PSO-RF algorithm was compared with three other machine learning algorithms: BP (back-propagation neural network), SVM (support vector machine), and RF (random forest). The PSO-RF model proposed in this paper performed best, with an area under the curve (AUC) of 0.9567 and an accuracy (ACC) of 0.8874, higher than those of BP (0.8823 and 0.8274), SVM (0.8910 and 0.8311), and RF (0.9293 and 0.8531).
- (4) On the one hand, the proposed method effectively identified deforming and potential landslide hazards in the study area, quickly updated the landslide data source, and addressed the poor timeliness and uncertainty of existing landslide hazard data sources. On the other hand, the PSO-RF model avoids the weight calculation and statistical classification required by traditional landslide susceptibility evaluation models, removing much of the manual expert decision-making from prediction. The method can serve as a useful reference for future disaster prevention and mitigation decisions by government departments.
Landslide disasters have complex development mechanisms and often occur suddenly. Although this study considers the impact of seismic intensity on landslide susceptibility in complex mountainous regions, many issues still require further research. For instance, key aspects of landslide risk management should be investigated, and the formation processes of rainfall-induced and earthquake-induced landslides should be analyzed separately to explore a more optimized intelligent landslide model. Future research will build on these considerations.