1. Introduction
Among the various natural hazards, landslides are recognized as one of the most destructive and hazardous threats in several parts of the mountainous world. It has been noticed that about 5% of all fatalities in earthquake events are caused by coseismic landslides, in some cases even more [
1]. For example, the recent Hokkaido Eastern Iburi earthquake on 6 September 2018, about 80% of the fatalities are caused by the landslides alone [
2]. Apart from the fatalities, they also cause huge economic losses by damaging properties such as buildings, bridges and roads; this trend is observed more than any other natural disasters, such as earthquakes, typhoons, heat waves, sinkhole collapses, floods and forest fires [
3,
4,
5,
6]. The increased amount of urbanization and economic development together with the unusual frequency of severe regional precipitations owing to global climate change, the landslide hazard losses are expected to rise in the future [
7,
8,
9]. To mitigate and reduce the economic losses and risks associated with the landslide hazards, there is an urgent requirement to identify and map the landslide-prone areas.
Landslide susceptibility mapping (LSM) is regarded as a prime step for in the implementation of immediate disaster management planning and risk mitigation measures [
4,
6,
10,
11,
12]. Most LSM models issued hitherto have been targeted at single-type-induced landslides [
13,
14]. Nevertheless, in areas such as the Chuetsu area, Japan, where landslides can be mainly activated by both earthquakes and heavy rainfall, some snow-melt, it is essential to couple frequently both types into the susceptibility modeling primarily because of the following reasons: (i) earthquake-induced, as well as rainfall-triggered landslides, are solely governed by interrelated environmental factors and partial understanding of landslide occurrence without considering their differences will produce misleading results [
15]; (ii) it can be seen that after a strong seismic activity, rainfall-triggered landslides are prone to increase in both scale and amount, an area with steeper slopes become more susceptible [
16]. Thus, an earthquake- triggered model is probably to have the ability to enhance a rainfall-induced landslide.
Large physically based landslide susceptibility processes rely on digital elevation model (DEM) to characterize the terrain parameters which fundamentally describe the local elevation, slope, hydrologic and various other geomorphic processes. Although the wide range of available DEMs in today’s world produces a rapid analysis of terrain attributes, several studies have shown the effects of grid size in the final portrayal of the land surface models [
17,
18,
19]. Therefore, the selection of an appropriate grid size is significant in any susceptibility mapping. By comparing varying resolutions of DEM (30 m vs. 6 m; 10 m vs. 2 m DEM), Dietrich and Montgomery [
20] concluded that, with a finer elevation model, the patterns of classifications are much more strongly defined by the ridge and valley characteristics. In another study, Claessens et al., [
21] studied the distribution of slope and other terrain factors for shallow landslide mapping using four different elevation model (10 m, 25 m, 50 m and 100 m) and concluded that uncertainty in the results increases with the coarser DEM. The accuracy of freely accessible DEM also sometimes poses a question [
17]. Recently, with the technological advancement in light detection and ranging (LiDAR) methods, usage of high-resolution digital elevation model (DEM) in landslide assessment accuracies has become progressively improved over time [
22,
23]. Jaboyedoff et al., [
24] and others [
25] attributed the significance of LiDAR DEM in landslide mapping studies and advocated that application of LiDAR data for landslide researches would noticeably boost in the coming years, given extensive data availability. For example, Dou et al., [
23] used 2 m LiDAR DEM to discriminate the different landslide types and indicate that LiDAR DEM data area promising in landslide delineation. The near-precise information available from LiDAR data, when incorporated with cutting-edge data mining techniques, is able to produce highly accurate LSM [
22,
25]. Regarding the prompt state of development in LiDAR technology, several potential features present in the data is still not explored to the full potential such as the capability to quantify topographic features at catchment level as well as the connection of these with the hydrological factors including wetness index. Moreover, only very limited researches have scrutinized the identical study area by applying multiple statistical techniques to assess the reliability of models based on rainfall- and earthquake-triggered landslides.
In recent studies, various approaches of the LSM have been developed and explained in numerous papers [
13,
26,
27]. These approaches can mainly be categorized into three groups, that is, heuristic [
28,
29], deterministic [
21,
30] and statistical [
31,
32] techniques. The heuristic techniques are built on the expert’s knowledge to group landslide-prone areas into several ranks from high to low classes. This method is often used for susceptibility mapping in large areas [
21]. Whereas, deterministic techniques rely on numerical modeling of the physical mechanism that controls slope failure [
29]. However, they are not appropriate for a large-scale mapping because of their troublesome and unpractical need of a huge array of data, that is, rock mechanical properties, the wetness and soil saturation and soil depth. Statistical and probabilistic techniques including bivariate, multivariate statistical methods, certainty factor, as well as knowledge-based techniques such as artificial neural networks and fuzzy logic approaches [
33,
34] are known as promising methods for predicting the landslides [
13].
Our study is built upon this prior experience in different models to investigate the comprehensive performance of the susceptibility models using LiDAR DEM data. We address two research questions in this paper: (i) do the sophisticated data mining methods provide a better predictive competency compared with the traditional statistical methods? And (ii) how different the results while using multi-type landslides instead of single type landslides? For achieving the first objective, we analyze and compare the accuracy of LSM maps generated by five different techniques including three traditional statistical methods, that is, probabilistic likelihood-frequency ratio model (PLFR), information value (InV), certainty factor approach (CF); and the two machine learning techniques namely, artificial neural network (ANN) and support vector machine (SVM) in a regional-scale analysis. For achieving the second objective, we used the inventory of both earthquake-and rainfall-induced landslides in the analysis.
2. Overview of the Study Area
Landslides are frequently reported after earthquakes and rainfall events in the Chuetsu area, Niigata Prefecture, Japan [
35]. This area has a steep mountainous topography and conducive geology that makes it inclined to severe landsliding [
23]. Extensive landslides in this area are reported after two major seismic events; Chuetsu earthquake in 2004 and Niigata Chuetsu-Oki earthquake in 2007 [
35,
36]. The heavy rainfall in summers, typhoons and snow melting brought occasional debris movement as well [
37]. The present work is carried out in an area within the Higashiyama hill region in the Niigata-Chuetsu region Japan (
Figure 1) which covers approximately 290 km
2 area. The elevation ranges between 22 m and 734 m with an average elevation of 206 m above the sea level. The area receives an annual rainfall equaling 2000 mm, chiefly delivered by typhoons, as well as those during the summer and winter snow period from Japan Meteorological Agency.
Metamorphic and sedimentary rocks belong to the Paleocene to the Quaternary period, as well as folded mountain belts distributed over NNE-SSE axes represent the geologic characteristic of the studied portion [
38]. The epicenter of the 6.8 M Chuetsu earthquake of 2004 with the hypocenter at the depth of 13 km was located only a few kilometers away from the study region. This event also resulted in serious aftershocks in southern Higashiyama Mountain. Consequently, thousands of mass movement events occurred in the region (
Figure 2). Numerous roads, houses, bridges and other infrastructures were severely damaged. The damages due to the event were largely concentrated on the Imo river basin the extent to which makes it necessary to assess similar hazards to mitigate the damages in the future.
4. Results
4.1. Modeling Result with the Probabilistic Likelihood-Frequency Ratio
The relationship between the spatial location of the landslides and landslides causative factors are processed as shown in
Table 6. According to
Table 6, the PLFR values of elevation classes are greater than 1 at the ranges of 131–190 m, 190–46 m, 301–357 m, 357–413 m with the highest value (1.74). The results show that PLFR values increase with increasing altitudes till it reaches 357 m elevation in the study area. Its values drop further and become less than 1 after 413 m. This means that the possibility of landslide occurrence increases till it reaches a certain height and then decreases when the altitude is higher than 413 m.
With slope angle factor, PLFR values are greater than 1 from 17° to 55°. The landslide occurrence in the slope classes 17°–27°, 27°–39° and 39°–55° are 22.17%, 39.42% and 23.85%, respectively. Following the general trend, it can be seen that the occurrence of landslides progressively increases with an increasing slope. The percentage of landslide occurrences drops sharply after reaching the 55° slope angle. According to the results, it is clear that almost the landslide occurrence increases from one slope gradient up to a certain extent and then it decreases. The shear stress of the soil usually increases following with increasing slope angle.
For the aspect class, the significant number of landslides happened among east, southeast, south, southwest and west-facing directions, their frequency ratio value is all greater than 1. It indicates that the direction from east to west is highly susceptible to the landslide occurrence. A plausible explanation for this condition is that from east to west facing directions are general concerning fully rock weathering. Therefore, around these directions are susceptible to occur landslides.
For the plan curvature, the PLFR value of convex (1.15) has a large value than 1, while concave value (0.89) is less than 1. The results show that many landslides occurred at the convex areas.
The causative factor of drainage density (DD) has a larger value than 1 with the range of 2–4, 4–6 and 6–9. The maximum PLFR value is 1.12. Otherwise, the density of DR is greater than 9, the PLFR value becomes less than 1 and it has a lower probability of landslide occurrence.
In the case of the lithology, it is clear to see that the frequency ratios of Av, Um, KI, Uv, Am, Ku, S, As, W, Tv, th1, Ks are all more than 1. According to the investigation of lithological conditions, the landslides occurred mostly in the sandstone, massive mudstone, sandstone, gravel, sand and silt area corresponded with positive PLFR values.
In the case of the density of the geological boundary, from the 2–27, the PLFR values are higher than 1. The maximum PLFR value is 1.63 and the followed value is 1.49. The largest and second largest probabilities of landslide occurrence are 25.26%, 21.73% respectively. The density of geological boundary is less than 2 and the PLFR value is less than 1 which indicates that a lower percentage of landslide occurs in the density of geological boundaries.
4.2. Modeling Result with the Information Value (InV)
The result of the information value method and the relationship between the spatial location of the landslides and landslides causative factors are processed as shown in
Table 6. According to
Table 6, the InV values of elevation classes are susceptible at the ranges of 131–190 m, 190–246 m, 301–357 m, 357–413 m with the highest value (2.98), followed by 2.94 and 2.88. Results show that InV values decrease with the decreasing and increasing altitude in the study area. These results are similar to and PLFR’s results.
Regarding slope angle factor, the highest InV value is 0.57 from 39°–55°, followed by 0.52 from 27°–39°. The landslide occurrence in the slope classes, 27°–39° and 39°–55° are 27.97%, 36.7%, respectively. Again, what is observed similar to the PLFR, results of InV also shows that the occurrence of landslides gradually increases with an increasing slope angle until it drops after 55° slope angle due to the relatively lower percentage of the total study area. According to the results, it is clear that almost the occurrence of landslides increases from one slope gradient up to a certain extent and then it decreases because of increasing the shear stress of the soil with increasing slope angle. Therefore, the gentle slope has a relatively lower frequency of landslide occurrence because of the lower shear stress corresponded with a lower gradient. The steep slope angle normally causes the collapse to occur.
For slope aspect, landslides were prone to occur in the East, SE, South, SW and West facing slopes. The highest Wi value is 2.76 in the south direction, the following 2.66 in the south-east and south-west direction. To the density of geological boundary, as its values increases, the Wi values also increases that means more landslides occurred. The maximum Wi value 3.31 is obtained for the class with densest geological boundary (15–27) followed by a value of 2.82 in the lower geological density class (10–15). The greater density of geological boundaries indicates a plane of weakness or zone of discontinuity that leads to instability of rock bodies.
For the plan curvature, the value of convex (2.53) is a little larger than the value of concave (2.27). In concave and convex, the percentage of landslide occurrences is very close (51%, 49%, respectively). The drainage density shows that Wi value for 2–4, 4–6 and 6–9 are greater than 2. The Wi maximum value is 2.63 observed with 4–6 DD class. The highest percentage of landslide occurrences, 32.51%, also relates to the same DD class followed by 27.04% of the 2–4 DD class. The results of the lithology show that landslide is susceptible to in these types, namely massive mudstone, sandstone and alternation of sandstone and mudstone.
4.3. Modeling Result with the Certainty Factors
In
Table 6, positive values of CF close to one are observed for the elevation classes 131–190 m, 190–246 m, 246–301 m and 301–357 m equaling 19.96%, 23.37%, 19.35% and 15.23%, respectively. It can also be seen from
Table 6 that for an altitude of 190 m to 357 m, the CF values are greater than 0.5. This suggest that landslides are frequent in the mid-altitude and therefore their CF values; We observed that the ratio total pixels documented in mid-altitude are greater than that in the higher altitudes; whereas lower elevated areas are having the gentle slope and thus are not prone to landslides.
For slope class, CF values are found close to one from 17° to 55° class. The spatial distribution of landslides in the slope classes 17°–27°, 27°–39° and 39°–55° are 22.17%, 39.42% and 23.85%, respectively. Similar to PLFR and InV, the results are consistent for CF also which shows a gradual increase of landslide percentage till it reaches the slope angle 55°.
For aspect class, most of the landslide occurrences are found in the East, SE, South, SW and West facing slopes with a CF value between 0.13 and 0.57. The maximum percentage of landslides is found along the southern slope followed by SW slopes, equaling 15.76% and, 15.73%.
Curvature, the second derivate of the slope, provides valuable information on landslide occurrences. In this study, the CF corresponding to concave curvature gives a negative (−0.18) value, whereas the convex curvature corresponds to a positive (0.21) value. In most cases, the convexity indicates a low CF value than the concavity because of more water retention capacity in concave slopes which increases the soil moisture content that ultimately reduces the soil stability. Contrary to this, we found that concavity is not responsible for the landslide occurrences in this study region because most of the landslides are seismically induced. Mountaintops tend to collapse in coseismic cases because of topographic amplification differences.
A positive CF is recorded for the drainage density (Dd) classes 2–4, 4–6 and 6–9. The maximum CF is recorded for the study area is 0.2 which is observed for the Dd class 4–6, corresponding to 29.89% of landslide occurrences. As mentioned before, the largest chunk of landslides in Chuetsu-Niigata inventory is collected for the 2004 earthquake, the observed CF values are comparatively very small, which confirms the results.
CF values are found strong positive for the following lithology classes: Sandstone and alternation of sandstone and mudstone from Late Miocene–Early Pliocene (As); mudstone interbedded with sandstone (Ku), Andesitic pyroclastic rock (Uv), sandstone (Ks), massive mudstone (Um), sandstone interbedded with mudstone (KI) from Early Pliocene-Late Pliocene; and Massive mudstone (Am) from Late Miocene–Early Pliocene equaling 1.0, 0.94, 0.89, 0.82, 0.78, 0.75, 0.70, 0.52, respectively. The highest percentage of landslides among the lithology classes, 21.4%, occurred in Ku, followed by Am (19%) and Um (13.34%). The bedrock in the area of major landsliding consists of a folded sequence of sandstone, mudstone and their interbedding and the results point to the occurrence of landslides in the weakly cemented lithological groups.
We found that the CF values are always positive for the density of the geological boundary. The maximum CF is observed for the class 15–27 equaling 0.5; followed by the class 10–15 equaling 0.42. The percentage distribution of landslide occurrences in the above-mentioned classes is 25.26% and 21.73%, respectively. The negative CF value of the geological-density class lower than 2 indicates that geological uniformity affects the stability of the area. The higher density of geological boundaries suggests frequent process activity which leads to instability.
4.4. Modeling Result with the ANN
From
Figure 8, it can be observed that when the number of hidden neurons is 12, R
2 has the highest values (0.92). The structures of the ANN (input–hidden–output) were set as 7–12–1. The weights between each layer were acquired by training the ANN to calculate the contribution of each landslide causative factor.
To examine the robustness of the ANN model, it was repeated 10 times, each with a random set of landslide data nominated from the whole data pool. There were no much variances in the results. The standard deviation was 0.0029. Thus, the random sampling sets did not have an obvious effect on the results. In this study, the average values were calculated to interpret the results. When the ANN achieved the minimum RMSE values (0.001), the whole pixels of the study area was fed into the ANN network to evaluate the LSM map. The final weights for the smallest error were documented in the procedure and weights of each factor were fixed for the entire study area. The sets of landslide susceptibility index values attained in all pixel were then converted into raster in GIS setting.
4.5. Modeling result with the SVM
The RBF applied in this model used the subsequent factors: γ = 0.5 and C = 10, convergence epsilon = 0.003 and maximal iterations = 5000. The results of the scenarios were next analyzed to decide the optimal kernel resulting in the best predictive capability. In this study, the RBF-SVM integrated with the InV method was selected for the improvement of the LSM map. The likelihood of landslide occurrence drops in the range from 0 to 1 that was transferred into the ArcGIS 10.3 package for visualization. Finally, the landslide susceptibility index values were reclassified into five classes: very low, low, moderate, high and very high using the natural break method, for easier visual clarification of the LSM.
Figure 9. shows the spatial likelihood of landslide occurrences with the five classes, from very low (dark green) where landslides are not anticipated to very high (red) where landslides are possible using five methods.
4.6. Model Validation
For the validation process, the total landslides of the entire study area were randomly divided into two groups: training data (5921) and validation data (2538). A ROC plot of sensitivity (true positive rate) and 1-specificity (false positive rate) was made for validation. The prediction rate curves of AUC values of five models (PLFR, InV, CF, ANN and SVM) for validation are 0.77, 0.79, 0.81, 0.85 and 0.87, respectively in
Figure 10. According to AUC results, it can be seen that SVM, ANN and CF models (AUC > 0.8) are considered good for application in landslide susceptibility mapping. Additionally, PLFR and InV models (0.7 < AUC < 0.8) are regarded as acceptable. Among these five models, the highest prediction accuracy is the SVM model. Therefore, it can draw a conclusion that the performance of the presented models in this study can satisfy the requirements of landslide prediction.
Figure 11 shows that 91.87% of the total landslides occurred in 54.48% of the area classified as high (high and very high) for the PLFR. As for the CF model, 94.52% of the total landslides in 63.67% of the area as high (high and very high). For the InV method, 93.28% accounted for the entire landslides happened in the 55.76% of the area classified from high to very high susceptibility levels. In the case of ANN model, 95.51% of total landslides occurred at 60.26%, while, SVM 96.84% of total landslide occurred in 58.67% of the study area. Among the five models, over 90% landslides occurred in the high area. The above results display that the prediction landslide occurrence of three models agrees with the real condition. From their comparisons, we can see that the most landslides occurred at high and very high susceptible areas of SVM model, while more areas were predicted the susceptible-areas for the CF models, followed by ANN, SVM and PLRM models.
5. Discussion
In recent years, landslide susceptibility studies proposed using various models have been targeted mostly on a single event-type approach. However, in areas such as Japan, massive landslides can be triggered by earthquakes as well as cyclones. In such areas, a single-event approach may not produce high ROC values, but they do not necessarily provide a complete explanation of landslide occurrence due to complex natures of sliding mechanisms. Instead, such models overlook the effect of the type of landslide analyzed. Previously, Chang et al. [
42] and Li et al. [
69] studied multi-event landslide types for areas in Taiwan and China. However, studies pertaining to the Japanese archipelago is limited to multi-event landslide analysis with the exception of Iwahashi et al. [
70] and others [
71]. Nevertheless, their studies were focused on landslide characterization rather than a susceptibility modeling approach. This study used landslides triggered by the 2004 Niigata-Chuetsu earthquake and rainfall events in the subsequent period to develop a comprehensive susceptibility model for the region. The models were validated by a mix of both earthquake and rainfall-induced landslides (30%). Despite having given a mixed input, the results returned high ROC values (0.87 and 0.85) indicating the acceptance of SVM and ANN models. More importantly, this study highlights the consideration of slope factor (CF = 1 for class 39°–55°) as an important variable in the Japanese mountainous terrains while modeling the landslide susceptibility. This also confirms the trend noted by Oguchi [
72].
Methods for generating LSM maps are numerous based on the GIS platform and many published works discuss to solve the shortages and problems in landslide assessment. The result in this work continues to confirm that the quality of landslide susceptibility assessment is dependent on the method used; where two machine learning models, SVM (AUC = 0.87) and ANN (AUC = 0.85) have clearly better prediction performance than those of the other statistical models. This is because both SVM and ANN have high capacities to deal with non-linear and complex problems as landslides as confirmed in various previous studies [
60,
73]. In contrast to the SVM model and the ANN model, three statistical models (PLFR, InV and CF) have difficulties to model the complex landslides of the Chuetsu region. This is because these are bivariate models that did not take in the account the complex interactions of the seven causative factors. However, each method has its own unique characteristics and advantages in terms of their ability to include input variables and information provided in temporary stages or the final output that is used to analysis. Such as PLFR-bivariate analyses can help provide a clear understanding of the details of classes in each of the causative factors, which were well reflected in the output maps. The main advantages of CF approach is that CF supplies the advantage of rendering the definition of susceptible classes transparent for the researchers, who are not required to offer a priori the definition of hazard classes but is only charged to interpret a-posteriori the final certainty values and to subdivide the resultant interval into meaningful sub-intervals [
33]. For the PLFR model, this is the simplest amongst three models and relatively easy to perform in the ArcGIS platform. The main merits of InV procedures are that the professional, who carries out the analysis, determine the factors combinations of factors used in the evaluation and enable the introduction of expert knowledge into the process [
56]. Among the presented three traditional methods, they could provide the details analysis in each class and the results of the relationship between landslide occurrence and causative factors are very similar. The two soft-computing data mining techniques, that is, ANN and SVM, had diverged results. The strength of SVM is ascribed to the accompanying of using a radial basis function (RBF) with the InV values output to create the output map. Additionally, the SVM model is noticeable for processing both the linear and non-linear classification set. The SVM model targets to draw decision boundaries between data support vector points from different class types and separate them with maximal margins [
10,
23]. To learn the complex functions, SVM uses kernels, while ANN uses multilayer perceptron. The ANN model produced a smooth output with limited details to delineate the hazard zones. The AUC curve plots display that LSM maps produced applying the SVM model is the highest prediction accuracy, followed by the ANN model. CF has the best performance among three traditional statistical models, yet the PLFR model has the lowest accuracy. However, these outcomes indicate that all the models in this study proved reasonably good accuracy in prediction LSM of the Chuetsu area.
Apart from the models what we have compared in this study, there are other more sophisticated approaches are mentioned in the recent literature [
74]. Indeed, the comparison of different machine learning models and their performance issues has become a trend in the LSM studies [
11,
74] However, assessment of model performance not only depends on the methods, but also it is mainly decided by the quality of the collected data [
56]. For example, a random forest (RF) model applied using historical 222 landslides in Jiangxi, China with the help of 25 m DEM returned only a ROC value of 0.75 [
75]. On a similar note, Dou et al. [
34] studied the Osado Island adjoining Niigata using ANN and a 10 m DEM returned a ROC value lower than that achieved for this study. In another example, a case study of the Cameron Highlands, Malaysia using 11 landslide factors and 81 landslide locations by SVM approach achieved maximum accuracy of 0.81 only [
76] Thus, the quality of the landslide inventory and high-resolution conditioning factors are important to all statistical models. The current research utilized fine aerial photographs and LiDAR-derived data for conditioning the input variables. This study, regardless of applying multi-event type landslides and traditional machine learning techniques, therefore, achieved a better performance using high-quality dataset.