1. Introduction
A snow avalanche, hereafter referred to as an avalanche, is a natural phenomenon resulting from the interaction of the cryosphere, hydrosphere, atmosphere, and living ecosystems, and it can seriously affect humankind. Avalanche susceptibility (AS) mapping is a scientific process that involves predicting the likelihood of avalanche occurrences in an area based on a variety of environmental and topographical factors [
1]. The field of AS mapping is continuously evolving with advancements in data collection, analysis techniques, and computational methodologies [
2]. These advancements provide superior tools for predicting avalanches, helping to save lives and protect settlements and infrastructure by enabling effective risk management and mitigation strategies in mountainous regions.
Researchers have been using both qualitative and quantitative methods for AS mapping [
3]. Qualitative (knowledge-driven) approaches, such as the analytical hierarchy process (AHP) and fuzzy logic (e.g., see [
4,
5,
6]), are based on expert opinion, which can be subjective. On the other hand, quantitative (data-driven) methods rely on numerical data, statistical analysis, and mathematical models with minimal subjective intervention. Machine learning (ML) approaches are particularly valuable for data-driven avalanche studies because of the complexity of avalanche characteristics.
Advancements in geographic information system (GIS) and earth observation (EO) technologies have led to the widespread use of ML-based approaches in natural hazard assessments. Although relatively simple statistical analysis methods (e.g., the frequency ratio (FR), see [
4,
7]) can be used for AS assessments, novel ML approaches have introduced new possibilities to process large datasets and identify complex patterns and relationships among various factors that influence avalanches. In their study, Choubin et al. [
8] proposed an AS mapping framework with multivariate discriminant analysis (MDA) and support vector machine (SVM) for the Sirvan watershed. The AS investigations of the Darvan and Zarrinehroud watersheds by Rahmati et al. [
9] indicated that the random forest (RF) model was superior to SVM, generalized additive model (GAM), and naïve Bayes (NB) models. Wen et al. [
10] presented a case study comparing the prediction capability of the SVM, K-nearest neighbors (KNN), classification and regression tree (CART), and multilayer perceptron (MLP) methods.
Tree-based ML algorithms, such as RF, categorical boosting (CatBoost), light gradient boosting (LightGBM), natural gradient boosting (NGBoost), and extreme gradient boosting (XGBoost), have yielded satisfactory prediction results in the literature ([
2,
11,
12]). Liu et al. [
2] utilized four tree-based ML methods (CatBoost, LightGBM, RF, and XGBoost) to predict AS using multi-source spatial data, including remote sensing imagery, meteorological data, and topography, in the Tianshan Mountains, China. In their study, CatBoost outperformed the other methods, with an accuracy of 0.93. The performance measures indicated that the spatial probability of avalanches can be computed rapidly and accurately using ML methods. However, selecting appropriate conditioning factors and ML methods remains a challenging task.
The process of feature selection has proven highly effective in investigating the correlations and potential causal relationships between parameters that influence natural hazards [
13,
14]. However, incorporating a large number of features can lead to overfitting and limit the ability of a model to generalize due to high dimensionality [
15]. Costache et al. [
13] utilized the correlation-based feature selection (CFS) method to evaluate the predictive capacity of fourteen predictors in the context of flood susceptibility assessment. Pham et al. [
14] employed two established feature selection techniques, chi-square (CS) and backward elimination (BE), to determine the optimal set of input features for training artificial neural networks (ANNs) in landslide susceptibility modeling. Although several conditioning factors may influence AS at different levels, some may be redundant or less relevant in a specific environment. This can lead to inefficient computation, resource overconsumption, and reduced accuracy of ML algorithms.
Table 1 summarizes the ML methods and the features used for AS mapping in the recent literature.
As shown in
Table 1, the features used for AS mapping can be diverse, and the optimal set depends on the regional characteristics and data availability. Even when data availability is not an issue, a feature selection strategy needs to be implemented to identify the optimal set of descriptive features for achieving high model performance. Different feature selection methods proposed in the literature can be utilized to address this issue. These methods are broadly categorized into filter, wrapper, and embedded approaches based on their evaluation criteria [
16]. Filter methods are popular for feature selection due to their relatively low computational cost compared to wrapper and embedded methods [
17]. Although filter-based methods are computationally efficient, wrapper and embedded methods are indispensable because they consider the relationship between features and the learning algorithm [
18]. Embedded methods integrate feature selection within the model training process, providing effective selection with a reduced risk of overfitting through model-specific optimization [
19]. In contrast, wrapper methods offer high predictive accuracy by evaluating feature subsets through iterative model training and testing, thereby optimizing the selected feature subset [
20].
Yariyan et al. [
3] applied multicollinearity analysis and Pearson correlation to assess the suitability of eleven snow avalanche conditioning factors in their study. Their findings demonstrated that all hybrid models performed strongly in evaluating the susceptibility of locations to avalanches. Tiwari et al. [
15] emphasized the importance of identifying the most pertinent parameters before training. To demonstrate this, they applied parameter importance assessment (PIA) to predict AS along the Leh-Manali highway, one of the most affected regions in India. Eleven avalanche conditioning parameters were considered, and their relevance was evaluated using the Boruta algorithm.
A number of studies have been conducted on AS mapping in the Alpine region. Ghinoi and Chung [
21] employed fuzzy logic with slope, aspect, elevation, distance from crest lines, and concavities/convexities as conditioning factors in the Italian Dolomites and found that the accuracies of the AS maps were between 67% and 82%. In a preliminary effort, Cetinkaya and Kocaman [
12] investigated the logistic regression (LR) and RF methods in Davos, Switzerland, using a total of 12 conditioning factors. They reported that LR provided lower accuracy (0.74) compared to RF (0.96). The inventory employed in the study was manually generated by Hafner and Bühler [
22,
23] based on satellite images [
24] for two avalanche periods in 2018 and 2019. Hafner et al. [
25] stated that small and medium-sized avalanches can be missed when inventories are compiled from satellite images. In addition, shadows hinder avalanche detection. In a more recent study, Cetinkaya and Kocaman [
26] assessed the effect of different sampling strategies for AS mapping with RF in the Gross Spannort Mountain region, Switzerland, using a total of 18 conditioning factors. They suggested that implementing appropriate feature elimination and backward selection strategies could improve AS maps, though further research is needed. Iban and Bilgilioglu [
11] studied AS mapping using decision tree (DT)-based methods in the province of Sondrio, Italy, using a total of 17 conditioning factors and a point-based inventory with 1880 samples representing avalanche starting points. The geographic distribution of avalanche starting points may not be uniform across the study area. Certain regions might have a higher density of samples due to accessibility, historical avalanche occurrences, or reporting biases. This could lead to an overrepresentation of specific areas and potentially skew the model’s performance in predicting AS in underrepresented regions. Regarding conditioning factor elimination, they applied a multicollinearity test but found no collinear factors to eliminate. They observed that XGBoost performed best with an AUC of 0.978, although the difference between the best and the worst model outputs was merely 1%, indicating similar performances among the evaluated methods.
The present study addresses the challenges of feature selection for AS mapping by combining a comprehensive set of factors with the sequential backward selection (SBS) strategy and the CatBoost algorithm to identify the minimum set of predictive variables for the target region. CatBoost was preferred here because it is a decision-tree-based method known for its success in many natural hazard assessment studies. It also has the advantage of handling categorical features without requiring extensive pre-processing and is often faster than RF, especially on large datasets. To the best of our knowledge, this represents a novel application of SBS in AS mapping. Reducing feature dimensionality would decrease the need for large inventories in ML modeling for AS mapping. This study integrates SBS with CatBoost by using the feature selection capabilities of SBS to identify the most relevant features based on the accuracy results obtained from CatBoost at each step. The approach was demonstrated in three catchment areas in Alpine terrain (Engelberger Aa, Meienreuss, and Göschenerreuss) using a subset of the inventory compiled by [
22]. The main contributions of this study include the following:
- (a)
An iterative feature selection and validation method, SBS, integrated with CatBoost was proposed for improving AS prediction accuracy.
- (b)
The Shapley additive explanations (SHAP) method [
27,
28] was employed to interpret the model results and analyze the influence of individual input features on model learning.
2. Materials and Methods
In
Figure 1, a schematic overview of the proposed framework is presented. The framework utilizes the CatBoost algorithm, an advanced ensemble method known for its effective handling of categorical data. The feature selection was performed using SBS, aiming to optimize the model by iteratively removing the least significant variables based on variations in model accuracy. The GridSearchCV method [
29] was employed to fine-tune model parameters, ensuring the best possible settings for achieving the highest performance. Precision, recall, specificity, negative predictive value (NPV), accuracy, and F1-score measures were used to validate the results at each iteration, enhancing model reliability and determining significant features. The SHAP values were used to interpret feature importance [
27,
28] as they have been successful in several other natural hazard susceptibility assessment studies (e.g., [
11,
30,
31]).
2.1. Study Area and the Inventory
As shown in
Figure 2, three catchment areas were selected in the Alpine terrain (Engelberger Aa, Meienreuss, and Göschenerreuss). All three catchments are part of the Reuss River basin, located in the Swiss Alps. Each catchment represents unique geomorphological and hydrological characteristics that contribute to the AS within the region. The Engelberger Aa catchment is the largest one (226.5 km²) among the three, with a complex terrain that includes high-altitude peaks and steep slopes. The Meienreuss (71.4 km²) and Göschenerreuss (92.8 km²) catchments are relatively small but still characterized by steep and rugged Alpine landscapes.
The dataset of avalanche inventory polygons was produced by Hafner and Bühler [
22]. The investigation focused on extreme avalanche occurrences within a large avalanche period in January 2018, when Switzerland experienced a series of significant snowfall events that resulted in the first use of the highest avalanche danger level (level 5) since 1999. This inventory, compiled using 1.5 m resolution multispectral SPOT6 satellite imagery, provides a valuable foundation for modeling the AS, enabling comprehensive analyses that capture the spatial patterns of avalanche-prone and non-avalanche pixels across the selected catchments. Although the avalanches were provided as polygons (see purple polygons in
Figure 2), the inventory was rasterized with a 10 m grid interval. Thus, large numbers of avalanche and non-avalanche pixels were obtained in the Engelberger Aa (126,915 avalanche and 2,138,089 non-avalanche), Meienreuss (47,714 avalanche and 667,143 non-avalanche), and Göschenerreuss (44,175 avalanche and 883,484 non-avalanche) catchments. To balance the avalanche and non-avalanche samples, the number of non-avalanche pixels was reduced by random sampling to match the number of avalanche pixels (a 1:1 ratio between the classes), avoiding class imbalance problems. The dataset was then randomly split into training (70%) and testing (30%) subsets for AS prediction with the model.
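To make this sampling design concrete, a minimal sketch is given below; it assumes the rasterized pixels and their labels are already available in a pandas DataFrame, and the feature columns shown are placeholders rather than the full factor set.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Placeholder pixel table: two conditioning factors and a binary label
# (1 = avalanche, 0 = non-avalanche); real inputs come from the rasterized inventory.
pixels = pd.DataFrame({
    "altitude": rng.uniform(900, 3000, 1000),
    "slope": rng.uniform(0, 60, 1000),
    "avalanche": rng.choice([0, 1], size=1000, p=[0.9, 0.1]),
})

avalanche = pixels[pixels["avalanche"] == 1]
non_avalanche = pixels[pixels["avalanche"] == 0]

# Randomly undersample the majority (non-avalanche) class to a 1:1 ratio.
balanced = pd.concat([avalanche, non_avalanche.sample(n=len(avalanche), random_state=42)])

X = balanced.drop(columns=["avalanche"])
y = balanced["avalanche"]

# 70% training / 30% testing split, stratified to preserve the 1:1 ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
print(y_train.value_counts(), y_test.value_counts(), sep="\n")
```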
2.2. Conditioning Factors
As shown in the methodological workflow given in
Figure 1, seventeen key factors were extracted to train the susceptibility model, based on their relevance to avalanche dynamics as determined from the literature and the available geospatial data. A digital elevation model (DEM) with 2 m resolution was obtained from swissALTI3D, provided by the Swiss Federal Office of Topography [
32] (
Figure 3). To increase computational efficiency, the DEM was resampled to 10 m.
Similarly, the land use and land cover (LULC) data sourced from the freely available databases of Swisstopo [
32] were resampled to 10 m (
Figure 4). As shown in the figure, the Engelberger Aa catchment is largely covered by forest and grassland, with several settlements in the region. The Meienreuss and Göschenerreuss catchments lie at greater altitudes and contain almost no settlements; at these altitudes, bareland is the dominant LULC type.
Several of the conditioning factors used in the analysis were derived from the DEM using the SAGA GIS software [
33]. Statistical summaries of these numerical factors are provided for each catchment in
Table A1,
Table A2 and
Table A3 in
Appendix A. Altitude affects temperature and precipitation patterns, with higher elevations often receiving more snowfall [
34]. The gravitational impact on snow is determined by the slope; avalanches generally occur on slopes between 30° and 45° [
35]. Aspect affects solar radiation exposure, influencing the melting and refreezing cycles that govern snow stability. The LULC type, such as forest or bareland, significantly impacts snow stability and avalanche risk. Open areas are more prone to avalanches due to the absence of natural barriers [
5,
36].
Plan curvature affects the lateral accumulation of water and snow on a slope, whereas profile curvature affects their downslope movement. The topographic ruggedness index (TRI) [
37] measures terrain roughness by assessing elevation changes between adjacent cells and is computed from the DEM using Equation (1):
$\mathrm{TRI} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}}$ (1)
where $x_{i}$ is the elevation value of each cell, $\bar{x}$ denotes the mean elevation of all cells within the window, and $n$ is the number of cells in the defined window.
Rugged terrain can create topographic barriers or act as a trigger for avalanches [
2]. The vector ruggedness measure (VRM) is a 3D measure of ruggedness that accounts for slope and aspect variations. VRM affects snowpack stability due to variable grounding and distribution conditions, influencing avalanche potential.
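As an illustration of how Equation (1) can be applied to a gridded DEM, the following sketch computes the TRI in a square moving window with NumPy; the window size, sample elevations, and edge handling are assumptions rather than the exact SAGA GIS implementation.

```python
import numpy as np

def tri(dem: np.ndarray, window: int = 3) -> np.ndarray:
    """Topographic ruggedness index per Equation (1): the root-mean-square
    deviation of elevations from the window mean; border cells are left as NaN."""
    half = window // 2
    out = np.full(dem.shape, np.nan)
    for r in range(half, dem.shape[0] - half):
        for c in range(half, dem.shape[1] - half):
            cells = dem[r - half:r + half + 1, c - half:c + half + 1]
            out[r, c] = np.sqrt(np.mean((cells - cells.mean()) ** 2))
    return out

# Small synthetic DEM (elevations in metres); only the centre cell is defined
# for a 3 x 3 window.
dem = np.array([[1800.0, 1810.0, 1825.0],
                [1805.0, 1820.0, 1840.0],
                [1815.0, 1835.0, 1860.0]])
print(tri(dem))
```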
The diurnal anisotropic heating (DAH) index represents the daytime warming of various slopes caused by solar radiation [
38]. This differential temperature influences the cycles of melting and refreezing, which are crucial for establishing snowpack stability and subsequent avalanche risk. The DAH was assessed using Equation (2) [
39]:
$\mathrm{DAH} = \cos\left(\alpha_{max}-\gamma\right) \cdot \arctan\left(\lambda\right)$ (2)
where γ is the slope aspect, $\alpha_{max}$ denotes the aspect with the maximum total heat excess, and λ represents the slope angle.
The slope length factor (LS-Factor) assesses the combined impact of slope length and steepness [
40]. Longer, steeper slopes, characterized by higher LS-factor values, accumulate more snow, increasing the avalanche risk due to greater snow depth and pressure. The topographic wetness index (TWI) measures the terrain’s tendency to accumulate moisture by combining the local slope and the upstream drainage area. High TWI values can indicate zones with higher moisture content, potentially destabilizing the snowpack [
41]. The convergence index (CI) (Equation (3)) quantifies terrain convergence or divergence [
42]. Convergent zones, such as valleys, concentrate snow and water, increasing AS due to higher snow depths.
$\mathrm{CI} = \left(\frac{1}{n}\sum_{i=1}^{n}\theta_{i}\right) - 90^{\circ}$ (3)
where $\theta_{i}$ represents the angle between the aspect of the surrounding cell $i$ and the aspect of the center cell, while $n$ is the total number of surrounding cells considered within the moving square window.
The valley depth (VD) quantifies the relative depth of a valley compared to the surrounding peaks or ridges. Due to the topographic convergence in valleys, substantial snow accumulation often occurs, significantly increasing AS [
43]. Water movement transports sediments deposited along the river, which are subsequently deposited as avalanche debris [
44]. The distance to stream (DTS) influences this sediment transport process, affecting the accumulation of snow avalanche debris.
The wind exposition index (WEI) assesses the exposure of the terrain to wind direction [
6]. The mid-slope position (MSP) parameter assigns a value of 0 to mid-slope positions, while the maximum vertical distances in both the valley and crest directions receive a value of 1 [
45]. The topographic position index (TPI) proposed by De Reu et al. [
46] describes the relative position of a location with respect to the surrounding topography.
2.3. Categorical Boosting (CatBoost)
CatBoost is a gradient boosting algorithm developed by Yandex [
47], designed to handle categorical variables efficiently and robustly. It features an innovative approach to processing categorical features, eliminating the need for manual encoding and reducing the risk of overfitting. CatBoost’s ability to handle categorical variables makes it particularly well suited for datasets with mixed data types, common in AS mapping. The method employs an ordered boosting strategy, effectively mitigating target leakage during model training. This is achieved through a novel mathematical formulation that adjusts feature values based on target statistics, as illustrated in Equation (4):
$\hat{x}_{i}^{k} \approx \mathbb{E}\left(y \mid x^{k} = x_{i}^{k}\right) = \frac{\sum_{j=1}^{n} \mathbb{1}\left\{x_{j}^{k} = x_{i}^{k}\right\} \cdot y_{j}}{\sum_{j=1}^{n} \mathbb{1}\left\{x_{j}^{k} = x_{i}^{k}\right\}}$ (4)
Equation (4) computes the conditional expectation of the target variable, thereby enhancing the integrity of the data utilized for training decisions. Moreover, CatBoost incorporates permutations within the training process to refine feature values further, adapting to the inherent order of data entries as depicted in Equation (5):
$\hat{x}_{\sigma_{p}}^{k} = \frac{\sum_{j=1}^{p-1} \mathbb{1}\left\{x_{\sigma_{j}}^{k} = x_{\sigma_{p}}^{k}\right\} \cdot y_{\sigma_{j}} + \beta P}{\sum_{j=1}^{p-1} \mathbb{1}\left\{x_{\sigma_{j}}^{k} = x_{\sigma_{p}}^{k}\right\} + \beta}$ (5)
where β and P denote the weight and the prior value, respectively, $x_{i}^{k}$ is the value of the kth categorical feature for the ith sample, $y_{i}$ is the corresponding target value, σ is a random permutation of the training samples, and $\mathbb{1}\{\cdot\}$ is the indicator function.
The method facilitates a robust update mechanism that integrates both prior information and empirical data. These mathematical constructs allow CatBoost to effectively manage high-dimensional categorical data, reduce overfitting, and improve generalization across different datasets.
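For illustration, a minimal sketch of training a CatBoost classifier on mixed numerical and categorical inputs is given below; the feature values are placeholders, and passing the LULC column through cat_features invokes the native categorical handling described above (the implementation in Section 2.4 instead uses one-hot encoded LULC features).

```python
import pandas as pd
from catboost import CatBoostClassifier

# Placeholder training samples: two numerical factors and one categorical factor.
X_train = pd.DataFrame({
    "altitude": [1850.0, 1320.0, 2100.0, 950.0],
    "slope": [35.2, 12.5, 41.0, 8.3],
    "lulc": ["bareland", "forest", "grassland", "settlement"],
})
y_train = [1, 0, 1, 0]  # 1 = avalanche, 0 = non-avalanche

model = CatBoostClassifier(
    iterations=250,         # number of boosting rounds
    depth=9,                # tree depth
    learning_rate=0.2,
    cat_features=["lulc"],  # handled internally via ordered target statistics
    verbose=False,
)
model.fit(X_train, y_train)

# Predicted avalanche probability for a new pixel.
X_new = pd.DataFrame({"altitude": [1700.0], "slope": [33.0], "lulc": ["bareland"]})
print(model.predict_proba(X_new)[0][1])
```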
2.4. Sequential Backward Selection (SBS) Algorithm
SBS is a feature selection algorithm that iteratively removes features from the dataset based on specific criteria, such as the decrease in model performance upon removal. It starts with the entire set of features and progressively eliminates one feature at a time until the minimum number of features achieving high accuracy is reached. This technique helps in reducing dimensionality, improving model interpretability, and potentially enhancing model performance.
With SBS, the model is trained multiple times, each time omitting one of the features. The performance of each model configuration (with one feature removed) is evaluated and compared. The feature whose absence results in the least decrease in model performance (or even potentially causes an increase) is permanently eliminated from the dataset. The elimination process is repeated with the reduced set of features from the previous iteration. Each iteration assesses the impact of removing each feature one at a time and eliminates the least impactful feature based on model performance. The iterative process continues until a desired number of features is reached or no further improvement in model performance is observed. If multiple subsets achieve similar performance metrics, the smallest feature set is selected to maintain model simplicity and efficiency.
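A minimal sketch of one such backward-elimination procedure, assuming a CatBoost classifier scored by accuracy on a held-out test set, is given below; the function and variable names are illustrative and do not reproduce the exact implementation used in this study.

```python
from catboost import CatBoostClassifier
from sklearn.metrics import accuracy_score

def sequential_backward_selection(X_train, y_train, X_test, y_test, min_features=2):
    """Iteratively drop the feature whose removal decreases test accuracy the least."""
    remaining = list(X_train.columns)
    history = []
    while len(remaining) > min_features:
        scores = {}
        for feature in remaining:
            subset = [f for f in remaining if f != feature]
            model = CatBoostClassifier(verbose=False)
            model.fit(X_train[subset], y_train)
            scores[feature] = accuracy_score(y_test, model.predict(X_test[subset]))
        # The feature whose removal keeps (or improves) accuracy the most is dropped.
        least_useful = max(scores, key=scores.get)
        remaining.remove(least_useful)
        history.append((list(remaining), scores[least_useful]))
    return history  # feature subsets and their accuracies at each elimination step
```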
Figure 5 illustrates SBS using the CatBoost algorithm, combined with GridSearchCV, to optimize feature selection for modeling. The process begins with a dataset comprising 17 features (factors), such as altitude, slope, and LULC, which are considered for model training. The entire set of 17 features was used to train the initial CatBoost model, employing GridSearchCV for parameter optimization [
48]. To achieve the highest accuracy, different parameters for learning rate, tree depth, and the number of iterations were tested. Learning rates of 0.2 and 0.3 were used, depending on the number of input features. A tree depth of 9 was consistently applied in all runs, with 250 and 500 iterations identified as the optimal values by GridSearchCV. This step involves systematically searching through different combinations of parameter values to find the most effective settings based on a chosen performance measure.
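A sketch of this grid search is given below; the candidate hyperparameter values follow those reported above, while the scoring metric and cross-validation settings are assumptions.

```python
from catboost import CatBoostClassifier
from sklearn.model_selection import GridSearchCV

# Candidate values as reported in the text; other settings are assumed.
param_grid = {
    "learning_rate": [0.2, 0.3],
    "depth": [9],
    "iterations": [250, 500],
}
grid = GridSearchCV(
    estimator=CatBoostClassifier(verbose=False),
    param_grid=param_grid,
    scoring="accuracy",  # assumed performance measure
    cv=5,
)
# grid.fit(X_train, y_train)               # training data as prepared in Section 2.1
# print(grid.best_params_, grid.best_score_)
```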
The training and testing subsets were partitioned using the Python-based scikit-learn package [
48]. The same package was also used for the implementation of SBS, tuning of initial parameters, and the training of classifiers. Performance evaluation was subsequently carried out using the same toolkit. The LULC data, which included classes such as bareland, forest, grassland, settlement, and water bodies, were transformed into binary features representing the presence (1) or absence (0) of each category via one-hot encoding. The models were tested with feature sets ranging from 17 to 2 features, revealing how the predictive power changes with varying levels of input complexity. To ensure comprehensive model evaluation, performance metrics, including accuracy, precision, and recall (see
Section 2.6 for details), were used alongside a visual inspection of the output AS maps. Here, although the minimum feature sets for each catchment were determined with SBS, results obtained with as few as 2 features are also provided in
Appendix B and
Appendix C for interpretation.
2.5. SHAP (Shapley Additive Explanations)
The SHAP value quantifies the contribution of each feature to the predictive accuracy of the model for a particular sample [
27,
28]. The general formula for the model prediction is given in Equation (6):
$y_{i} = y_{base} + \sum_{j} f\left(x_{i,j}\right)$ (6)
where $y_{i}$ is the predicted outcome for the ith sample, $y_{base}$ denotes the baseline score of the model (often the mean prediction over the training set), and $f(x_{i,j})$ represents the SHAP value corresponding to the jth feature’s contribution to the ith sample’s prediction. Essentially, the predicted value for a sample is composed of the baseline model score plus the cumulative contributions from each feature.
The SHAP values provide a detailed interpretation of the model, calculating the individual and collective contributions of features to the predicted output. This enables a comprehensive understanding of the model functioning on both global and local scales. Regarding the AS assessments performed here, the SHAP values helped to interpret the specific contributions of various factors to the likelihood of an avalanche.
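A minimal sketch of computing SHAP values for a fitted CatBoost model with the shap package is given below; the tiny dataset is purely illustrative.

```python
import pandas as pd
import shap
from catboost import CatBoostClassifier

# Illustrative data; in practice the fitted model and testing features from
# Sections 2.1 and 2.4 would be used instead.
X = pd.DataFrame({"altitude": [1850.0, 1320.0, 2100.0, 950.0, 1700.0, 1100.0],
                  "slope": [35.2, 12.5, 41.0, 8.3, 33.0, 10.0]})
y = [1, 0, 1, 0, 1, 0]

model = CatBoostClassifier(iterations=50, depth=3, verbose=False).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Local explanation: baseline (expected value) plus per-feature contributions
# in raw (log-odds) space for the first sample.
print(explainer.expected_value)
print(shap_values[0])

# Global importance: mean absolute SHAP value per feature (summary plot).
shap.summary_plot(shap_values, X, show=False)
```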
2.6. Validation
Here, we assessed the model performances both qualitatively and quantitatively, complemented by 5-fold cross-validation (CV) to analyze model robustness and generalizability. The qualitative assessments are based on visual inspections of the AS maps. Quantitative measures include precision (Equation (7)), recall (Equation (8)), F1-score (Equation (9)), specificity (Equation (10)), NPV (Equation (11)), and accuracy (Equation (12)) given in
Table 2. These values were computed using the numbers of true positive (TP), true negative (TN), false negative (FN), and false positive (FP) classifications. While additional measures such as the ROC-AUC (receiver operating characteristic, area under the curve) could also be computed, the six measures used here (precision, recall, accuracy, specificity, NPV, and F1-score) sufficiently demonstrate the reliability and completeness of the AS predictions for the avalanche class. Although a prediction made with CatBoost is a probability value ranging between 0 and 1 at each sample location, a probability above 0.5 was classified as "true" (avalanche), and one below as "false" (no avalanche) for calculating the measures in
Table 2. TPs refer to correctly identified avalanche pixels, TNs to correctly identified non-avalanche pixels, FNs to undetected avalanche pixels, and FPs to pixels incorrectly identified as avalanches. The values of precision, recall, F1-score, specificity, NPV, and accuracy also range from 0 to 1, where 0 indicates poor classification performance and 1 indicates perfect classification.
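The following sketch illustrates how these measures can be computed from predicted probabilities using the 0.5 threshold; the probability and label arrays are placeholders.

```python
import numpy as np

# Placeholder predicted probabilities and reference labels (1 = avalanche).
proba = np.array([0.91, 0.12, 0.67, 0.43, 0.88, 0.05])
y_true = np.array([1, 0, 1, 1, 1, 0])

y_pred = (proba > 0.5).astype(int)  # threshold at 0.5

tp = np.sum((y_pred == 1) & (y_true == 1))
tn = np.sum((y_pred == 0) & (y_true == 0))
fp = np.sum((y_pred == 1) & (y_true == 0))
fn = np.sum((y_pred == 0) & (y_true == 1))

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_score = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)
npv = tn / (tn + fn)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(precision, recall, f1_score, specificity, npv, accuracy)
```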
2.7. Multicollinearity Analysis
The variance inflation factor (VIF) and tolerance (TOL) are common multicollinearity analyses employed by researchers to identify and eliminate conditioning factors [
2,
10,
11]. Multicollinearity is identified when the VIF value exceeds 10 or the TOL value falls below 0.1 [
11].
Table 3 presents the VIF and TOL values for the numerical conditioning factors across the three catchments (Engelberger Aa, Meienreuss, and Göschenerreuss). The results shown in the table indicate that most factors exhibit acceptable or moderate multicollinearity. Although slope slightly exceeds the above limits, it was not removed from the feature set, as it is an essential parameter for predicting AS. Thus, all 17 factors were used as input and analyzed with SBS-CatBoost. The results are given in the next section.
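For reference, the VIF (and TOL = 1/VIF) values can be computed with the statsmodels package as sketched below; the input factors shown are randomly generated placeholders.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

rng = np.random.default_rng(0)

# Placeholder numerical factors; in practice the sampled 10 m pixel values are used.
factors = pd.DataFrame({
    "altitude": rng.uniform(900, 3000, 200),
    "slope": rng.uniform(0, 60, 200),
    "twi": rng.uniform(0, 15, 200),
})

X = add_constant(factors)  # constant term is required for meaningful VIFs
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
    index=factors.columns, name="VIF",
)
tol = (1.0 / vif).rename("TOL")
print(pd.concat([vif, tol], axis=1))  # flag factors with VIF > 10 or TOL < 0.1
```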
3. Results
This study evaluated the performance of the SBS-CatBoost model trained on varying combinations of conditioning factors for AS mapping in the Engelberger Aa, Meienreuss, and Göschenerreuss catchments. The accuracy measures for each catchment are given in
Table 4. The Engelberger Aa catchment, the largest of the three with 226.5 km², achieved optimal accuracy with 11 features. In Meienreuss, the optimal accuracy was achieved with seven features, as it is also the smallest one with an area of 71.4 km². All AS maps, generated with predicted values ranging from 0 to 1.0, were reclassified into categories of very low, low, moderate, high, and very high susceptibility areas for visual interpretation, using equal intervals of 0.2. These correspond to the intervals of 0–0.2, 0.2–0.4, 0.4–0.6, 0.6–0.8, and 0.8–1.0, respectively. The final accuracy values are ordered similarly based on the catchment size. The results demonstrate a direct relationship between catchment size, environmental complexity, and accuracy. As expected, smaller regions can be modeled with fewer features. The results for each catchment are explained in the following sub-headings. The 95% confidence intervals for the mean CV scores further validate the reliability of these models, as shown in
Table 4 for Engelberger Aa, Meienreuss, and Göschenerreuss. These intervals indicate that the observed high-performance measures are consistent and robust across different CV runs, suggesting that the models are well calibrated and reliable for predicting AS in diverse catchments.
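A minimal sketch of this equal-interval reclassification is given below; the susceptibility values are placeholders.

```python
import numpy as np

# Placeholder predicted susceptibility values for a handful of pixels.
as_values = np.array([0.05, 0.18, 0.33, 0.57, 0.74, 0.96])

bins = [0.2, 0.4, 0.6, 0.8]  # equal intervals of 0.2
labels = np.array(["very low", "low", "moderate", "high", "very high"])
classes = labels[np.digitize(as_values, bins)]
print(classes)  # ['very low' 'very low' 'low' 'moderate' 'high' 'very high']
```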
3.1. Engelberger Aa Catchment
The best-performing model utilized 11 features (
Figure 6) and achieved an accuracy of 0.94, with precision and recall for the avalanche class notably high at 0.91 and 0.96, respectively (see
Table 4). This indicated that the model effectively identifies avalanche-prone and non-avalanche areas with a high degree of reliability. The resulting map (
Figure 6a) illustrated regions with high AS, primarily along specific elevations and terrain features. According to the SHAP graph (
Figure 6b), in the Engelberger Aa region, altitude, DTS, and aspect have significant impacts on the model outputs. Higher altitude values are positively correlated with avalanche occurrences, while lower altitude values negatively influence them, as interpreted from the SHAP graph (
Figure 6b). On the other hand, greater distances from streams are associated with lower susceptibility to avalanches, whereas closer proximity (low DTS value) has a positive correlation. Regarding the aspect, 0°, 90°, 180°, and 270° correspond to north, east, south, and west. Thus, it can be interpreted from
Figure 6b that low aspect values (close to north) have a positive influence on avalanche occurrence.
3.2. Meienreuss Catchment
The model, using just seven features, achieved robust performance with an overall accuracy of 0.96. It balances precision (0.94) and recall (0.97) for the avalanche class, indicating a strong predictive power despite the reduced number of features. The resulting map (
Figure 7a) shows highly susceptible areas concentrated in specific regions, likely reflecting the unique local topographical influences.
Figure 7b shows that VD, altitude, and aspect have the greatest impact on AS in this catchment. From the graph, it can be seen that high VD values lead to lower AS. The relationship between the altitude and aspect features and the AS is similar to that observed in the Engelberger Aa catchment.
3.3. Göschenerreuss Catchment
With nine features, this model delivered outstanding performance, achieving an accuracy of 0.97. It demonstrated exceptionally high recall (0.98) for the avalanche class, highlighting its ability to identify avalanche-prone areas accurately. The distribution of highly susceptible areas was more widespread (
Figure 8a), suggesting that varied terrain or more comprehensive data coverage influenced the predictions. According to the SHAP values (
Figure 8b), aspect, altitude, and DTS are highly influential, similar to the results of the Engelberger Aa catchment.
4. Discussion
In this study, a comprehensive feature selection and elimination strategy, SBS, was evaluated for AS mapping using the CatBoost algorithm in three neighboring catchments in the Swiss Alps. As avalanches are complex natural phenomena influenced by numerous environmental and geological conditions, the probability of their spatial occurrence can be controlled by a large number of factors. However, a factor set that is effective in one region may be less influential in another. For this reason, this study started from a large feature set with 17 inputs and sequentially eliminated features by analyzing their influence on model performance. The parameter optimization for CatBoost was also performed using an automated algorithm to achieve the best possible accuracy.
Across the three catchments, the model performances were robust, with accuracies ranging from 0.94 to 0.97. These results are notable, considering the complexity and variability of the factors influencing avalanche occurrences. The iterative process of feature elimination using SBS revealed that not all 17 conditional factors were equally important in all catchments, even though they were neighboring. The key factors, such as altitude, aspect, and valley depth, consistently showed a strong influence on model prediction accuracies across all catchments, proving their critical role in avalanche dynamics.
Interestingly, the results also highlighted the lesser relevance of factors such as plan and profile curvature, the DAH, and the TPI, which were excluded from the optimized models at earlier stages (see
Appendix B). The LULC features were retained only in the models with larger feature sets (16 features) across all catchments, which can be attributed to their weaker relationship with AS. This suggests that while LULC is informative, other factors such as altitude, aspect, and distance to stream carry more predictive weight in the reduced models, particularly when computational efficiency or model simplicity is prioritized.
The initial analysis of multicollinearity indicated that only slope exhibited high correlations with the other factors. However, it was not omitted from the initial set, as the level of correlation was not excessively high. On the other hand, slope was not included in the final feature set for any of the catchments, likely because its effect is captured by other topographic features.
On the other hand, although the inventory used here is comprehensive [
22,
24], possibly the most extensive one in the literature, it also suffers from limitations related to the data collection conditions, such as shadows and image resolution [
23,
25], its representativeness of only a single large avalanche period, and avalanche boundary uncertainty caused by operator differences [
49]. Therefore, the results must be interpreted accordingly.
5. Conclusions and Future Work
This study evaluated the usability of SBS with the CatBoost model for AS modeling. By starting with a large feature set and applying advanced feature selection, the approach enhanced model accuracy and interpretability while providing insights into avalanche conditioning factors. The proposed feature elimination strategy yielded varying numbers of features across the three catchments, ranging from seven to eleven, based on the specific characteristics and sizes of each catchment. The SHAP values provided a unified measure of feature importance and enhanced interpretability by attributing the contribution of each feature to the model outputs. By incorporating SHAP values into the integrated SBS-CatBoost framework, we gained deeper insights into the relative importance of selected features and their impact on AS mapping. Features such as altitude, aspect, and VD were consistently important across all three catchments, highlighting their general significance in AS mapping. Accuracy values ranged from 0.94 to 0.97 and varied with catchment size and complexity.
The SBS algorithm can be computationally intensive, especially with a large number of features, as it involves repeatedly evaluating feature subsets and retraining the model, which can be time-consuming and resource-intensive. Future work could explore the use of multicollinearity measures such as VIF and TOL in conjunction with SBS.
Despite the strengths of the integrated SBS-CatBoost approach, the study faces limitations related to the inherent challenges of modeling this natural phenomenon. The variability in feature importance across different catchments suggested that regional characteristics can significantly influence model performance and feature selection. Additionally, the reliance on historical data and the assumptions inherent in the modeling process may affect the generalizability of the results to other regions or future scenarios.