
A novel landslide identification method for multi-scale and complex background region based on multi-model fusion: YOLO + U-Net

  • Technical Note
  • Journal: Landslides

Abstract

Comprehensive identification of geological hazard risks remains one of the most important tasks in disaster prevention and mitigation. Remote sensing combined with deep learning can increasingly support the recognition of such risks. However, satellite images of landslides, with their complex backgrounds and varying scales, and small-scale landslides in particular, are prone to false detections and omissions. The fundamental reason is that existing models cannot effectively extract the detailed features of landslide images against multi-scale, complex backgrounds. To address these challenges, we selected Luding County, Sichuan Province, China, as the study area and created an open, accurately annotated landslide dataset. Geological hazard experts interpreted and annotated the samples, which comprise 230 landslide images with corresponding labels and geographical coordinates for each landslide. We propose a novel deep learning method for landslide identification that combines YOLO + U-Net. The recognition process is as follows. First, the landslide images to be measured are fed into an improved YOLOv4 model for target detection, which outputs landslide detection boxes indicating the approximate location of each landslide. Next, we introduce a method to expand each detection box so that it retains some contextual semantic information; everything outside the expanded box is filled with black, which shields part of the irrelevant, complex background interference and aids further recognition. Finally, an improved U-Net semantic segmentation model segments the region inside the detection box, yielding accurate landslide boundaries. In the experiments, we thoroughly discussed and compared four methods: U-Net, improved U-Net, PSP-Net, and YOLO + U-Net. Compared with U-Net, YOLO + U-Net improved mean IoU by 20.6% for small-scale landslides and by 2.08% for landslides in complex backgrounds, with an average improvement of 9.91%. These results indicate that YOLO + U-Net can effectively extract detailed features of landslide images at different scales, improve recognition of landslides in complex backgrounds, and reduce false detections and omissions in landslide image identification.
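The expand-and-mask step between the two models can be sketched as follows. This is a minimal illustration of the idea described in the abstract, not the authors' implementation; the function name, the margin ratio, and the coordinate convention are assumptions made for the example.

```python
import numpy as np

def expand_and_mask(image, box, margin=0.2):
    """Expand a detection box by a margin ratio so it keeps some
    contextual pixels, then fill everything outside the expanded
    box with black.

    image : H x W x C uint8 array
    box   : (x1, y1, x2, y2) in pixel coordinates
    """
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    dx = int((x2 - x1) * margin)
    dy = int((y2 - y1) * margin)
    # Clip the expanded box to the image bounds.
    x1, y1 = max(0, x1 - dx), max(0, y1 - dy)
    x2, y2 = min(w, x2 + dx), min(h, y2 + dy)
    masked = np.zeros_like(image)
    masked[y1:y2, x1:x2] = image[y1:y2, x1:x2]
    return masked

# Example: 100x100 image, detection box in its centre.
img = np.full((100, 100, 3), 128, dtype=np.uint8)
out = expand_and_mask(img, (40, 40, 60, 60), margin=0.2)
print(int((out > 0).any(axis=2).sum()))  # pixels kept: 28 x 28 = 784
```

The masked image, rather than the raw crop, is what the segmentation model would then receive, so irrelevant background outside the expanded box cannot contribute features.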



Data availability

The experimental datasets are accessed from open source data: Bijie landslide dataset (http://gpcv.whu.edu.cn/data/Bijie_pages.html, accessed on 26 June 2022). Luding County landslide dataset download link: https://pan.cdut.edu.cn:443/link/B007C24A04BAC995CC7D782DE0483C8F. High-precision aerial imagery and interpretation dataset of landslide and debris flow disaster in Sichuan and surrounding areas: https://cstr.cn/31253.11.sciencedb.j00001.00222.


Funding

This work was sponsored by the Sichuan Science and Technology Program, China, under Grant 2021YFS0324 and 2021YFG0298, the Opening Fund of Geomathematics Key Laboratory of Sichuan Province under Grant SCSXDZ2020YB04, and the State Key Laboratory of Geohazard Prevention and Geoenvironment Protection Independent Research Project under Grant SKLGP2019Z012.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Honghui Wang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Appendices

Appendix 1. Focal Loss calculation

Focal Loss is a loss-function scheme. It controls the relative weight of positive and negative samples, addresses the imbalanced distribution of background pixels and landslide pixels, fits the existing samples better during training, and allows the model to be trained fully (Lin et al. 2020). The most classic classification loss is the standard cross-entropy, which can be written as:

$$CE\left( {p,y} \right) = \left\{ {\begin{array}{*{20}c} { - \log \left( p \right), \, y = 1} \\ { - \log \left( {1 - p} \right), \, y \ne 1} \\ \end{array} } \right.$$
(14)

where y = 1 represents a positive sample, p represents the probability of a positive sample, and 1 − p represents the probability of a negative sample. In order to unify the probabilities of positive and negative samples, the Pt function is set:

$$P_{t} = \left\{ {\begin{array}{*{20}c} {p, \, y = 1} \\ {1 - p, \, y \ne 1} \\ \end{array} } \right.$$
(15)

So it can be obtained:

$$CE\left( {p,y} \right) = CE\left( {P_{t} } \right) = - \log \left( {P_{t} } \right)$$
(16)
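Equations (14)–(16) can be checked numerically; the snippet below is an illustrative sketch, with function names chosen for the example.

```python
import math

def pt(p, y):
    # Unified probability of the true class, Eq. (15).
    return p if y == 1 else 1 - p

def cross_entropy(p, y):
    # Standard cross-entropy written via Pt, Eq. (16).
    return -math.log(pt(p, y))

# A confident correct prediction gives a small loss;
# a confident wrong one gives a large loss.
print(round(cross_entropy(0.9, 1), 4))  # 0.1054
print(round(cross_entropy(0.9, 0), 4))  # 2.3026
```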

This loss function handles class imbalance poorly: when one class is heavily over-represented, its samples dominate the total loss and degrade the model. A common remedy is to add weighting factors, i.e., to balance the cross-entropy. With \(\alpha \in\) [0, 1], a weight factor \(\alpha\) is applied to positive samples and \(1 - \alpha\) to negative samples. For notational convenience, we use \(\alpha_{t}\) to express the balanced cross-entropy function:

$$CE\left( {p,y} \right) = - \alpha_{t} \times \log \left( {P_{t} } \right)$$
(17)

When the negative samples vastly outnumber the positive samples, even if we set the weight of the negative samples very low, their loss will still dominate the total loss simply because there are so many of them. To solve this problem, the contribution of high-confidence samples to the total loss must be reduced, which is done by adding a modulating factor (1 − Pt)γ to the standard cross-entropy. This yields the Focal Loss function:

$$FL\left({P}_{t}\right)=-{ (1-{P}_{t})}^{\gamma }\times \mathrm{log }({P}_{t})$$
(18)

When γ = 0, Focal Loss reduces to the standard cross-entropy; when γ > 0, the loss of samples with high classification confidence is suppressed, so the loss function focuses on hard-to-distinguish samples. Focal Loss thus handles sample-class imbalance by weighting each sample's loss according to how easily the sample can be distinguished: a smaller weight \(\alpha_{1}\) is applied to easily distinguished samples and a larger weight \(\alpha_{2}\) to hard-to-distinguish ones. Usually, samples whose classification confidence is close to 1 or close to 0 are called easy samples, and the rest are called hard samples. The Focal Loss function can then be expressed as:

$$L_{sum} = \alpha_{1} \times L_{e} + \alpha_{2} \times L_{d}$$
(19)

where Le represents the loss of easily distinguished samples and Ld the loss of hard-to-distinguish samples. Since \(\alpha_{1} < \alpha_{2}\), the loss function focuses on the samples that are difficult to distinguish. Combining this weighting with Eq. (18), the α-balanced Focal Loss is obtained:

$$FL\left( {P_{t} } \right) = - \alpha_{t} \times \left( {1 - P_{t} } \right)^{\gamma } \times \log \left( {P_{t} } \right)$$
(20)
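The full α-balanced Focal Loss of Eq. (20) can be sketched for a single sample as below. This is an illustrative implementation; the default α and γ values are the common choices from Lin et al. (2020), not values stated in this paper.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Alpha-balanced Focal Loss for one sample, Eq. (20).
    p: predicted probability of the positive class; y: label (1 or 0)."""
    p_t = p if y == 1 else 1 - p          # Eq. (15)
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

# An easy sample (p_t close to 1) is strongly down-weighted by
# the (1 - p_t)^gamma factor, while a hard sample keeps most of its loss.
easy = focal_loss(0.95, 1)  # well-classified positive
hard = focal_loss(0.10, 1)  # badly-classified positive
print(easy < hard)  # True
```

With γ = 0 and α = 1 the function reduces to the standard cross-entropy of Eq. (16), matching the limiting case discussed above.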

Appendix 2. Identity Block and Conv Block modules

The Identity Block has a shortcut that passes the input directly to the output, and the output matrices of the two branches have the same dimensions, so it can be stacked to deepen the network. The Conv Block, by contrast, has different input and output dimensions, so it cannot be stacked consecutively; its role is to change the dimensionality of the network.

As shown in Fig. 11, the structure on the left is the residual structure used for networks with few layers, while the structure on the right is used for networks with many layers. The main branch of the Identity Block takes the input feature matrix and produces the output through two 3 × 3 convolutional layers. The shortcut branch goes directly from input to output. Since the output matrices of the two branches have the same dimensions, they can be added directly, and the sum passes through an activation function to produce the output. The Conv Block differs from the Identity Block in that it adds a 1 × 1 convolutional layer on both the input and the output. As shown in Fig. 10, the input matrix has a depth of 256. After the first convolutional layer, the length and width of the matrix remain unchanged, but the number of channels decreases from 256 to 64. By the third layer, the number of channels has returned to 256, making the output dimensions the same as the input. The two branches are then added and passed through an activation function to produce the final result.
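The shape bookkeeping of the two block types can be sketched as follows. This is a simplified NumPy illustration, not the paper's network code: the middle 3 × 3 convolution is replaced by another 1 × 1 (pointwise) convolution so the example stays self-contained, since a 1 × 1 convolution is just per-pixel channel mixing and the channel arithmetic, which is the point here, is identical.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # 1x1 convolution = per-pixel channel mixing: (H, W, Cin) @ (Cin, Cout).
    return x @ w

def relu(x):
    return np.maximum(x, 0)

def bottleneck(x, c_mid, shortcut_proj=None):
    """Bottleneck residual unit: 1x1 reduce -> middle conv -> 1x1 restore.
    If shortcut_proj is None (Identity Block), the input is added to the
    main branch unchanged, so input and output shapes must match.
    If shortcut_proj is given (Conv Block), the shortcut is projected by a
    1x1 convolution so its channel count matches the main branch."""
    c_in = x.shape[-1]
    c_out = shortcut_proj.shape[-1] if shortcut_proj is not None else c_in
    w1 = rng.standard_normal((c_in, c_mid)) * 0.01   # reduce channels
    w2 = rng.standard_normal((c_mid, c_mid)) * 0.01  # middle conv stand-in
    w3 = rng.standard_normal((c_mid, c_out)) * 0.01  # restore channels
    main = conv1x1(relu(conv1x1(relu(conv1x1(x, w1)), w2)), w3)
    short = x if shortcut_proj is None else conv1x1(x, shortcut_proj)
    return relu(main + short)

x = rng.standard_normal((8, 8, 256))
y_id = bottleneck(x, 64)                       # Identity Block: 256 -> 256
proj = rng.standard_normal((256, 512)) * 0.01
y_cv = bottleneck(x, 128, shortcut_proj=proj)  # Conv Block: 256 -> 512
print(y_id.shape, y_cv.shape)  # (8, 8, 256) (8, 8, 512)
```

The example mirrors the text: the Identity Block's two branches match in shape and can be added directly, while the Conv Block changes the channel count and therefore needs a projection on the shortcut.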

Rights and permissions

Reprints and permissions

About this article


Cite this article

Wang, H., Liu, J., Zeng, S. et al. A novel landslide identification method for multi-scale and complex background region based on multi-model fusion: YOLO + U-Net. Landslides 21, 901–917 (2024). https://doi.org/10.1007/s10346-023-02184-7
