Abstract
Comprehensive identification of geological hazard risks remains one of the most important tasks in disaster prevention and mitigation. Remote sensing combined with deep learning is increasingly able to recognize such risks. However, satellite landslide images with complex backgrounds and varying scales, particularly small-scale landslides, are prone to false detections and omissions. The fundamental reason is that existing models cannot effectively extract the detailed features of landslide images in multi-scale, complex backgrounds. To address these challenges, we selected Luding County, Sichuan Province, China, as the study area and created an open and accurate landslide dataset. Geological hazard experts interpreted and annotated the samples, which comprise 230 landslide images with corresponding labels and geographical coordinates for each landslide. We propose a novel deep learning method for landslide identification that combines YOLO and U-Net (YOLO + U-Net). The recognition process is as follows. First, the landslide image to be examined is input into an improved YOLOv4 model for target detection, which outputs landslide detection boxes that give the approximate location of each landslide. Second, we propose a method to expand each detection box so that it retains some contextual semantic information; the background outside the expanded box is filled with black, which shields part of the irrelevant complex background and facilitates further recognition. Finally, an improved U-Net semantic segmentation model segments the region inside the detection box, yielding accurate landslide boundaries. In the experiments, we thoroughly compared four methods: U-Net, improved U-Net, PSP-Net, and YOLO + U-Net. Compared with U-Net, YOLO + U-Net improved mean IoU by 20.6% for small-scale landslides and by 2.08% for landslides in complex backgrounds, with an average improvement of 9.91%. These results indicate that YOLO + U-Net can effectively extract detailed features of landslide images at different scales, improve the recognition of landslides in complex backgrounds, and reduce false detections and omissions in landslide identification.
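For readers who want a concrete picture of this cascade, the following minimal Python sketch shows how detection, box expansion, background masking, and segmentation could be chained. The function names, the expansion ratio, and the detector/segmenter interfaces are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def expand_box(box, image_shape, ratio=0.2):
    """Enlarge a detection box by `ratio` of its size on each side, clipped
    to the image, so that some contextual background is retained."""
    x1, y1, x2, y2 = box
    h, w = image_shape[:2]
    dx, dy = (x2 - x1) * ratio, (y2 - y1) * ratio
    return (max(0, int(x1 - dx)), max(0, int(y1 - dy)),
            min(w, int(x2 + dx)), min(h, int(y2 + dy)))

def mask_outside_box(image, box):
    """Fill everything outside the (expanded) box with black to suppress
    irrelevant background before segmentation."""
    x1, y1, x2, y2 = box
    masked = np.zeros_like(image)
    masked[y1:y2, x1:x2] = image[y1:y2, x1:x2]
    return masked

def identify_landslides(image, detector, segmenter):
    """Cascade: detection -> box expansion -> background masking ->
    segmentation inside each detection box."""
    masks = []
    for box in detector(image):              # assumed to return (x1, y1, x2, y2) boxes
        expanded = expand_box(box, image.shape)
        masked = mask_outside_box(image, expanded)
        masks.append(segmenter(masked))      # assumed to return a binary landslide mask
    return masks
```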
Data availability
The experimental datasets are available from the following open sources: the Bijie landslide dataset (http://gpcv.whu.edu.cn/data/Bijie_pages.html, accessed on 26 June 2022); the Luding County landslide dataset (https://pan.cdut.edu.cn:443/link/B007C24A04BAC995CC7D782DE0483C8F); and the high-precision aerial imagery and interpretation dataset of landslide and debris flow disasters in Sichuan and surrounding areas (https://cstr.cn/31253.11.sciencedb.j00001.00222).
References
Ajaz A, Salar A, Jamal T, Khan AU (2022) Small object detection using deep learning. arXiv preprint arXiv:2201.03243. https://doi.org/10.48550/arXiv.2201.03243
Can R, Kocaman S, Gokceoglu C (2019) A convolutional neural network architecture for auto-detection of landslide photographs to assess citizen science and volunteered geographic information data quality. ISPRS Int J Geo-Information 8:300. https://doi.org/10.3390/ijgi8070300
Chao Z, Zhenyu C, Fenghuan S et al (2021) High-precision aerial imagery and interpretation dataset of landslide and debris flow disaster in Sichuan and surrounding areas. https://doi.org/10.11922/sciencedb.j00001.00222
Ghorbanzadeh O, Blaschke T, Gholamnia K et al (2019) Evaluation of different machine learning methods and deep-learning convolutional neural networks for landslide detection. Remote Sens 11:196. https://doi.org/10.3390/rs11020196
Ghorbanzadeh O, Xu Y, Ghamisi P et al (2022) Landslide4Sense: reference benchmark data and deep learning models for landslide detection. IEEE Trans Geosci Remote Sens 60:1–17. https://doi.org/10.1109/TGRS.2022.3215209
Han Z, Fang Z, Li Y, Fu B (2023) A novel Dynahead-Yolo neural network for the detection of landslides with variable proportions using remote sensing images. Front Earth Sci 10. https://doi.org/10.3389/feart.2022.1077153
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit 770–778. https://doi.org/10.1109/CVPR.2016.90
Hu Q, Zhou Y, Wang S et al (2019) Improving the accuracy of landslide detection in “off-site” area by machine learning model portability comparison: a case study of Jiuzhaigou earthquake, China. Remote Sens 11. https://doi.org/10.3390/rs11212530
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. Proc - 30th IEEE Conf Comput Vis Pattern Recognition, CVPR 2017 2261–2269. https://doi.org/10.1109/CVPR.2017.243
Ji S, Yu D, Shen C et al (2020) Landslide detection from an open satellite imagery and digital elevation model dataset using attention boosted convolutional neural networks. Landslides 17:1337–1352. https://doi.org/10.1007/s10346-020-01353-2
Ju Y, Xu Q, Jin S et al (2020) Automatic object detection of loess landslide based on deep learning. Wuhan Daxue Xuebao (Xinxi Kexue Ban)/Geomatics Inf Sci Wuhan Univ 45:1747–1755. https://doi.org/10.13203/j.whugis20200132
Krishna H, Jawahar CV (2018) Improving small object detection. In: Proceedings - 4th Asian conference on pattern recognition, ACPR 2017. pp 346–351
Lin TY, Goyal P, Girshick R et al (2020) Focal loss for dense object detection. IEEE Trans Pattern Anal Mach Intell 42:318–327. https://doi.org/10.1109/TPAMI.2018.2858826
Mo P, Li D, Liu M et al (2023) A lightweight and partitioned CNN algorithm for multi-landslide detection in remote sensing images. Appl Sci 13. https://doi.org/10.3390/app13158583
Poudel RPK, Bonde U, Liwicki S, Zach C (2019) ContextNet: exploring context and detail for semantic segmentation in real-time. Br Mach Vis Conf 2018, BMVC 2018. https://doi.org/10.48550/arXiv.1805.04554
Pradhan B, Al-Najjar HAH, Sameen MI et al (2020) Landslide detection using a saliency feature enhancement technique from LiDAR-derived DEM and orthophotos. IEEE Access 8:121942–121954. https://doi.org/10.1109/ACCESS.2020.3006914
Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. Lect Notes Comput Sci (including Subser Lect Notes Artif Intell Lect Notes Bioinformatics) 9351:234–241. https://doi.org/10.1007/978-3-319-24574-4_28
Wang H, Zhang L, Yin K et al (2021) Landslide identification using machine learning. Geosci Front 12:351–364. https://doi.org/10.1016/j.gsf.2020.02.012
Woo S, Park J, Lee JY, Kweon IS (2018) CBAM: convolutional block attention module. In: Computer Vision – ECCV 2018, Lect Notes Comput Sci 11211:3–19. https://doi.org/10.1007/978-3-030-01234-2_1
Xu Q (2020) Understanding and consideration of related issues in early identification of potential geohazards. Wuhan Daxue Xuebao (Xinxi Kexue Ban)/Geomatics Inf Sci Wuhan Univ 45:1651–1659. https://doi.org/10.13203/j.whugis20200043
Yu B, Xu C, Chen F et al (2022) HADeenNet: a hierarchical-attention multi-scale deconvolution network for landslide detection. Int J Appl Earth Obs Geoinf 111:102853. https://doi.org/10.1016/j.jag.2022.102853
Zhang Y, Fu Y, Sun Y et al (2021) Landslide detection from high-resolution remote sensing image using deep neural network. Highway 66:188–194
Zhao H, Shi J, Qi X et al (2017) Pyramid scene parsing network. Proc - 30th IEEE Conf Comput Vis Pattern Recognition, CVPR 2017 6230–6239. https://doi.org/10.1109/CVPR.2017.660
Zhou Y, Wang H, Yang R et al (2022) A novel weakly supervised remote sensing landslide semantic segmentation method: combining CAM and cycleGAN algorithms. Remote Sens 14. https://doi.org/10.3390/rs14153650
Funding
This work was sponsored by the Sichuan Science and Technology Program, China, under Grants 2021YFS0324 and 2021YFG0298, by the Opening Fund of the Geomathematics Key Laboratory of Sichuan Province under Grant SCSXDZ2020YB04, and by the State Key Laboratory of Geohazard Prevention and Geoenvironment Protection Independent Research Project under Grant SKLGP2019Z012.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Appendices
Appendix 1. Focal Loss calculation
Focal Loss is a loss function formulation that controls the weighting of positive and negative samples. It alleviates the imbalanced distribution of background pixels and landslide pixels, allowing the model to fit the training samples better and to be trained more fully (Lin et al. 2020). The most classic classification loss is the standard cross-entropy, which can be written as:

$$\mathrm{CE}(p, y) = \begin{cases} -\log(p), & y = 1 \\ -\log(1 - p), & \text{otherwise} \end{cases}$$
where y = 1 denotes a positive sample, p is the predicted probability of the positive class, and 1 − p is the predicted probability of the negative class. To unify the notation for positive and negative samples, the quantity \(p_{t}\) is defined as:

$$p_{t} = \begin{cases} p, & y = 1 \\ 1 - p, & \text{otherwise} \end{cases}$$
so that the cross-entropy can be rewritten as:

$$\mathrm{CE}(p, y) = \mathrm{CE}(p_{t}) = -\log(p_{t})$$
This loss function handles class imbalance poorly: when one class is heavily over-represented, it dominates the total loss and degrades the model. A common remedy is to add weighting factors, i.e., to balance the cross-entropy. With \(\alpha \in [0, 1]\), a weight \(\alpha\) is applied to positive samples and a weight \(1 - \alpha\) to negative samples. For notational convenience, we use \(\alpha_{t}\) (defined analogously to \(p_{t}\)) to express the balanced cross-entropy:

$$\mathrm{CE}(p_{t}) = -\alpha_{t} \log(p_{t})$$
When negative samples vastly outnumber positive ones, the negative-sample loss still dominates the total loss even if its weight is set very low. To solve this problem, the contribution of high-confidence samples to the total loss must be reduced, which is done by adding a modulating factor \((1 - p_{t})^{\gamma}\) to the balanced cross-entropy, yielding the Focal Loss:

$$\mathrm{FL}(p_{t}) = -\alpha_{t} (1 - p_{t})^{\gamma} \log(p_{t})$$
When γ = 0, the Focal Loss reduces to the (balanced) cross-entropy, and when γ > 0, the loss of samples classified with high confidence is suppressed, so that the loss focuses on the hard-to-distinguish samples. Focal Loss is therefore a loss function for handling imbalanced sample classification: it weights the loss of each sample according to how easily the sample can be distinguished. A smaller weight \(\alpha_{1}\) is applied to easily distinguishable samples and a larger weight \(\alpha_{2}\) to hard-to-distinguish samples. Samples whose classification confidence is close to 1 or close to 0 are usually called easy samples, and the rest are called hard samples. The Focal Loss can then be expressed as:

$$L = \alpha_{1} L_{e} + \alpha_{2} L_{d}$$
where \(L_{e}\) denotes the loss contributed by easy-to-distinguish samples and \(L_{d}\) the loss contributed by difficult-to-distinguish samples. Since \(\alpha_{1} < \alpha_{2}\), the loss function concentrates on the samples that are difficult to distinguish.
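As a concrete illustration of the formulas above, the following is a minimal NumPy sketch of the per-pixel Focal Loss; the default values of α and γ are common choices assumed here for illustration and are not necessarily the settings used in the paper.

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-pixel Focal Loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted probability of the landslide (positive) class
    y: ground-truth label, 1 for landslide pixels and 0 for background
    alpha, gamma: assumed defaults, not necessarily the paper's settings
    """
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)               # unified probability
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)   # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# A confidently correct pixel contributes far less loss than an uncertain one,
# so training focuses on the hard-to-distinguish pixels.
print(focal_loss(np.array([0.95, 0.55]), np.array([1, 1])))
```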
Appendix 2. Identity Block and Conv Block modules
The Identity Block includes a shortcut that passes the input directly to the output, and the output matrices of its two branches have the same dimensions, so it can be stacked to deepen the network. The Conv Block, on the other hand, has different input and output dimensions, so it cannot be stacked consecutively; its role is to change the dimensions of the feature maps.
As shown in Fig. 11, the structure on the left is the residual structure used for networks with a small number of layers, while the structure on the right is used for networks with many layers. The main path of the Identity Block passes the input feature matrix through two 3 × 3 convolutional layers, while the shortcut path goes directly from input to output. Since the output matrices of the two branches have the same dimensions, they can be added directly, and the result is passed through an activation function. The Conv Block structure on the right differs from the Identity Block by adding a 1 × 1 convolutional layer at the input and the output of the main path. As shown in Fig. 10, the input matrix has a depth of 256; after the first convolutional layer, its length and width remain unchanged, but the number of channels decreases from 256 to 64. By the third layer, the number of channels returns to 256, so the output has the same dimensions as the input. The two branches are then added and passed through an activation function to produce the final result.
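For readers who prefer code, the following PyTorch-style sketch contrasts the two block types described above. It uses the basic two-convolution variant with an assumed stride of 1 and is an illustrative approximation, not the exact configuration of the paper's backbone.

```python
import torch.nn as nn

class IdentityBlock(nn.Module):
    """Residual block whose shortcut is the identity: input and output shapes
    match, so the block can be stacked to deepen the network."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)            # same shapes: add directly

class ConvBlock(nn.Module):
    """Residual block that changes the feature-map dimensions, so the shortcut
    needs a 1 x 1 convolution to project the input to the new shape."""
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride,
                               padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.shortcut = nn.Sequential(       # projects the input to the new shape
            nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))
```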
Rights and permissions
About this article
Cite this article
Wang, H., Liu, J., Zeng, S. et al. A novel landslide identification method for multi-scale and complex background region based on multi-model fusion: YOLO + U-Net. Landslides 21, 901–917 (2024). https://doi.org/10.1007/s10346-023-02184-7