
A novel landslide identification method for multi-scale and complex background region based on multi-model fusion: YOLO + U-Net

  • Technical Note
  • Journal: Landslides

Abstract

Comprehensive identification of geological hazard risks remains one of the most important tasks in disaster prevention and mitigation. Remote sensing combined with deep learning can increasingly support the recognition of such risks. However, satellite images of landslides, with their complex backgrounds and varying scales, and small-scale landslides in particular, are prone to false detections and omissions. The fundamental reason is that existing models cannot effectively extract the detailed features of landslide images against multi-scale, complex backgrounds. To address these challenges, we selected Luding County, Sichuan Province, China, as the study area and created an open, accurately annotated landslide dataset. Geological hazard experts interpreted and annotated the samples, which comprise 230 landslide images with corresponding labels and geographical coordinates for each landslide. We propose a novel deep learning method for landslide identification that combines YOLO + U-Net. The recognition process is as follows. First, the landslide images to be measured are fed into an improved YOLOv4 model for target detection, which outputs landslide detection boxes indicating the approximate location of each landslide. Next, we introduce a method to expand each detection box so that it retains some contextual semantic information; everything outside the expanded box is filled with black, which shields part of the irrelevant, complex background interference and aids further recognition. Finally, an improved U-Net semantic segmentation model segments the region inside the detection box, yielding accurate landslide boundaries. In the experiments, we thoroughly discussed and compared four methods: U-Net, improved U-Net, PSP-Net, and YOLO + U-Net. Compared with U-Net, YOLO + U-Net improved mean IoU by 20.6% for small-scale landslides and by 2.08% for landslides in complex backgrounds, with an average improvement of 9.91%. These results indicate that YOLO + U-Net can effectively extract detailed features of landslide images at different scales, improve recognition of landslides in complex backgrounds, and reduce false detections and omissions in landslide image identification.
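The expand-and-mask step between the two models can be sketched as follows. This is a minimal illustration of the idea described in the abstract, not the authors' implementation; the function name, the margin ratio, and the coordinate convention are assumptions made for the example.

```python
import numpy as np

def expand_and_mask(image, box, margin=0.2):
    """Expand a detection box by a margin ratio so it keeps some
    contextual pixels, then fill everything outside the expanded
    box with black.

    image : H x W x C uint8 array
    box   : (x1, y1, x2, y2) in pixel coordinates
    """
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    dx = int((x2 - x1) * margin)
    dy = int((y2 - y1) * margin)
    # Clip the expanded box to the image bounds.
    x1, y1 = max(0, x1 - dx), max(0, y1 - dy)
    x2, y2 = min(w, x2 + dx), min(h, y2 + dy)
    masked = np.zeros_like(image)
    masked[y1:y2, x1:x2] = image[y1:y2, x1:x2]
    return masked

# Example: 100x100 image, detection box in its centre.
img = np.full((100, 100, 3), 128, dtype=np.uint8)
out = expand_and_mask(img, (40, 40, 60, 60), margin=0.2)
print(int((out > 0).any(axis=2).sum()))  # pixels kept: 28 x 28 = 784
```

The masked image, rather than the raw crop, is what the segmentation model would then receive, so irrelevant background outside the expanded box cannot contribute features.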



Data availability

The experimental datasets are accessed from open source data: Bijie landslide dataset (http://gpcv.whu.edu.cn/data/Bijie_pages.html, accessed on 26 June 2022). Luding County landslide dataset download link: https://pan.cdut.edu.cn:443/link/B007C24A04BAC995CC7D782DE0483C8F. High-precision aerial imagery and interpretation dataset of landslide and debris flow disaster in Sichuan and surrounding areas: https://cstr.cn/31253.11.sciencedb.j00001.00222.


Funding

This work was sponsored by the Sichuan Science and Technology Program, China, under Grant 2021YFS0324 and 2021YFG0298, the Opening Fund of Geomathematics Key Laboratory of Sichuan Province under Grant SCSXDZ2020YB04, and the State Key Laboratory of Geohazard Prevention and Geoenvironment Protection Independent Research Project under Grant SKLGP2019Z012.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Honghui Wang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Appendices

Appendix 1. Focal Loss calculation

Focal Loss is a loss-function scheme. It controls the relative weight of positive and negative samples, addresses the imbalanced distribution of background pixels and landslide pixels, fits the existing samples better during training, and allows the model to be trained fully (Lin et al. 2020). The most classic classification loss is the standard cross-entropy, which can be written as:

$$CE\left( {p,y} \right) = \left\{ {\begin{array}{*{20}c} { - \log \left( p \right), \, y = 1} \\ { - \log \left( {1 - p} \right), \, y \ne 1} \\ \end{array} } \right.$$
(14)

where y = 1 represents a positive sample, p represents the probability of a positive sample, and 1 − p represents the probability of a negative sample. In order to unify the probabilities of positive and negative samples, the Pt function is set:

$$P_{t} = \left\{ {\begin{array}{*{20}c} {p, \, y = 1} \\ {1 - p, \, y \ne 1} \\ \end{array} } \right.$$
(15)

So it can be obtained:

$$CE\left( {p,y} \right) = CE\left( {P_{t} } \right) = - \log \left( {P_{t} } \right)$$
(16)
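Equations (14)–(16) can be checked numerically; the snippet below is an illustrative sketch, with function names chosen for the example.

```python
import math

def pt(p, y):
    # Unified probability of the true class, Eq. (15).
    return p if y == 1 else 1 - p

def cross_entropy(p, y):
    # Standard cross-entropy written via Pt, Eq. (16).
    return -math.log(pt(p, y))

# A confident correct prediction gives a small loss;
# a confident wrong one gives a large loss.
print(round(cross_entropy(0.9, 1), 4))  # 0.1054
print(round(cross_entropy(0.9, 0), 4))  # 2.3026
```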

This loss function handles class imbalance poorly: when one class is heavily over-represented, its samples dominate the total loss and degrade the model. A common remedy is to add weighting factors, i.e., to balance the cross-entropy. With \(\alpha \in\) [0, 1], a weight factor \(\alpha\) is applied to positive samples and \(1 - \alpha\) to negative samples. For notational convenience, we use \(\alpha_{t}\) to express the balanced cross-entropy function:

$$CE\left( {p,y} \right) = - \alpha_{t} \times \log \left( {P_{t} } \right)$$
(17)

When the negative samples vastly outnumber the positive samples, even if we set the weight of the negative samples very low, their loss will still dominate the total loss simply because there are so many of them. To solve this problem, the contribution of high-confidence samples to the total loss must be reduced, which is done by adding a modulating factor (1 − Pt)γ to the standard cross-entropy. This yields the Focal Loss function:

$$FL\left({P}_{t}\right)=-{ (1-{P}_{t})}^{\gamma }\times \mathrm{log }({P}_{t})$$
(18)

When γ = 0, Focal Loss reduces to the standard cross-entropy; when γ > 0, the loss of samples with high classification confidence is suppressed, so the loss function focuses on hard-to-distinguish samples. Focal Loss thus handles sample-class imbalance by weighting each sample's loss according to how easily the sample can be distinguished: a smaller weight \(\alpha_{1}\) is applied to easily distinguished samples and a larger weight \(\alpha_{2}\) to hard-to-distinguish ones. Usually, samples whose classification confidence is close to 1 or close to 0 are called easy samples, and the rest are called hard samples. The Focal Loss function can then be expressed as:

$$L_{sum} = \alpha_{1} \times L_{e} + \alpha_{2} \times L_{d}$$
(19)

where Le represents the loss of easily distinguished samples and Ld the loss of hard-to-distinguish samples. Since \(\alpha_{1} < \alpha_{2}\), the loss function focuses on the samples that are difficult to distinguish. Combining this weighting with Eq. (18), the α-balanced Focal Loss is obtained:

$$FL\left( {P_{t} } \right) = - \alpha_{t} \times \left( {1 - P_{t} } \right)^{\gamma } \times \log \left( {P_{t} } \right)$$
(20)
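The full α-balanced Focal Loss of Eq. (20) can be sketched for a single sample as below. This is an illustrative implementation; the default α and γ values are the common choices from Lin et al. (2020), not values stated in this paper.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Alpha-balanced Focal Loss for one sample, Eq. (20).
    p: predicted probability of the positive class; y: label (1 or 0)."""
    p_t = p if y == 1 else 1 - p          # Eq. (15)
    alpha_t = alpha if y == 1 else 1 - alpha
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)

# An easy sample (p_t close to 1) is strongly down-weighted by
# the (1 - p_t)^gamma factor, while a hard sample keeps most of its loss.
easy = focal_loss(0.95, 1)  # well-classified positive
hard = focal_loss(0.10, 1)  # badly-classified positive
print(easy < hard)  # True
```

With γ = 0 and α = 1 the function reduces to the standard cross-entropy of Eq. (16), matching the limiting case discussed above.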

Appendix 2. Identity Block and Conv Block modules

The Identity Block has a shortcut that passes the input directly to the output, and the output matrices of the two branches have the same dimensions, so it can be stacked to deepen the network. The Conv Block, by contrast, has different input and output dimensions, so it cannot be stacked consecutively; its role is to change the dimensionality of the network.

As shown in Fig. 11, the structure on the left is the residual structure used for networks with few layers, while the structure on the right is used for networks with many layers. The main branch of the Identity Block takes the input feature matrix and produces the output through two 3 × 3 convolutional layers. The shortcut branch goes directly from input to output. Since the output matrices of the two branches have the same dimensions, they can be added directly, and the sum passes through an activation function to produce the output. The Conv Block differs from the Identity Block in that it adds a 1 × 1 convolutional layer on both the input and the output. As shown in Fig. 10, the input matrix has a depth of 256. After the first convolutional layer, the length and width of the matrix remain unchanged, but the number of channels decreases from 256 to 64. By the third layer, the number of channels has returned to 256, making the output dimensions the same as the input. The two branches are then added and passed through an activation function to produce the final result.
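The shape bookkeeping of the two block types can be sketched as follows. This is a simplified NumPy illustration, not the paper's network code: the middle 3 × 3 convolution is replaced by another 1 × 1 (pointwise) convolution so the example stays self-contained, since a 1 × 1 convolution is just per-pixel channel mixing and the channel arithmetic, which is the point here, is identical.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    # 1x1 convolution = per-pixel channel mixing: (H, W, Cin) @ (Cin, Cout).
    return x @ w

def relu(x):
    return np.maximum(x, 0)

def bottleneck(x, c_mid, shortcut_proj=None):
    """Bottleneck residual unit: 1x1 reduce -> middle conv -> 1x1 restore.
    If shortcut_proj is None (Identity Block), the input is added to the
    main branch unchanged, so input and output shapes must match.
    If shortcut_proj is given (Conv Block), the shortcut is projected by a
    1x1 convolution so its channel count matches the main branch."""
    c_in = x.shape[-1]
    c_out = shortcut_proj.shape[-1] if shortcut_proj is not None else c_in
    w1 = rng.standard_normal((c_in, c_mid)) * 0.01   # reduce channels
    w2 = rng.standard_normal((c_mid, c_mid)) * 0.01  # middle conv stand-in
    w3 = rng.standard_normal((c_mid, c_out)) * 0.01  # restore channels
    main = conv1x1(relu(conv1x1(relu(conv1x1(x, w1)), w2)), w3)
    short = x if shortcut_proj is None else conv1x1(x, shortcut_proj)
    return relu(main + short)

x = rng.standard_normal((8, 8, 256))
y_id = bottleneck(x, 64)                       # Identity Block: 256 -> 256
proj = rng.standard_normal((256, 512)) * 0.01
y_cv = bottleneck(x, 128, shortcut_proj=proj)  # Conv Block: 256 -> 512
print(y_id.shape, y_cv.shape)  # (8, 8, 256) (8, 8, 512)
```

The example mirrors the text: the Identity Block's two branches match in shape and can be added directly, while the Conv Block changes the channel count and therefore needs a projection on the shortcut.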

Rights and permissions

Reprints and permissions

About this article


Cite this article

Wang, H., Liu, J., Zeng, S. et al. A novel landslide identification method for multi-scale and complex background region based on multi-model fusion: YOLO + U-Net. Landslides 21, 901–917 (2024). https://doi.org/10.1007/s10346-023-02184-7
