1. Introduction
Earthquakes are among the most destructive natural disasters in the world. Approximately five million earthquakes occur worldwide every year, of which roughly a dozen to twenty cause serious harm, resulting in incalculable environmental damage and loss of life and property. Take the 2014 magnitude 6.5 Ludian earthquake as an example: it killed 617 people, triggered at least 1024 landslides with areas of 100 m² or larger, and collapsed tens of thousands of buildings [1,2]. The quick and accurate collection of damage information in earthquake-stricken areas is essential for the timely rescue of trapped people and for postearthquake reconstruction [3,4]. In seismic emergency rescue work, the most traditional method is onsite investigation by relevant experts [5,6]; however, the workload is extremely large and the efficiency is low because disaster areas are extensive and varied [7]. Investigators may also be unable to reach disaster sites in time if they encounter landslides or clogged roads. Because of this low efficiency and uncertainty, onsite investigation cannot currently satisfy the requirements of rapid assessment and postearthquake rescue.
However, with the rapid development of technologies such as satellite remote sensing and unmanned aerial vehicles, the ability to acquire real-time information on the Earth’s surface has improved [8,9]. Remote sensing images can be acquired quickly and reflect the objective world comprehensively and intuitively, so they provide a new information source for the rapid recognition and assessment of earthquake damage [10,11]. Many studies have addressed disaster risk assessment based on remote sensing images. Jelének et al. [12] jointly used Sentinel-1 radar images and Sentinel-2 optical data to analyze postearthquake surface changes, taking the 2016 magnitude 7.8 Kaikoura earthquake in New Zealand as an example. They used radar interferometry to assess earthquake impacts by computing vertical displacements and differential interferograms. Olen et al. [13] proposed a new method for detecting Potentially Affected Areas (PAAs) following a natural hazard event based on Sentinel-1 C-band radar data. The method is based on coherence time series: it determines the natural variability of coherence within each pixel in the region of interest and identifies where statistically significant coherence loss has occurred by comparing pixel-by-pixel syn-event coherence with temporal coherence distributions. They verified the method's ability to find PAAs using the 2017 Iran–Iraq earthquake and a landslide-prone region of NW Argentina. Mondini et al. [14] proposed that Sentinel-1 SAR C-band images could compensate for the lack of pre- and postlandslide optical images caused by persistent cloud cover. They analyzed 32 landslide cases worldwide, and the results showed that the changes caused by landslides in SAR amplitudes were unambiguous in about 84% of cases. Expert visual interpretation methods that fully utilize high-resolution images have become mainstream in postearthquake assessment, rescue and reconstruction [15,16,17]. However, these methods are inefficient and costly in terms of expert resources. Moreover, interpretation results differ substantially across experts [18,19].
In recent years, the development of machine learning has helped to overcome some of these limitations by promoting the use of computer image recognition and processing [20,21]. Furthermore, with advances in GPUs and artificial intelligence, image recognition via deep learning has become more efficient and accurate [22,23], which makes deep learning suitable for postearthquake scene recognition. The core steps of image recognition are typically feature extraction and classification. Early image recognition mainly relied on handcrafted feature extraction methods, such as Scale-Invariant Feature Transform (SIFT) [24], Histogram of Oriented Gradients (HOG) [25] and Deformable Parts Model (DPM) [26], in combination with classifiers such as Support Vector Machine (SVM) [27] and random forest [28]. After Hinton [29] proposed a solution to the vanishing gradient problem in deep network training, deep learning entered a period of substantial development. After Convolutional Neural Networks (CNNs) [30] rose to prominence in 2012, deep learning developed explosively, and CNNs have since been applied in many research fields. There are two typical types of deep learning methods for image recognition: methods based on region proposals, such as R-CNN [31], Fast R-CNN [32], Faster R-CNN [33] and R-FCN [34], and methods based on regression, such as You Only Look Once (YOLO) [35] and Single Shot MultiBox Detector (SSD) [36]. Methods of the second type are generally faster but less accurate than those of the first type, because they generate bounding boxes in a single network. Compared with YOLO, SSD improves both speed and recognition accuracy, and its accuracy is comparable to that of the R-CNN series [36]. Therefore, the SSD method is adopted in our model.
Recently, many researchers have applied deep learning methods to disaster scene recognition. Ding et al. [37] took a Google postearthquake image of Ludian county, Yunnan province, China, with a spatial resolution of 0.3 m as an example and used a pretrained AlexNet deep convolutional neural network for feature extraction, in combination with an SVM classifier, to realize postearthquake scene recognition. Sun et al. [38] proposed a convolutional neural network combined with multiscale segmentation (CMSCNN) for high-resolution seismic image classification, which improved accuracy. Xu et al. [39] developed a Dense Feature Pyramid model with an encoder–decoder network (DFPENet) for coseismic landslide recognition, and the experimental results demonstrated high-precision, high-efficiency and cross-scene recognition of earthquake disasters. Ji et al. [40] combined CNN features with a random forest classifier; compared with a CNN alone, this method improves the accuracy of postearthquake collapse identification, and CNN feature extraction outperforms texture feature extraction. Song et al. [41] used the Deeplab v2 neural network for the initial identification of damaged building areas and applied the simple linear iterative clustering (SLIC) method to accurately extract the boundaries of earthquake-damaged buildings. Finally, they introduced a mathematical morphological method to eliminate background noise.
The methods discussed above achieved substantial results in postearthquake scene recognition by optimizing network structures and integrating other algorithms. However, they rarely account for data scarcity and may perform poorly when training data are insufficient. In postearthquake remote sensing image recognition in particular, the lack of labeled samples is a substantial obstacle. Therefore, it is important to establish a postearthquake scene recognition model that performs well with only a small amount of data.
In this paper, a postearthquake multiple scene recognition (PEMSR) model based on the classical SSD detection method and transfer learning is proposed. Postearthquake scene images are collected and labeled manually to construct a dataset. To mitigate the negative influence of an insufficient dataset, data augmentation and transfer learning [42] are used in this model. In addition, random oversampling is used to address data imbalance. The PEMSR model is evaluated and compared with other models to examine its performance and the impacts of data augmentation and transfer learning.
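As an illustration of what data augmentation means for a detection dataset, the following minimal sketch mirrors an image horizontally and adjusts its bounding boxes accordingly. It is an assumption for illustration only: the specific augmentation operations are not described in this section, and the function name `hflip_with_boxes`, the nested-list image representation and the box convention (xmin, ymin, xmax, ymax, with xmax exclusive) are hypothetical.

```python
def hflip_with_boxes(image, boxes):
    """Mirror an image left-right and adjust its bounding boxes.

    image: list of rows, each row a list of pixel values (hypothetical format).
    boxes: list of (xmin, ymin, xmax, ymax) in pixel coordinates, xmax exclusive.
    """
    width = len(image[0])
    # Reverse every row to mirror the image horizontally.
    flipped = [row[::-1] for row in image]
    # A box spanning [xmin, xmax) maps to [width - xmax, width - xmin).
    flipped_boxes = [
        (width - xmax, ymin, width - xmin, ymax)
        for (xmin, ymin, xmax, ymax) in boxes
    ]
    return flipped, flipped_boxes


# Illustrative 2 x 4 "image" with one labeled box covering column 0.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
]
boxes = [(0, 0, 1, 2)]
flipped, flipped_boxes = hflip_with_boxes(image, boxes)
print(flipped)
print(flipped_boxes)
```

Each flipped copy can be added to the training set alongside the original, so a labeled scene contributes two samples instead of one without further manual labeling.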
4. Discussion
The proposed PEMSR model recognizes six types of postearthquake scenes. The model was then optimized, which improved its recognition performance. Across several experiments, the PEMSR model demonstrates two advantages. First, the PEMSR model based on SSD with transfer learning outperforms the HOG+SVM method in recognition. Second, data augmentation and balancing overcome the problems caused by insufficient and imbalanced datasets, which improves the accuracy of the PEMSR model in postearthquake scene recognition. In addition, the results help to identify areas that merit further study.
4.1. PEMSR Model with Transfer Learning Outperforms Other Methods
According to Table 6, the PEMSR model is more efficient than the traditional HOG+SVM machine learning method: its average detection time is only 0.4565 s, compared with 8.3472 s for HOG+SVM (roughly 18 times faster), and its overall recognition accuracy is higher. Transfer learning significantly improves the training of a model with insufficient samples. The PEMSR model proposed in this paper is based on the SSD method and uses a transfer learning strategy to reduce the volume of training data required. As Table 10 shows, the transfer learning strategy improves the accuracy for each type of scene, although the average detection time is slightly longer than that of the plain SSD method. Overall, the PEMSR model achieves better accuracy through transfer learning.
4.2. Data Augmentation and Balancing Improve the Accuracy of the PEMSR Model
Most deep learning methods require sufficient sample data; otherwise, the training performance will be poor or overfitting will occur. In the PEMSR model, data augmentation is used to compensate for the limited number of original samples. Table 11 presents the F1 scores of each postearthquake scene for every augmentation experiment. The overall recognition accuracy of the PEMSR model increases with each round of data augmentation. However, some scenes, such as ruins, show only very small improvements. For this scene, the recognition performance is already close to the best that the PEMSR model achieves: the F1 score for ruins is stable, with only minor fluctuations across experiments, and is difficult to improve further via data augmentation. An imbalanced dataset may bias the model towards the majority classes, so that recognition of the minority classes is unsatisfactory. Because simply increasing the amount of training data may not continue to improve the performance of the model, we applied oversampling during data augmentation to balance the dataset. The results demonstrate that once the dataset was balanced, the recognition performance of the PEMSR model improved substantially, especially for scene classes that occupy small proportions of the original dataset, such as ponding, trees and clogged. We conclude that the PEMSR model offers advantages when faced with sample shortages and imbalance.
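The balancing step can be sketched as plain random oversampling, in which minority-class samples are duplicated at random until every class matches the largest one. This is a simplified illustration and not the authors' implementation: it assumes one class label per sample (real detection images may contain several labeled scenes), and the function name `random_oversample`, the file names and the class counts are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy dataset: names and class counts are illustrative only.
labels = ["ruins"] * 6 + ["ponding"] * 2 + ["clogged"] * 1
samples = [f"img_{i:03d}.jpg" for i in range(len(labels))]

balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

Because the duplicated samples are exact copies, oversampling is usually paired with augmentation so that the repeated minority-class images are not identical during training.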
4.3. Future Enhancements
As reported in this paper, the PEMSR model achieves excellent performance in postearthquake scene recognition. However, there is scope for further development and application. For example, this paper does not consider the influence of image resolution on recognition performance. In the future, contrast experiments will be conducted through image blurring to explore the influence of resolution on the recognition performance of the PEMSR model. In addition, the samples used in this paper come from a mountainous seismic area, so the performance of the model in densely built urban areas needs further verification. Furthermore, we plan to treat common scenes such as houses, trees and ponding as background samples and to analyze their impact on the recognition of scenes caused by earthquakes. At the same time, more types of earthquake-induced scenes, such as ground cracks, will be added. Finally, applying additional knowledge from disaster science to realize hierarchical recognition of postearthquake scenes is a subject for further investigation. For example, ruins can be divided into severe and mild damage classes, which may be more valuable for postearthquake relief and reconstruction.