
Intelligent classification of ground-based visible cloud images using a transfer convolutional neural network and fine-tuning

Open Access

Abstract

Here, a classification method for ground-based visible cloud images is proposed based on a transfer convolutional neural network (TCNN). This approach combines the abilities of deep learning (DL) and transfer learning (TL). A sample database containing all ten cloud types was used; this database was expanded four-fold using enhancement processing. AlexNet was chosen as the basic convolutional neural network (CNN), with the ImageNet database being used for pre-training. The optimal fine-tuning scheme, determined by layer-by-layer fine-tuning, was then used to classify the ten cloud types. The proposed method achieved 92.3% recognition accuracy for all ten ground-based cloud types.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Cloud classification is very important for weather forecasting because cloud type is directly related to weather events such as precipitation, snow, hail, and lightning. Based on their shape, structure, characteristics, and height, clouds can be divided into ten types: cumulus (Cu), cumulonimbus (Cb), stratocumulus (Sc), stratus (St), nimbostratus (Ns), altostratus (As), altocumulus (Ac), cirrus (Ci), cirrostratus (Cs), and cirrocumulus (Cc). These cloud types are characterized by their varied forms, rapid changes, mutual similarities, and the ease with which they blend into the background sky. Manual observation is the main method used for actual cloud observations, but this approach has many problems, such as strong subjectivity, quasi-static observations, high costs, a paucity of observation points, and incomplete information records.

Much research has therefore recently been conducted into the automatic observation of ground-based clouds using instruments, and it is now possible to observe all types of sky clouds using visible and infrared instruments. The automatic recognition of ground-based cloud images typically follows an image preprocessing, feature extraction, and classification pipeline, and most researchers have focused on developing feature extraction techniques for different cloud attributes. Singh and Glennen [1] used co-occurrence matrices and kernels to extract many features and distinguish five different sky conditions. Calbó and Sabburg [2] used texture attributes and the Fourier transform of the visible channels of a camera to classify up to eight types of sky conditions, with an accuracy of approximately 62%. Heinle et al. [3] proposed an automatic cloud classification algorithm based on a set of statistical features describing the color (mean, standard deviation, skewness, and difference) and texture (energy, entropy, contrast, uniformity, and cloud amount) of whole-sky images; the success rate of this method for classifying seven types of clouds was approximately 75%. Kazantzidis et al. [4], meanwhile, proposed a multicolor criterion for sky images that attained an average performance of approximately 87% for seven cloud types. Liu et al. [5] proposed several algorithms for extracting texture and image descriptors, such as multiple random projections, salient local binary patterns, and group pattern learning. Zhuo et al. [6] combined textural and structural features to represent clouds and achieved a high classification accuracy. Kliangsuwan et al. [7] used a new method based on the fast Fourier transform to extract cloud features and achieved an automatic classification accuracy of up to 90% for seven cloud types. Wacker et al. [8] measured longwave radiation to derive auxiliary information for cloud classification; compared with only using information from sky cameras, they increased the accuracy by nearly 10%, achieving an average accuracy of 80–90%. Xiao et al. [9] fused texture, structure, and color features, observing that clouds can be regarded as having a natural texture; it is therefore reasonable to use texture and image descriptors to describe the appearance of clouds. Li et al. [10] adopted a new cloud-type recognition method in which an image is analyzed as a group of patches instead of as a group of pixels; this method obtained an accuracy of 90% for five sky conditions. In the above-mentioned traditional cloud classification methods, after feature extraction, classifiers such as an artificial neural network (ANN), k-nearest neighbor (KNN), or support vector machine (SVM) are often used to distinguish the features. Traditional classifiers easily fall into local extrema during training. Furthermore, such learning networks generally have only two or three layers, which is actually a kind of “shallow learning.” As a result, these methods are only applicable to a limited range of cloud types: only a few typical cloud types, such as Cu, St, Ac, and Ci, can be automatically identified, and their recognition rates are not high. Currently, there is no universal method for classifying all ten cloud types. Moreover, some studies treat cloud image patches with more recognizable features as the classification object; this approach is far from the actual observation requirements.

Convolutional neural networks (CNNs) have achieved great success in large-scale image classification tasks. Although successful results have been achieved by applying CNNs in different machine learning scenarios, there are still some difficulties regarding their application to cloud image classification. First, in practical applications, a CNN needs a large volume of labelled data for training, but there is currently a lack of cloud image data. Furthermore, annotating cloud images requires professional knowledge, which is expensive and time-consuming, and the results are subject to observer variability. In the absence of a large amount of labelled data, it is difficult to ensure the effectiveness of a CNN for ground-based cloud image classification. Second, the use of limited training data can easily lead to “overfitting,” in which the learned features do not generalize well. The appearance of clouds varies considerably between images, and when this variability is large, overfitting becomes an even more serious problem. Third, training a CNN from scratch requires high computing power, extensive memory resources, and time, all of which place certain limitations on the actual operation process. In these cases, transfer learning (TL) can be regarded as a good solution. TL applies a mature network trained on one sample database to a new sample database; that is, the learned knowledge is transferred to solve new problems more quickly [11]. When using TL, the pre-trained classifier is fine-tuned to obtain the new classifier, which effectively utilizes useful information from the source data and reduces the need for new labels. This can also greatly accelerate the convergence speed and reduce the training time [12,13]. Fine-tuning refers to the process of accurately adjusting a model’s parameters, which is one of the skills of machine learning [14]. It is possible to effectively improve the accuracy of cloud classification by hierarchically fine-tuning a pre-trained transfer CNN (TCNN) and then classifying cloud images until the parameters corresponding to the best classification performance are found.

As a CNN can automatically learn image features, TL makes it possible to transfer the deep learning (DL) ability of a mature network. Therefore, using a TCNN for cloud recognition should increase the speed at which the model is trained and at which it recognizes clouds, and it can help solve the problems arising from the current paucity of cloud images. In this paper, therefore, a classification method for ground-based visible cloud images is proposed, based on a TCNN. A large sample database was established using the sample expansion method, and the AlexNet network was pre-trained using the ImageNet database. The trained TCNN was then regarded as a new network: it was retrained using cloud images, and its weights were adjusted using the backward propagation algorithm. Subsequently, the new network was used to classify cloud images. Finally, the optimal tuning scheme was determined using the layer-by-layer tuning method. The aim of this study was to establish whether the proposed TCNN could obtain a satisfactory classification accuracy compared to an ab initio trained CNN.

2. Data

The data used in this study were obtained from an online image sample database. They comprised 1,049 tagged visible-light cloud images, sourced from resources such as the “Aerometeorological Cloud Atlas” [15], “China Cloud Atlas” [16], and the official website of the International Cloud Atlas [17]. All images were stored in the JPG format, and most had different lengths, widths, and resolutions. After collecting these images, it was necessary to individually verify the accuracy of the original annotations and classify the images according to the ten cloud types: Cu, Cb, Sc, St, Ns, As, Ac, Ci, Cs, and Cc. Some samples are shown in Fig. 1.

Fig. 1. Example of an online-image sample database.

ImageNet is a public database that has been widely used in the fields of computer vision and pattern recognition; it has almost become the standard database for algorithm performance tests in DL. ImageNet contains more than 14 million images, covering more than 20,000 classifications. More than one million of these images have clear category and object location annotations. In this study, ImageNet [18] was used to pre-train a mature CNN, called AlexNet. Some of the samples are shown in Fig. 2.

Fig. 2. Some samples of ImageNet.

3. Model introduction

3.1 Verification of the necessity of TL

For different-sized databases, there are two main methods for applying a pre-trained CNN to a new image classification task. In the first method, if the new sample database is small, the pre-trained CNN’s weights are used as fixed feature extractors. In this method, the network remains unchanged [19,20], the feature vectors are extracted just before the last fully connected layer, and linear classifiers are then trained for classification. In the second method, if the new sample database is large, it is used to fine-tune the pre-trained CNN: the back-propagation algorithm is used to fine-tune its weights [21] and the CNN is finally updated to solve the new problem [22,23]. This enables the first layers of the network to learn highly generalizable features from the larger source database, while the latter layers of the network learn the model appropriate to the new task. In this case, the accuracy obtained on the target database is higher than that obtained using the first method. Taking a network structure with four convolution layers (called CloudA) as an example [24], the pre-training model was used as shown in Fig. 3.
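To make the two strategies concrete, the following is a minimal PyTorch sketch (not the authors' code) contrasting a fixed feature extractor with full fine-tuning; it assumes the pre-trained AlexNet shipped with torchvision, and `num_classes = 10` is chosen to match the cloud types used here.

```python
import torch.nn as nn
from torchvision import models

num_classes = 10  # the ten cloud types used in this study

# Strategy 1: use the pre-trained CNN as a fixed feature extractor (small new database).
feature_net = models.alexnet(pretrained=True)
for p in feature_net.parameters():
    p.requires_grad = False                                # freeze all transferred weights
feature_net.classifier[6] = nn.Linear(4096, num_classes)   # only this new layer is trained

# Strategy 2: fine-tune the pre-trained CNN (larger new database).
finetune_net = models.alexnet(pretrained=True)
finetune_net.classifier[6] = nn.Linear(4096, num_classes)
# all weights remain trainable and are updated by back-propagation on the new data
```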

Fig. 3. Example of how to use the pre-trained model.

CloudA has achieved good classification results for five types of cloud images from two sample databases (Swimcat [25] and Total-sky [26]). This network, with its four-convolution-layer structure, has also been shown to be able to identify clouds in some sample databases [24]. Therefore, this study first applied CloudA to the online image sample database, which features ten types of cloud images. In the experiment, the parameters of CloudA remained unchanged, while the output of the last fully connected layer was changed from five to ten. The accuracy and loss of the obtained verification set are shown in Fig. 4 (left and right, respectively).

Fig. 4. Accuracy and loss of verification set, using CloudA.

As shown in Fig. 4, although the training accuracy was close to 100% and the training loss continued to decline, the accuracy of the verification set was less than 40%, and the loss of the verification set first decreased rapidly, then rebounded and rose. It was preliminarily determined that overfitting had occurred at this point. In this regard, a comparative experiment was conducted using the sample library after data enhancement; the accuracy and loss of the obtained verification set are shown in Fig. 5 (left and right, respectively).

Fig. 5. Comparative experiment before and after data enhancement, using CloudA.

As can be seen from Fig. 5, after expanding the original data four-fold, the accuracy (a) of the verification set was greatly improved (it basically remained at approximately 60%), but the loss curve (b) still rose after initially falling. At this point, therefore, there was still a certain degree of overfitting in the network.

The improvement in accuracy observed after data enhancement shows that: (1) the original number of samples was not sufficient to train the network from scratch, meaning that overfitting would inevitably occur; (2) the data enhancement method was both effective and feasible, and could improve the accuracy of the model to a certain extent; and (3) the network was still overfitting at this point, meaning that TL could be applied.

3.2 Determination of network architecture

Most transfer learning uses a classical network structure for pre-training, such as AlexNet [27], GoogLeNet [28], VGG16 [29], and ResNet [30]. Here, AlexNet with five convolution layers was chosen to conduct transfer learning. The main reasons for this were as follows:

First, a large number of cases in the field of image classification have shown that pre-trained networks can have a good classification effect.

Second, an AlexNet model that is trained in advance by ImageNet is easier to obtain, because of the relatively small number of parameters.

Third, CloudA achieved an accuracy of nearly 60% on the online image sample database after data enhancement, which was six times higher than the accuracy expected from random guessing. It was thus reasonable to believe that, after TL, a network with four or more convolution layers could further improve the experimental results. Among the classical network structures, the number of convolution layers in AlexNet is the closest to that of CloudA. A comparison of several classical pre-training network models is presented in Table 1.


Table 1. Comparison of several classical pre-training network models

Fourth, although there are deeper network structures, such as GoogLeNet and VGGNet, the convergence rates of these structures are slow. The aim of the experiment conducted here was to find a fine-tuning method to improve the performance of a pre-trained network; thus, AlexNet represented a reasonable choice.

AlexNet consists of five convolution layers and three fully connected layers. To adapt to the classification task, the last fully connected layer is modified into ten nodes, where each node represents a category in the image sample database. The structures and parameters of the experimental network are shown in Fig. 6. For each layer, the top half of Fig. 6 represents the input, the middle part represents the parameters, and the bottom half represents the output. For example, in conv1, $3 \times 227 \times 227$ means that the input image of this layer comprises three channels and $227 \times 227$ pixels, and that the convolution kernel size is $11 \times 11$. Meanwhile, $96 \times 55 \times 55$ means that the output image comprises 96 channels and $55 \times 55$ pixels.
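As a quick check of the shapes quoted above, the following sketch builds a conv1 layer with 96 kernels of size $11 \times 11$ (stride 4, as in the original AlexNet; the stride is not stated explicitly in the figure) and applies it to a $3 \times 227 \times 227$ input.

```python
import torch
import torch.nn as nn

# conv1 as described in Fig. 6: 96 kernels of size 11x11 on a 3 x 227 x 227 input.
# With stride 4 and no padding: (227 - 11) / 4 + 1 = 55.
conv1 = nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4)
x = torch.randn(1, 3, 227, 227)
print(conv1(x).shape)  # torch.Size([1, 96, 55, 55]), i.e. 96 channels of 55 x 55 pixels
```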

Fig. 6. AlexNet structure.

4. Method

4.1 Data processing

As the resolution of the original sample database was not consistent, it was necessary to adjust the image size uniformly based on bilinear interpolation, as follows:

$$\begin{aligned} f(x,y) &= \frac{1}{{({x_2} - {x_1})({y_2} - {y_1})}}[({x_2} - x)({y_2} - y)f({x_1},{y_1}) + (x - {x_1})({y_2} - y)f({x_2},{y_1})\\ &+ ({x_2} - x)(y - {y_1})f({x_1},{y_2}) + (x - {x_1})(y - {y_1})f({x_2},{y_2})] \end{aligned}$$
where $f(x,y)$ is the gray value of the target image at the pixel point $(x,y)$, and $f({x_1},{y_1})$, $f({x_2},{y_1})$, $f({x_1},{y_2})$, and $f({x_2},{y_2})$ are the gray values of the four pixels $({x_1},{y_1})$, $({x_2},{y_1})$, $({x_1},{y_2})$, and $({x_2},{y_2})$ around point $(x,y)$, respectively.
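For illustration, the following NumPy sketch applies Eq. (1) directly to resize a single-channel image; in practice a library resize routine (e.g., in PIL or OpenCV) would be used, and the coordinate mapping chosen here is one common convention rather than the authors' exact implementation.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Resize a grayscale image (2-D array) using the bilinear formula of Eq. (1)."""
    in_h, in_w = img.shape
    out = np.empty((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            # map the target pixel (i, j) back into source coordinates (y, x)
            y = i * (in_h - 1) / max(out_h - 1, 1)
            x = j * (in_w - 1) / max(out_w - 1, 1)
            y1, x1 = int(np.floor(y)), int(np.floor(x))
            y2, x2 = min(y1 + 1, in_h - 1), min(x1 + 1, in_w - 1)
            dy, dx = y - y1, x - x1
            # weighted average of the four surrounding gray values, as in Eq. (1)
            out[i, j] = ((1 - dx) * (1 - dy) * img[y1, x1]
                         + dx * (1 - dy) * img[y1, x2]
                         + (1 - dx) * dy * img[y2, x1]
                         + dx * dy * img[y2, x2])
    return out
```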

Due to the limited amount of data in the sample database, and in order to solve the problems of over-fitting and the low recognition accuracy caused by small sample size, it was necessary to enhance the sample data. Horizontal flips, scaling and image brightness adjustments were used to expand the sample database, and the sample number was increased four-fold.

As the vertical features of clouds (especially the texture features of a cloud’s bottom) are usually important factors in cloud recognition, methods that change the vertical distributions of clouds, such as vertical flipping and rotation, are not suitable for enhancing cloud image data. Therefore, horizontal flipping was used to enhance the samples. The scaling was set to 1.2, and parts beyond the standard size after bilinear interpolation were trimmed. Similarly, the brightness conversion coefficient was 1.2. From 1,049 original images, after data enhancement there were 4,196 samples. The number of samples in each category is listed in Table 2.
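A minimal sketch of this four-fold expansion using PIL is given below; the flip, the 1.2× scaling followed by trimming, and the 1.2 brightness factor follow the text, while the use of a center crop and the PIL API are implementation assumptions.

```python
from PIL import Image, ImageEnhance

SIZE = 227  # standard size after bilinear interpolation

def expand_sample(path):
    """Return the resized original plus three enhanced versions (four-fold expansion)."""
    img = Image.open(path).convert("RGB").resize((SIZE, SIZE), Image.BILINEAR)

    flipped = img.transpose(Image.FLIP_LEFT_RIGHT)                  # horizontal flip

    # scale by 1.2 and trim the part beyond the standard size (center crop assumed)
    big = img.resize((int(SIZE * 1.2), int(SIZE * 1.2)), Image.BILINEAR)
    off = (big.width - SIZE) // 2
    scaled = big.crop((off, off, off + SIZE, off + SIZE))

    brighter = ImageEnhance.Brightness(img).enhance(1.2)            # brightness x 1.2

    return [img, flipped, scaled, brighter]
```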


Table 2. Number of images for each class in the cloud image sample database

The establishment of the sample database can be divided into the following steps.

  • (1) First, all image sizes were adjusted to $227 \times 227$ pixels using bilinear interpolation.
  • (2) The sample database was expanded using horizontal flips, scaling, and brightness adjustments.
  • (3) Finally, 80% of the sample database was selected as the training set, 10% was selected as the validation set, and 10% was selected as the test set, as shown in Fig. 7.

Fig. 7. Experimental distribution of the sample database.
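A simple way to realize the 80/10/10 split described above, shuffling each class separately so that the class proportions are preserved, is sketched below; the fixed random seed and the list-of-paths interface are assumptions for illustration.

```python
import random

def split_class(items, seed=0):
    """Split one class's image paths into 80% training, 10% validation, 10% test."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n_train = int(0.8 * len(items))
    n_val = int(0.1 * len(items))
    return (items[:n_train],                      # training set
            items[n_train:n_train + n_val],       # validation set
            items[n_train + n_val:])              # test set
```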

Through the above methods, the sample database was expanded four-fold. An example of an expanded image is presented in Fig. 8. This four-fold expansion does not necessarily mean that the sample size was sufficient; this could only be determined by the experimental results. Assuming that other parameters remain unchanged, if the classification accuracy of the expanded data is not significantly different from that of the original data, this would indicate that the amount of original data was sufficient. If the expanded data’s accuracy is significantly higher than that of the original data, however, this would indicate that the original data volume was not sufficient to support network training. The sample size can be increased by increasing the number of source pictures, or by increasing the expansion multiple, until the classification accuracy is no longer affected by increasing the sample size.

Fig. 8. Cloud images using data enhancement.

4.2 Method steps

The basic idea of ground-based visible cloud image classification based on the TCNN is shown in Fig. 9. The main steps are as follows.

Fig. 9. Basic idea of ground-based visible cloud image classification based on TCNN.

Step 1: Use AlexNet as a suitable CNN architecture and build it.

Step 2: Pre-process the original ground-based cloud images.

Step 3: Pre-train the network using ImageNet; this network is called the TCNN.

Step 4: The trained TCNN is regarded as a new network, and this new network is retrained using cloud images. The weights of the new network are adjusted using a backward propagation algorithm, following which the cloud images are classified using the new network.

4.3 Pre-trained network

AlexNet was trained using the large-scale image database, and its parameters were saved to obtain a pre-trained network. Here, three images (alpaca, sea lion, and zebra) were randomly selected from the 1,000 image categories in ImageNet [18] and were input into the pre-trained AlexNet for recognition. All three pictures were successfully identified; the outputs are shown in Fig. 10. Thus, the network was correctly pre-trained and its weights correctly assigned.

Fig. 10. Examples of pre-training model.

4.4 Fine-tuning network

As weights learned on a different task are more suitable for initializing a network than random weights, in this study, the weights of all layers were transferred from AlexNet, which was trained using ImageNet. After AlexNet was pre-trained, a total of eight rounds of experiments were planned; the weights were adjusted using a backward propagation algorithm. In the first round, the parameters of the last layer of the pre-trained AlexNet were retrained using the cloud image sample database, and the parameters of all other layers were frozen until convergence. Similarly, in the second round, the last two layers of AlexNet were retrained, and the parameters of all other layers were frozen during the update process. A further layer was then gradually added to the update process in each round, until the entire network was fine-tuned. In other words, after AlexNet was pre-trained using natural images, the network was fine-tuned in a layered manner.
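The layer-by-layer scheme can be sketched as follows, assuming torchvision's AlexNet (whose channel counts differ slightly from the original network but whose eight weighted layers, conv1–conv5 and fc6–fc8, are laid out in the same order); round k of the experiment retrains only the last k of these layers.

```python
import torch.nn as nn
from torchvision import models

def build_round(num_unfrozen, num_classes=10):
    """Round k of the layer-by-layer scheme: retrain only the last `num_unfrozen`
    of the eight weighted layers (conv1-conv5, fc6-fc8) and freeze the rest."""
    net = models.alexnet(pretrained=True)                                    # transferred weights
    net.classifier[6] = nn.Linear(net.classifier[6].in_features, num_classes)

    # the eight weight-bearing layers, in forward order (torchvision indexing)
    layers = [net.features[0], net.features[3], net.features[6],
              net.features[8], net.features[10],                             # conv1-conv5
              net.classifier[1], net.classifier[4], net.classifier[6]]       # fc6-fc8

    for p in net.parameters():
        p.requires_grad = False                   # freeze everything first
    for layer in layers[-num_unfrozen:]:
        for p in layer.parameters():
            p.requires_grad = True                # unfreeze the last k layers
    return net

# round 1 retrains fc8 only; round 3 retrains fc6-fc8; round 8 fine-tunes the whole network
```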

The effects of the stochastic gradient descent (SGD) and Adam optimizers were then compared. To control the variables, all parameters other than the optimizer were left unchanged. Adam’s accuracy was found to be nearly 10% higher than that of SGD; therefore, Adam was selected as the experimental optimizer. Comparisons of the classification accuracies of SGD and Adam for 120 epochs are shown in Fig. 11 (left and right, respectively). The Adam optimizer can be expressed as:

$${m_t} = {\beta _1}{m_{t - 1}} + ({1 - {\beta_1}} ){g_t}$$
$${v_t} = {\beta _2}{v_{t - 1}} + ({1 - {\beta_2}} ){g}_t^2$$
where ${m_t}$ and ${v_t}$ are the first- and second-moment estimates of the gradient, respectively, ${g_t}$ is the gradient of the cost function with respect to the weights $\theta$ at iteration $t$, and ${\beta _1}$ and ${\beta _2}$ are attenuation rates, which are both close to 1. The Adam update rule is as follows:
$${\theta _{t + 1}} = {\theta _t} - \frac{\eta }{{\sqrt {{{\hat{v}}_t}} + \varepsilon }}{\hat{m}_t}$$
where $\theta$ is the weight, $\eta$ is the learning rate, $\varepsilon$ is a small number (generally set to $10^{-8}$), and ${\hat{m}_t}$ and ${\hat{v}_t}$ are the bias-corrected first- and second-moment estimates of the gradient, respectively.
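A minimal NumPy version of Eqs. (2)–(4) is given below; the bias corrections $\hat m_t = m_t/(1-\beta_1^t)$ and $\hat v_t = v_t/(1-\beta_2^t)$ follow the standard Adam algorithm, as they are not written out in the text, and the default $\beta_1$, $\beta_2$ values are assumptions.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, eta=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of the weights `theta`, following Eqs. (2)-(4)."""
    m = beta1 * m + (1 - beta1) * grad              # Eq. (2): first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2         # Eq. (3): second-moment estimate
    m_hat = m / (1 - beta1 ** t)                    # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)   # Eq. (4)
    return theta, m, v
```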

Fig. 11. Comparison of the classification accuracy between SGD and Adam (the horizontal axis shows the number of steps, with more steps and closer spacing).

L2 regularization can be expressed as:

$${E_{L2}} = \frac{1}{m}\sum\limits_i^m {{{({y_i} - {f_w}({x_i}))}^2} + \sum\limits_t {w_t^2} }$$
where ${f_w}({x_i})$ is the actual output of the network, ${y_i}$ is the corresponding label, and ${w_t}$ is the weight. This is equivalent to introducing a weight attenuation term into the weight update, which can be expressed as:
$${w_t} = {w_{t - 1}} - \eta (\nabla E + \gamma {w_{t - 1}})$$
where $\eta$ is the learning rate and $\gamma$ is the regularization coefficient.
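The weight attenuation of Eq. (6) amounts to the single update step sketched below, where `grad_E` stands for $\nabla E$, the gradient of the data loss; the numerical values of $\eta$ and $\gamma$ are the ones adopted later in this section.

```python
import numpy as np

def weight_decay_step(w, grad_E, eta=1e-4, gamma=1e-3):
    """Eq. (6): w_t = w_{t-1} - eta * (grad_E + gamma * w_{t-1})."""
    return w - eta * (grad_E + gamma * w)
```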

In addition, an experiment was conducted to verify the selection of the epoch number. The accuracy and loss curves for 360 epochs are shown in Fig. 12 (left and right, respectively). The peak accuracy occurred at approximately 150 epochs, after which the accuracy remained unchanged. Therefore, in this study, the epoch value was set to 180 during layered fine-tuning.

Fig. 12. Experimental results under 360 epochs (the horizontal axis shows the number of steps, with more steps and closer spacing).

Based on the above research, the following network parameters were selected. The batch size was set to 32 and training was stopped at 180 epochs. Using the Adam optimizer, the learning rate of the fine-tuned layers was 0.0001, the learning-rate decay factor was 0.99, and the learning rate of the frozen layers was 0. Training was regularized using weight attenuation (the L2 regularization coefficient was set to 0.001), and the first two fully connected layers were regularized by dropout (the dropout ratio was set to 0.5, the default).
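Wiring these settings together in PyTorch might look like the sketch below; `net` is a torchvision AlexNet with its last layer replaced (as above), the random tensor dataset is only a placeholder for the augmented cloud-image training set, and the exact scheduler and loss choices are assumptions consistent with the stated parameters.

```python
import torch
from torch import nn, optim
from torchvision import models
from torch.optim.lr_scheduler import ExponentialLR
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE, EPOCHS = 32, 180

net = models.alexnet(pretrained=True)
net.classifier[6] = nn.Linear(net.classifier[6].in_features, 10)
# dropout with the default ratio 0.5 already precedes the first two fully connected layers

# placeholder data; in practice this is the augmented cloud-image training set
train_set = TensorDataset(torch.randn(64, 3, 227, 227), torch.randint(0, 10, (64,)))
loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)

trainable = [p for p in net.parameters() if p.requires_grad]      # frozen layers keep lr 0
optimizer = optim.Adam(trainable, lr=1e-4, weight_decay=1e-3)     # L2 coefficient 0.001
scheduler = ExponentialLR(optimizer, gamma=0.99)                  # learning-rate decay 0.99
criterion = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(net(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```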

When the last four layers (conv5–fc8) were fine-tuned in the fourth round, abnormal results appeared in the accuracy of the verification set, as shown in Fig. 13 (left). Another experiment, fine-tuning the last five layers (conv4–fc8), produced similar results, as shown in Fig. 13 (right). That is, after a period during which the accuracy increased, it then rapidly declined and remained at a lower level. This occurred because of overfitting resulting from insufficient samples (i.e., relatively too many trainable parameters). Therefore, only the latter (fully connected) layers of the network were fine-tuned.

Fig. 13. Verification set accuracy curve when fine-tuning the last four and five layers (the horizontal axis shows the number of steps, with more steps and closer spacing).

In this study, the performances of the following methods were compared regarding their ability to classify clouds:

Network I: A network without TL, trained from scratch.

Network II: A network with TL but no fine-tuning.

Network III: A network in which the last two layers (fc7-fc8) of AlexNet were subjected to TL and fine-tuning.

Network IV: A network where the last three layers (fc6-fc8) of AlexNet were subjected to TL and fine-tuning.

After the TL and fine-tuning of AlexNet, the model parameters were saved and then tested using the test set. The accuracy is shown in Fig. 14, revealing that low classification accuracy was obtained for St and Cs. In the optimal “fine-tuning AlexNet: fc6-fc8” scheme, the accuracies of St and Cs were 82.3 and 86.4%, respectively. Overall, for all types, the accuracy of Network I was 59.1%, the accuracy of Network II was 41.3%, the accuracy of Network III was 85.9%, and the accuracy of Network IV was 92.3%. It can be seen that fine-tuning and TL were feasible, because Network IV achieved a high accuracy.

Fig. 14. Classification accuracy of each training scheme.

4.5 Network re-verification after fine-tuning

To observe and analyze the experimental results more clearly and intuitively, a confusion matrix was generated from the final test results. As shown in Fig. 15, there were 35 samples of Cu in the test set, of which 34 samples were correctly classified, and one was misclassified as Ci. Therefore, the total accuracy of the Cu classification was 0.97.
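A confusion matrix like Fig. 15 can be produced from the test-set predictions with scikit-learn, as sketched below; the label arrays here are placeholders rather than the actual test results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

classes = ["Cu", "Cb", "Sc", "St", "Ns", "As", "Ac", "Ci", "Cs", "Cc"]

# y_true / y_pred would come from running the fine-tuned network on the test set
y_true = np.array([0, 0, 7, 3, 9])     # placeholder ground-truth labels
y_pred = np.array([0, 7, 7, 3, 9])     # placeholder predictions

cm = confusion_matrix(y_true, y_pred, labels=list(range(len(classes))))
per_class_acc = cm.diagonal() / np.maximum(cm.sum(axis=1), 1)   # e.g. 34/35 = 0.97 for Cu
for name, acc in zip(classes, per_class_acc):
    print(f"{name}: {acc:.2f}")
```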

Fig. 15. Confusion matrix of the test results.

5. Simulation experiment and result analysis

In this experiment, nine images were randomly selected from the cloud database for testing; the results are shown in Fig. 16. In the first column, the classification results were 01cu (Cu), 08ci (Ci), and 10cc (Cc), each with a probability of 1. In the second column, the results were 04st (St) and 06as (As), each with a probability of 1, and 09cs (Cs) for the third image. In the third column, the results were 03sc (Sc) and 05ns (Ns), each with a probability of 1, and 02cb (Cb) for the third image. These experiments therefore showed that Network IV achieved the best classification performance, with the classification probability reaching almost 1 for these nine images. It is thus feasible to classify cloud images by fine-tuning with transfer learning, without a large number of labelled images.

Fig. 16. Test results of the TCNN cloud classification after fine-tuning.

6. Conclusion

In this paper, an intelligent classification method for ground-based visible cloud images is proposed. Using TL, the network was pre-trained using the ImageNet database; it was then retrained using an online image sample database, followed by layer-by-layer fine-tuning. The optimal fine-tuning scheme was obtained, thus realizing complete classification, high precision, and fast identification of ground-based visible cloud images.

  • (1) AlexNet was trained on a large dataset to obtain the pre-trained model, following which feature-based TL was conducted. Fine-tuning TL was found to be feasible, because regardless of the type of fine-tuning method adopted, the effect of pre-training and fine-tuning AlexNet was much better than that obtained by training AlexNet from scratch and not fine-tuning the TL network.
  • (2) The classification accuracies of various fine-tuning methods were experimentally analyzed. Intelligent classification, which used Network IV, achieved a recognition accuracy of 92.3% for ten cloud types. This was found to be the best fine-tuning method on the premise of fine-tuning the full connection, layer-by-layer.
  • (3) The pre-training model was shown to have several advantages over the random initialization model. For example, the preprocessed model could clearly extract edge features, shape features, and other advanced features from images, and could obtain a higher accuracy. The ability of the random initialization model was not sufficiently strong.
  • (4) By analyzing the fine-tuning experimental results, it was found that the categories with low accuracy were mainly stratus and cirrostratus. The difference between these two cloud types is small, so they were difficult to distinguish. The confusion matrix, combined with an analysis of sample balance, revealed that stratus and cirrostratus were also the two categories with relatively few training samples. In the future, increasing the numbers of samples in these two categories would help to determine whether their accuracy can be improved.
  • (5) Many studies have shown that it is not absolutely necessary to fine-tune the first few convolution layers, but that it is very important to fine-tune the last few layers. This is because the initial layers describe the general features of an image, such as color and edges, whereas the last few fully connected layers describe high-order features related to the image classification task.

The size of the sample library led to abnormal results during the fine-tuning of the convolution layer. This sample library did not permit an effective comparison, but it did reflect that fine-tuning the convolution layer introduced a large number of parameters and consumed a large amount of computing resources. Future experiments should aim to improve the sample database, seek more effective solutions, and improve the recognition rates of stratiform and high-level clouds.

Funding

National Natural Science Foundation of China (41775165, 41775039); The Startup Foundation for Introducing Talent of NUIST (2021r034).

Acknowledgments

We would like to thank Editage [www.editage.cn] for English language editing.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. M. Singh and M. Glennen, “Automated ground-based cloud recognition,” Pattern Anal. Appl. 8(3), 258–271 (2005). [CrossRef]  

2. J. Calbó and J. Sabburg, “Feature Extraction from Whole-Sky Ground-Based Images for Cloud-Type Recognition,” J. Atmos. Oceanic Technol. 25(1), 3–14 (2008). [CrossRef]  

3. A. Heinle, A. Macke, and A. Srivastav, “Automatic cloud classification of whole sky images,” Atmos. Meas. Tech. 3(3), 557–567 (2010). [CrossRef]  

4. A. Kazantzidis, P. Tzoumanikas, A. F. Bais, S. Fotopoulos, and G. Economou, “Cloud detection and classification with the use of whole-sky ground-based images,” Atmos. Res. 113, 80–88 (2012). [CrossRef]  

5. S. Liu, C. Wang, B. Xiao, Z. Zhong, and Y. Shao, “Ground-based cloud classification using multiple random projections,” IEEE International Conference on Computer Vision in Remote Sensing (2012).

6. W. Zhuo, Z. Cao, and Y. Xiao, “Cloud Classification of Ground-Based Images Using Texture–Structure Features,” J. Atmos. Oceanic Technol. 31(1), 79–92 (2014). [CrossRef]  

7. T. Kliangsuwan and A. Heednacram, “Feature extraction techniques for ground-based cloud type classification,” Expert Systems with Appl. 42(21), 8294–8303 (2015). [CrossRef]  

8. S. Wacker, J. Gröbner, C. Zysset, L. Diener, P. Tzoumanikas, A. Kazantzidis, L. Vuilleumier, R. Stöckli, S. Nyeki, and N. Kämpfer, “Cloud observations in Switzerland using hemispherical sky cameras,” J. Geophys. Res.: Atmos. 120(2), 695–707 (2015). [CrossRef]  

9. Y. Xiao, Z. Cao, W. Zhuo, L. Ye, and L. Zhu, “mCLOUD: A Multi-view Visual Feature Extraction Mechanism for Ground-based Cloud Image Categorization,” J. Atmos. Oceanic Technol. 33(4), 789–801 (2016). [CrossRef]  

10. Q. Li, Z. Zhang, W. Lu, and J. Yang, “From pixels to patches: a cloud classification method based on bag of micro-structures,” Atmos. Meas. Tech. 8(10), 10213–10247 (2016). [CrossRef]  

11. M. Hussain, J. Bird, and D. Faria, “A Study on CNN Transfer Learning for Image Classification,” Adv. in Computat. Intelligence Systems 11(840), 191–202 (2018). [CrossRef]  

12. F. Jiang, H. Liu, S. Yu, and Y. Xie, “Breast mass lesion classification in mammograms by transfer learning,” Int. Conf. on Bioinformatics & Computat. Biol. 1, 59–62 (2017). [CrossRef]  

13. Y. Xie, S. Su, and S. Li, “A pedestrian classification method based on transfer learning,” International Conference on Image Analysis & Signal Processing, 420–425 (2010).

14. H. Shouno, S. Suzuki, and S. Kido, “A Transfer Learning Method with Deep Convolutional Neural Network for Diffuse Lung Disease Classification,” Neural Information Processing 9489(9489), 199–207 (2015). [CrossRef]  

15. PLA Air Force Headquarters. Aerometeorological cloud Atlas. PLA Air Force Headquarters, Beijing, 1973.

16. China Meterological Administration. China cloud Atlas. Beijing, China Meteorological Press, 2004.

17. World Meteorological Organization. International Cloud Atlas. World Meteorological Organization, 2017. https://cloudatlas.wmo.int/en/home.html

18. J. Deng, W. Dong, R. Socher, L. Li, and F. Li, “ImageNet: A large-scale hierarchical image database,” IEEE Conference on Computer Vision & Pattern Recognition. IEEE (2009).

19. A. Razavian, H. Azizpour, J. Sullivan, and S Carlsson, “CNN Features off-the-shelf: an Astounding Baseline for Recognition,” IEEE conference on computer vision and pattern recognition workshops, 8 pages (2014).

20. O. Penatti, K. Nogueira, and J. Santos, “Do deep features generalize from everyday objects to remote sensing and aerial scenes domains,” IEEE Conf. on Comput. Vis. & Pattern Recognit. Workshops 1, 44–51 (2015). [CrossRef]  

21. E. Maggiori, Y. Tarabalka, G. Charpiat, and P. Alliez, “Convolutional Neural Networks for Large-Scale Remote-Sensing Image Classification,” IEEE Trans. on Geoscience and Remote Sensing 55(2), 645–657 (2017). [CrossRef]  

22. J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, and T. Darrell, “DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition,” International Conference on Machine Learning 32, 10 pages (2013).

23. H. Azizpour, A. Razavian, J. Sullivan, A. Maki, and S. Carlsson, “From generic to specific deep representations for visual recognition,” IEEE Conf. on Comput. Vis. Pattern Recognit. Workshops (CVPRW). IEEE Computer Society 1, 36–45 (2015). [CrossRef]  

24. M. Wang, S. D. Zhou, and Z. H. Liu, “CloudA: A Ground-Based Cloud Classification Method with a Convolutional Neural Network,” J. Atmos. Oceanic Technol. 37(9), 1661–1668 (2020). [CrossRef]  

25. S. Dev, Y. H. Lee, and S. Winkler, “Categorization of cloud image patches using an improved texton-based approach,” in Proc. IEEE Int. Conf. Image Process. (ICIP), 422–426 (2015).

26. J. Yang, Q. Min, W. T. Lu, Y. Ma, W. Yao, T. S. Lu, J. Du, and G. Y. Liu, “A total sky cloud detection method using real clear sky background,” Atmos. Meas. Tech. 9, 587–597 (2016). [CrossRef]  

27. A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Int. Conf. on Neural Inform. Process. Systems 60(6), 84–90 (2017). [CrossRef]  

28. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, and A. Rabinovich, “Going Deeper with Convolutions,” IEEE Conf. on Comput. Vis. Pattern Recognit. 1(1), 1 (2015). [CrossRef]  

29. K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” Comput. Sci. 9, 1–14 (2014).

30. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 1, 770–778 (2016).

