
Intelligent classification of ground-based visible cloud images using a transfer convolutional neural network and fine-tuning

Open Access

Abstract

Here, a classification method for ground-based visible cloud images is proposed based on a transfer convolutional neural network (TCNN). This approach combines the abilities of deep learning (DL) and transfer learning (TL). A sample database containing all ten cloud types was used; this database was expanded four-fold using enhancement processing. AlexNet was chosen as the basic convolutional neural network (CNN), with the ImageNet database being used for pre-training. The optimal fine-tuning scheme, determined by layer-by-layer fine-tuning, was then used to classify the ten cloud types. The proposed method achieved 92.3% recognition accuracy for all ten ground-based cloud types.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Cloud classification is very important for weather forecasting because cloud type is directly related to weather events such as precipitation, snow, hail, and lightning. Based on their shape, structure, characteristics, and height, clouds can be divided into ten types: cumulus (Cu), cumulonimbus (Cb), stratocumulus (Sc), stratus (St), nimbostratus (Ns), altostratus (As), altocumulus (Ac), cirrus (Ci), cirrostratus (Cs), and cirrocumulus (Cc). These cloud types are characterized by their varied forms, rapid changes, mutual similarities, and the ease with which they blend into the background sky. Manual observation is the main method used for actual cloud observations, but this approach has many problems, such as strong subjectivity, quasi-static observations, high costs, a paucity of observation points, and incomplete information records.

Much research has therefore recently been conducted into the automatic observation of ground-based clouds using instruments, and it is now possible to observe all types of sky clouds using visible and infrared instruments. The automatic recognition of ground-based cloud images typically follows an image preprocessing, feature extraction, and classification pipeline, and most researchers have focused on developing feature extraction techniques for different cloud attributes. Singh and Glennen [1] used co-occurrence matrices and kernels to extract many features and distinguish five different sky conditions. Calbó and Sabburg [2] used texture attributes and the Fourier transform of the visible channels of a camera to classify up to eight types of sky conditions, with an accuracy of approximately 62%. Heinle et al. [3] proposed an automatic cloud classification algorithm based on a set of statistical features describing the color (mean, standard deviation, skewness, and difference) and texture (energy, entropy, contrast, uniformity, and cloud amount) of whole-sky images; the success rate of this method for classifying seven types of clouds was approximately 75%. Kazantzidis et al. [4], meanwhile, proposed a multicolor criterion for sky images that attained an average performance of approximately 87% for seven cloud types. Liu et al. [5] proposed several algorithms for extracting texture and image descriptors, such as multiple random projections, salient local binary patterns, and group pattern learning. Zhuo et al. [6] combined textural and structural features to represent clouds and achieved a high classification accuracy. Kliangsuwan et al. [7] used a new method based on the fast Fourier transform to extract cloud features and achieved an automatic classification accuracy of up to 90% for seven cloud types. Wacker et al. [8] measured longwave radiation to derive auxiliary information for cloud classification; compared with only using information from sky cameras, they increased the accuracy by nearly 10%, achieving an average accuracy of 80–90%. Xiao et al. [9] fused texture, structure, and color features, observing that clouds can be regarded as having a natural texture; it is therefore reasonable to use texture and image descriptors to describe the appearance of clouds. Li et al. [10] adopted a new cloud-type recognition method in which an image is analyzed as a group of patches instead of as a group of pixels; this method obtained an accuracy of 90% for five sky conditions. In the above-mentioned traditional cloud classification methods, after feature extraction, classifiers such as an artificial neural network (ANN), k-nearest neighbor (KNN), or support vector machine (SVM) are often used to distinguish the features. Traditional classifiers easily fall into local extrema during training. Furthermore, such learning networks generally have only two or three layers, which is actually a kind of “shallow learning.” As a result, these methods are only applicable to a limited range of cloud types: only a few typical cloud types, such as Cu, St, Ac, and Ci, can be automatically identified, and their recognition rates are not high. Currently, there is no universal method for classifying all ten cloud types. Moreover, some studies treat cloud image patches with more recognizable features as the classification object; this approach is far from the actual observation requirements.

Convolutional neural networks (CNNs) have achieved great success in large-scale image classification tasks. Although successful results have been achieved by applying CNNs in different machine learning scenarios, there are still some difficulties regarding their application to cloud image classification. First, in practical applications, a CNN needs a large volume of labelled data for training, but there is currently a lack of cloud image data. Furthermore, annotating cloud images requires professional knowledge, which is expensive and time-consuming, and the results are subject to observer variability. In the absence of a large amount of labelled data, it is difficult to ensure the effectiveness of a CNN for ground-based cloud image classification. Second, the use of limited training data can easily lead to “overfitting,” in which the learned features do not generalize well. The appearance of clouds varies considerably between images, and when this variability is large, overfitting becomes an even more serious problem. Third, training a CNN from scratch requires high computing power, extensive memory resources, and time, all of which place certain limitations on the actual operation process. In these cases, transfer learning (TL) can be regarded as a good solution. TL applies a mature network trained on one sample database to a new sample database; that is, the learned knowledge is transferred to solve new problems more quickly [11]. When using TL, the pre-trained classifier is fine-tuned to obtain the new classifier, which effectively utilizes useful information from the source data and reduces the need for new labels. This can also greatly accelerate the convergence speed and reduce the training time [12,13]. Fine-tuning refers to the process of accurately adjusting a model’s parameters, which is one of the skills of machine learning [14]. It is possible to effectively improve the accuracy of cloud classification by hierarchically fine-tuning a pre-trained transfer CNN (TCNN) and then classifying cloud images until the parameters corresponding to the best classification performance are found.

As a CNN can automatically learn image features, TL makes it possible to transfer the deep learning (DL) ability of a mature network. Therefore, using a TCNN for cloud recognition should increase the speed at which the model is trained and at which it recognizes clouds, and it can help solve the problems arising from the current paucity of cloud images. In this paper, therefore, a classification method for ground-based visible cloud images is proposed, based on a TCNN. A large sample database was established using the sample expansion method, and the AlexNet network was pre-trained using the ImageNet database. The trained TCNN was then regarded as a new network: it was retrained using cloud images, and its weights were adjusted using the backward propagation algorithm. Subsequently, the new network was used to classify cloud images. Finally, the optimal tuning scheme was determined using the layer-by-layer tuning method. The aim of this study was to establish whether the proposed TCNN could obtain a satisfactory classification accuracy compared to an ab initio trained CNN.

2. Data

The data used in this study were obtained from an online image sample database. They comprised 1,049 tagged visible-light cloud images, sourced from resources such as the “Aerometeorological Cloud Atlas” [15], “China Cloud Atlas” [16], and the official website of the International Cloud Atlas [17]. All images were stored in the JPG format, and most had different lengths, widths, and resolutions. After collecting these images, it was necessary to individually verify the accuracy of the original annotations and classify the images according to the ten cloud types: Cu, Cb, Sc, St, Ns, As, Ac, Ci, Cs, and Cc. Some samples are shown in Fig. 1.

Fig. 1. Example of an online-image sample database.

ImageNet is a public database that has been widely used in the fields of computer vision and pattern recognition; it has almost become the standard database for algorithm performance tests in DL. ImageNet contains more than 14 million images, covering more than 20,000 classifications. More than one million of these images have clear category and object location annotations. In this study, ImageNet [18] was used to pre-train a mature CNN, called AlexNet. Some of the samples are shown in Fig. 2.

Fig. 2. Some samples of ImageNet.

3. Model introduction

3.1 Verification of the necessity of TL

For different-sized databases, there are two main methods for applying a pre-trained CNN to a new image classification task. In the first method, if the new sample database is small, the pre-trained CNN’s weights are used as fixed feature extractors. In this method, the network remains unchanged [19,20], the feature vectors are extracted just before the last fully connected layer, and linear classifiers are then trained for classification. In the second method, if the new sample database is large, it is used to fine-tune the pre-trained CNN: the back-propagation algorithm is used to fine-tune its weights [21] and the CNN is finally updated to solve the new problem [22,23]. This enables the first layers of the network to learn highly generalizable features from the larger source database, while the latter layers of the network learn the model appropriate to the new task. In this case, the accuracy obtained on the target database is higher than that obtained using the first method. Taking a network structure with four convolution layers (called CloudA) as an example [24], the pre-training model was used as shown in Fig. 3.
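To make the two strategies concrete, the following is a minimal PyTorch sketch (not the authors' code) contrasting a fixed feature extractor with full fine-tuning; it assumes the pre-trained AlexNet shipped with torchvision, and `num_classes = 10` is chosen to match the cloud types used here.

```python
import torch.nn as nn
from torchvision import models

num_classes = 10  # the ten cloud types used in this study

# Strategy 1: use the pre-trained CNN as a fixed feature extractor (small new database).
feature_net = models.alexnet(pretrained=True)
for p in feature_net.parameters():
    p.requires_grad = False                                # freeze all transferred weights
feature_net.classifier[6] = nn.Linear(4096, num_classes)   # only this new layer is trained

# Strategy 2: fine-tune the pre-trained CNN (larger new database).
finetune_net = models.alexnet(pretrained=True)
finetune_net.classifier[6] = nn.Linear(4096, num_classes)
# all weights remain trainable and are updated by back-propagation on the new data
```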

Fig. 3. Example of how to use the pre-trained model.

CloudA has achieved good classification results for five types of cloud images from two sample databases (Swimcat [25] and Total-sky [26]). This network, with its four-convolution-layer structure, has also been shown to be able to identify clouds in some sample databases [24]. Therefore, this study first applied CloudA to the online image sample database, which features ten types of cloud images. In the experiment, the parameters of CloudA remained unchanged, while the output of the last fully connected layer was changed from five to ten. The accuracy and loss of the obtained verification set are shown in Fig. 4 (left and right, respectively).

Fig. 4. Accuracy and loss of verification set, using CloudA.

As shown in Fig. 4, although the training accuracy was close to 100% and the training loss continued to decline, the accuracy of the verification set was less than 40%, and the loss of the verification set first decreased rapidly, then rebounded and rose. It was preliminarily determined that overfitting had occurred at this point. In this regard, a comparative experiment was conducted using the sample library after data enhancement; the accuracy and loss of the obtained verification set are shown in Fig. 5 (left and right, respectively).

Fig. 5. Comparative experiment before and after data enhancement, using CloudA.

As can be seen from Fig. 5, after expanding the original data four-fold, the accuracy (a) of the verification set was greatly improved (it basically remained at approximately 60%), but the loss curve (b) still rose after initially falling. At this point, therefore, there was still a certain degree of overfitting in the network.

The improvement in accuracy observed after data enhancement shows that: (1) the original number of samples was not sufficient to train the network from scratch, meaning that overfitting would inevitably occur; (2) the data enhancement method was both effective and feasible, and could improve the accuracy of the model to a certain extent; and (3) the network was still overfitting at this point, meaning that TL could be applied.

3.2 Determination of network architecture

Most transfer learning uses a classical network structure for pre-training, such as AlexNet [27], GoogLeNet [28], VGG16 [29], and ResNet [30]. Here, AlexNet with five convolution layers was chosen to conduct transfer learning. The main reasons for this were as follows:

First, a large number of cases in the field of image classification have shown that pre-trained networks can have a good classification effect.

Second, an AlexNet model that is trained in advance by ImageNet is easier to obtain, because of the relatively small number of parameters.

Third, CloudA achieved an accuracy of nearly 60% on the online image sample database after data enhancement, which was six times higher than the accuracy expected from random guessing. It was thus reasonable to believe that, after TL, a network with four or more convolution layers could further improve the experimental results. Among the classical network structures, the number of convolution layers in AlexNet is the closest to that of CloudA. A comparison of several classical pre-training network models is presented in Table 1.


Table 1. Comparison of several classical pre-training network models

Fourth, although there are deeper network structures, such as GoogLeNet and VGGNet, the convergence rates of these structures are slow. The aim of the experiment conducted here was to find a fine-tuning method to improve the performance of a pre-trained network; thus, AlexNet represented a reasonable choice.

AlexNet consists of five convolution layers and three fully connected layers. To adapt to the classification task, the last fully connected layer is modified into ten nodes, where each node represents a category in the image sample database. The structures and parameters of the experimental network are shown in Fig. 6. For each layer, the top half of Fig. 6 represents the input, the middle part represents the parameters, and the bottom half represents the output. For example, in conv1, $3 \times 227 \times 227$ means that the input image of this layer comprises three channels and $227 \times 227$ pixels, and that the convolution kernel size is $11 \times 11$. Meanwhile, $96 \times 55 \times 55$ means that the output image comprises 96 channels and $55 \times 55$ pixels.
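As a quick check of the shapes quoted above, the following sketch builds a conv1 layer with 96 kernels of size $11 \times 11$ (stride 4, as in the original AlexNet; the stride is not stated explicitly in the figure) and applies it to a $3 \times 227 \times 227$ input.

```python
import torch
import torch.nn as nn

# conv1 as described in Fig. 6: 96 kernels of size 11x11 on a 3 x 227 x 227 input.
# With stride 4 and no padding: (227 - 11) / 4 + 1 = 55.
conv1 = nn.Conv2d(in_channels=3, out_channels=96, kernel_size=11, stride=4)
x = torch.randn(1, 3, 227, 227)
print(conv1(x).shape)  # torch.Size([1, 96, 55, 55]), i.e. 96 channels of 55 x 55 pixels
```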

Fig. 6. AlexNet structure.

4. Method

4.1 Data processing

As the resolution of the original sample database was not consistent, it was necessary to adjust the image size uniformly based on bilinear interpolation, as follows:

$$\begin{aligned} f(x,y) &= \frac{1}{{({x_2} - {x_1})({y_2} - {y_1})}}[({x_2} - x)({y_2} - y)f({x_1},{y_1}) + (x - {x_1})({y_2} - y)f({x_2},{y_1})\\ &+ ({x_2} - x)(y - {y_1})f({x_1},{y_2}) + (x - {x_1})(y - {y_1})f({x_2},{y_2})] \end{aligned}$$
where $f(x,y)$ is the gray value of the target image at the pixel point $(x,y)$, and $f({x_1},{y_1})$, $f({x_2},{y_1})$, $f({x_1},{y_2})$, and $f({x_2},{y_2})$ are the gray values of the four pixels $({x_1},{y_1})$, $({x_2},{y_1})$, $({x_1},{y_2})$, and $({x_2},{y_2})$ around point $(x,y)$, respectively.
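For illustration, the following NumPy sketch applies Eq. (1) directly to resize a single-channel image; in practice a library resize routine (e.g., in PIL or OpenCV) would be used, and the coordinate mapping chosen here is one common convention rather than the authors' exact implementation.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Resize a grayscale image (2-D array) using the bilinear formula of Eq. (1)."""
    in_h, in_w = img.shape
    out = np.empty((out_h, out_w), dtype=np.float64)
    for i in range(out_h):
        for j in range(out_w):
            # map the target pixel (i, j) back into source coordinates (y, x)
            y = i * (in_h - 1) / max(out_h - 1, 1)
            x = j * (in_w - 1) / max(out_w - 1, 1)
            y1, x1 = int(np.floor(y)), int(np.floor(x))
            y2, x2 = min(y1 + 1, in_h - 1), min(x1 + 1, in_w - 1)
            dy, dx = y - y1, x - x1
            # weighted average of the four surrounding gray values, as in Eq. (1)
            out[i, j] = ((1 - dx) * (1 - dy) * img[y1, x1]
                         + dx * (1 - dy) * img[y1, x2]
                         + (1 - dx) * dy * img[y2, x1]
                         + dx * dy * img[y2, x2])
    return out
```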

Due to the limited amount of data in the sample database, and in order to solve the problems of over-fitting and the low recognition accuracy caused by small sample size, it was necessary to enhance the sample data. Horizontal flips, scaling and image brightness adjustments were used to expand the sample database, and the sample number was increased four-fold.

As the vertical features of clouds (especially the texture features of a cloud’s bottom) are usually important factors in cloud recognition, methods that change the vertical distributions of clouds, such as vertical flipping and rotation, are not suitable for enhancing cloud image data. Therefore, horizontal flipping was used to enhance the samples. The scaling was set to 1.2, and parts beyond the standard size after bilinear interpolation were trimmed. Similarly, the brightness conversion coefficient was 1.2. From 1,049 original images, after data enhancement there were 4,196 samples. The number of samples in each category is listed in Table 2.
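A minimal sketch of this four-fold expansion using PIL is given below; the flip, the 1.2× scaling followed by trimming, and the 1.2 brightness factor follow the text, while the use of a center crop and the PIL API are implementation assumptions.

```python
from PIL import Image, ImageEnhance

SIZE = 227  # standard size after bilinear interpolation

def expand_sample(path):
    """Return the resized original plus three enhanced versions (four-fold expansion)."""
    img = Image.open(path).convert("RGB").resize((SIZE, SIZE), Image.BILINEAR)

    flipped = img.transpose(Image.FLIP_LEFT_RIGHT)                  # horizontal flip

    # scale by 1.2 and trim the part beyond the standard size (center crop assumed)
    big = img.resize((int(SIZE * 1.2), int(SIZE * 1.2)), Image.BILINEAR)
    off = (big.width - SIZE) // 2
    scaled = big.crop((off, off, off + SIZE, off + SIZE))

    brighter = ImageEnhance.Brightness(img).enhance(1.2)            # brightness x 1.2

    return [img, flipped, scaled, brighter]
```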


Table 2. Number of images for each class in the cloud image sample database

The establishment of the sample database can be divided into the following steps.

  • (1) First, all image sizes were adjusted to $227 \times 227$ pixels using bilinear interpolation.
  • (2) The sample database was expanded using horizontal flips, scaling, and brightness adjustments.
  • (3) Finally, 80% of the sample database was selected as the training set, 10% was selected as the validation set, and 10% was selected as the test set, as shown in Fig. 7.

Fig. 7. Experimental distribution of the sample database.
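A simple way to realize the 80/10/10 split described above, shuffling each class separately so that the class proportions are preserved, is sketched below; the fixed random seed and the list-of-paths interface are assumptions for illustration.

```python
import random

def split_class(items, seed=0):
    """Split one class's image paths into 80% training, 10% validation, 10% test."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n_train = int(0.8 * len(items))
    n_val = int(0.1 * len(items))
    return (items[:n_train],                      # training set
            items[n_train:n_train + n_val],       # validation set
            items[n_train + n_val:])              # test set
```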

Through the above methods, the sample database was expanded four-fold. An example of an expanded image is presented in Fig. 8. This four-fold expansion does not necessarily mean that the sample size was sufficient; this could only be determined by the experimental results. Assuming that other parameters remain unchanged, if the classification accuracy of the expanded data is not significantly different from that of the original data, this would indicate that the amount of original data was sufficient. If the expanded data’s accuracy is significantly higher than that of the original data, however, this would indicate that the original data volume was not sufficient to support network training. The sample size can be increased by increasing the number of source pictures, or by increasing the expansion multiple, until the classification accuracy is no longer affected by increasing the sample size.

Fig. 8. Cloud images using data enhancement.

4.2 Method steps

The basic idea of ground-based visible cloud image classification based on the TCNN is shown in Fig. 9. The main steps are as follows.

Fig. 9. Basic idea of ground-based visible cloud image classification based on TCNN.

Step 1: Use AlexNet as a suitable CNN architecture and build it.

Step 2: Pre-process the original ground-based cloud images.

Step 3: Pre-train the network using ImageNet; this network is called the TCNN.

Step 4: The trained TCNN is regarded as a new network, and this new network is retrained using cloud images. The weights of the new network are adjusted using a backward propagation algorithm, following which the cloud images are classified using the new network.

4.3 Pre-trained network

AlexNet was trained using the large-scale image database, and its parameters were saved to obtain a pre-trained network. Here, three images (alpaca, sea lion, and zebra) were randomly selected from the 1,000 image categories in ImageNet [18] and were input into the pre-trained AlexNet for recognition. All three pictures were successfully identified; the outputs are shown in Fig. 10. Thus, the network was correctly pre-trained and its weights correctly assigned.

Fig. 10. Examples of pre-training model.

4.4 Fine-tuning network

As weights learned on a different task are more suitable for initializing a network than random weights, in this study, the weights of all layers were transferred from AlexNet, which was trained using ImageNet. After AlexNet was pre-trained, a total of eight rounds of experiments were planned; the weights were adjusted using a backward propagation algorithm. In the first round, the parameters of the last layer of the pre-trained AlexNet were retrained using the cloud image sample database, and the parameters of all other layers were frozen until convergence. Similarly, in the second round, the last two layers of AlexNet were retrained, and the parameters of all other layers were frozen during the update process. A further layer was then gradually added to the update process in each round, until the entire network was fine-tuned. In other words, after AlexNet was pre-trained using natural images, the network was fine-tuned in a layered manner.
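The layer-by-layer scheme can be sketched as follows, assuming torchvision's AlexNet (whose channel counts differ slightly from the original network but whose eight weighted layers, conv1–conv5 and fc6–fc8, are laid out in the same order); round k of the experiment retrains only the last k of these layers.

```python
import torch.nn as nn
from torchvision import models

def build_round(num_unfrozen, num_classes=10):
    """Round k of the layer-by-layer scheme: retrain only the last `num_unfrozen`
    of the eight weighted layers (conv1-conv5, fc6-fc8) and freeze the rest."""
    net = models.alexnet(pretrained=True)                                    # transferred weights
    net.classifier[6] = nn.Linear(net.classifier[6].in_features, num_classes)

    # the eight weight-bearing layers, in forward order (torchvision indexing)
    layers = [net.features[0], net.features[3], net.features[6],
              net.features[8], net.features[10],                             # conv1-conv5
              net.classifier[1], net.classifier[4], net.classifier[6]]       # fc6-fc8

    for p in net.parameters():
        p.requires_grad = False                   # freeze everything first
    for layer in layers[-num_unfrozen:]:
        for p in layer.parameters():
            p.requires_grad = True                # unfreeze the last k layers
    return net

# round 1 retrains fc8 only; round 3 retrains fc6-fc8; round 8 fine-tunes the whole network
```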

The effects of the stochastic gradient descent (SGD) and Adam optimizers were then compared. To control the variables, all parameters other than the optimizer were left unchanged. Adam’s accuracy was found to be nearly 10% higher than that of SGD; therefore, Adam was selected as the experimental optimizer. Comparisons of the classification accuracies of SGD and Adam for 120 epochs are shown in Fig. 11 (left and right, respectively). The Adam optimizer can be expressed as:

$${m_t} = {\beta _1}{m_{t - 1}} + ({1 - {\beta_1}} ){g_t}$$
$${v_t} = {\beta _2}{v_{t - 1}} + ({1 - {\beta_2}} ){g}_t^2$$
where ${m_t}$ and ${v_t}$ are the first- and second-moment estimates of the gradient, respectively, ${g_t}$ is the gradient of the cost function with respect to the weights $\theta$ at iteration $t$, and ${\beta _1}$ and ${\beta _2}$ are attenuation rates, which are both close to 1. The Adam update rule is as follows:
$${\theta _{t + 1}} = {\theta _t} - \frac{\eta }{{\sqrt {{{\hat{v}}_t}} + \varepsilon }}{\hat{m}_t}$$
where $\theta$ is the weight, $\eta$ is the learning rate, $\varepsilon$ is a small number (generally set to $10^{-8}$), and ${\hat{m}_t}$ and ${\hat{v}_t}$ are the bias-corrected first- and second-moment estimates of the gradient, respectively.
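A minimal NumPy version of Eqs. (2)–(4) is given below; the bias corrections $\hat m_t = m_t/(1-\beta_1^t)$ and $\hat v_t = v_t/(1-\beta_2^t)$ follow the standard Adam algorithm, as they are not written out in the text, and the default $\beta_1$, $\beta_2$ values are assumptions.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, eta=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of the weights `theta`, following Eqs. (2)-(4)."""
    m = beta1 * m + (1 - beta1) * grad              # Eq. (2): first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2         # Eq. (3): second-moment estimate
    m_hat = m / (1 - beta1 ** t)                    # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)   # Eq. (4)
    return theta, m, v
```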

Fig. 11. Comparison of the classification accuracy between SGD and Adam (the horizontal axis shows the number of steps, with more steps and closer spacing).

L2 regularization can be expressed as:

$${E_{L2}} = \frac{1}{m}\sum\limits_i^m {{{({y_i} - {f_w}({x_i}))}^2} + \sum\limits_t {w_t^2} }$$
where ${f_w}({x_i})$ is the actual output of the network, ${y_i}$ is the corresponding label, and ${w_t}$ is the weight. This is equivalent to introducing a weight attenuation term into the weight update, which can be expressed as:
$${w_t} = {w_{t - 1}} - \eta (\nabla E + \gamma {w_{t - 1}})$$
where $\eta$ is the learning rate and $\gamma$ is the regularization coefficient.
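The weight attenuation of Eq. (6) amounts to the single update step sketched below, where `grad_E` stands for $\nabla E$, the gradient of the data loss; the numerical values of $\eta$ and $\gamma$ are the ones adopted later in this section.

```python
import numpy as np

def weight_decay_step(w, grad_E, eta=1e-4, gamma=1e-3):
    """Eq. (6): w_t = w_{t-1} - eta * (grad_E + gamma * w_{t-1})."""
    return w - eta * (grad_E + gamma * w)
```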

In addition, an experiment was conducted to verify the selection of the epoch number. The accuracy and loss curves for 360 epochs are shown in Fig. 12 (left and right, respectively). The peak accuracy occurred at approximately 150 epochs, after which the accuracy remained unchanged. Therefore, in this study, the epoch value was set to 180 during layered fine-tuning.

Fig. 12. Experimental results under 360 epochs (the horizontal axis shows the number of steps, with more steps and closer spacing).

Based on the above research, the following network parameters were selected. The batch size was set to 32 and training was stopped at 180 epochs. Using the Adam optimizer, the learning rate of the fine-tuned layers was 0.0001, the learning-rate decay factor was 0.99, and the learning rate of the frozen layers was 0. Training was regularized using weight attenuation (the L2 regularization coefficient was set to 0.001), and the first two fully connected layers were regularized by dropout (the dropout ratio was set to 0.5, the default).
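Wiring these settings together in PyTorch might look like the sketch below; `net` is a torchvision AlexNet with its last layer replaced (as above), the random tensor dataset is only a placeholder for the augmented cloud-image training set, and the exact scheduler and loss choices are assumptions consistent with the stated parameters.

```python
import torch
from torch import nn, optim
from torchvision import models
from torch.optim.lr_scheduler import ExponentialLR
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE, EPOCHS = 32, 180

net = models.alexnet(pretrained=True)
net.classifier[6] = nn.Linear(net.classifier[6].in_features, 10)
# dropout with the default ratio 0.5 already precedes the first two fully connected layers

# placeholder data; in practice this is the augmented cloud-image training set
train_set = TensorDataset(torch.randn(64, 3, 227, 227), torch.randint(0, 10, (64,)))
loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)

trainable = [p for p in net.parameters() if p.requires_grad]      # frozen layers keep lr 0
optimizer = optim.Adam(trainable, lr=1e-4, weight_decay=1e-3)     # L2 coefficient 0.001
scheduler = ExponentialLR(optimizer, gamma=0.99)                  # learning-rate decay 0.99
criterion = nn.CrossEntropyLoss()

for epoch in range(EPOCHS):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(net(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```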

When the last four layers (conv5–fc8) were fine-tuned in the fourth round, abnormal results appeared in the accuracy of the verification set, as shown in Fig. 13 (left). Another experiment, fine-tuning the last five layers (conv4–fc8), produced similar results, as shown in Fig. 13 (right). That is, after a period during which the accuracy increased, it then rapidly declined and remained at a lower level. This occurred because of overfitting resulting from insufficient samples (i.e., relatively too many trainable parameters). Therefore, only the latter (fully connected) layers of the network were fine-tuned.

Fig. 13. Verification set accuracy curve when fine-tuning the last four and five layers (the horizontal axis shows the number of steps, with more steps and closer spacing).

In this study, the performances of the following methods were compared regarding their ability to classify clouds:

Network I: A network without TL, trained from scratch.

Network II: A network with TL but no fine-tuning.

Network III: A network in which the last two layers (fc7-fc8) of AlexNet were subjected to TL and fine-tuning.

Network IV: A network where the last three layers (fc6-fc8) of AlexNet were subjected to TL and fine-tuning.

After the TL and fine-tuning of AlexNet, the model parameters were saved and then tested using the test set. The accuracy is shown in Fig. 14, revealing that low classification accuracy was obtained for St and Cs. In the optimal “fine-tuning AlexNet: fc6-fc8” scheme, the accuracies of St and Cs were 82.3 and 86.4%, respectively. Overall, for all types, the accuracy of Network I was 59.1%, the accuracy of Network II was 41.3%, the accuracy of Network III was 85.9%, and the accuracy of Network IV was 92.3%. It can be seen that fine-tuning and TL were feasible, because Network IV achieved a high accuracy.

Fig. 14. Classification accuracy of each training scheme.

4.5 Network re-verification after fine-tuning

To observe and analyze the experimental results more clearly and intuitively, a confusion matrix was generated from the final test results. As shown in Fig. 15, there were 35 samples of Cu in the test set, of which 34 samples were correctly classified, and one was misclassified as Ci. Therefore, the total accuracy of the Cu classification was 0.97.
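A confusion matrix like Fig. 15 can be produced from the test-set predictions with scikit-learn, as sketched below; the label arrays here are placeholders rather than the actual test results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

classes = ["Cu", "Cb", "Sc", "St", "Ns", "As", "Ac", "Ci", "Cs", "Cc"]

# y_true / y_pred would come from running the fine-tuned network on the test set
y_true = np.array([0, 0, 7, 3, 9])     # placeholder ground-truth labels
y_pred = np.array([0, 7, 7, 3, 9])     # placeholder predictions

cm = confusion_matrix(y_true, y_pred, labels=list(range(len(classes))))
per_class_acc = cm.diagonal() / np.maximum(cm.sum(axis=1), 1)   # e.g. 34/35 = 0.97 for Cu
for name, acc in zip(classes, per_class_acc):
    print(f"{name}: {acc:.2f}")
```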

Fig. 15. Confusion matrix of the test results.

5. Simulation experiment and result analysis

In this experiment, nine images were randomly selected from the cloud database for testing; the results are shown in Fig. 16. In the first column, the classification results were 01cu (Cu), 08ci (Ci), and 10cc (Cc), each with a probability of 1. In the second column, the results were 04st (St) and 06as (As), each with a probability of 1, and 09cs (Cs) for the third image. In the third column, the results were 03sc (Sc) and 05ns (Ns), each with a probability of 1, and 02cb (Cb) for the third image. These experiments therefore showed that Network IV achieved the best classification performance, with the classification probability reaching almost 1 for these nine images. It is thus feasible to classify cloud images by fine-tuning with transfer learning, without a large number of labelled images.

Fig. 16. Test results of the TCNN cloud classification after fine-tuning.

6. Conclusion

In this paper, an intelligent classification method for ground-based visible cloud images is proposed. Using TL, the network was pre-trained using the ImageNet database; it was then retrained using an online image sample database, followed by layer-by-layer fine-tuning. The optimal fine-tuning scheme was obtained, thus realizing complete classification, high precision, and fast identification of ground-based visible cloud images.

  • (1) AlexNet was trained on a large dataset to obtain the pre-trained model, following which feature-based TL was conducted. Fine-tuning TL was found to be feasible, because regardless of the type of fine-tuning method adopted, the effect of pre-training and fine-tuning AlexNet was much better than that obtained by training AlexNet from scratch and not fine-tuning the TL network.
  • (2) The classification accuracies of various fine-tuning methods were experimentally analyzed. Intelligent classification, which used Network IV, achieved a recognition accuracy of 92.3% for ten cloud types. This was found to be the best fine-tuning method on the premise of fine-tuning the full connection, layer-by-layer.
  • (3) The pre-training model was shown to have several advantages over the random initialization model. For example, the preprocessed model could clearly extract edge features, shape features, and other advanced features from images, and could obtain a higher accuracy. The ability of the random initialization model was not sufficiently strong.
  • (4) By analyzing the fine-tuning experimental results, it was found that the categories with low accuracy were mainly stratus and cirrostratus. The difference between these two cloud types is small, so they were difficult to distinguish. The confusion matrix, combined with an analysis of sample balance, revealed that stratus and cirrostratus were also the two categories with relatively few training samples. In the future, increasing the numbers of samples in these two categories would help to determine whether their accuracy can be improved.
  • (5) Many studies have shown that it is not absolutely necessary to fine-tune the first few convolution layers, but that it is very important to fine-tune the last few layers. This is because the initial layers describe the general features of an image, such as color and edges, whereas the last few fully connected layers describe high-order features related to the image classification task.

The size of the sample library led to abnormal results during the fine-tuning of the convolution layer. This sample library did not permit an effective comparison, but it did reflect that fine-tuning the convolution layer introduced a large number of parameters and consumed a large amount of computing resources. Future experiments should aim to improve the sample database, seek more effective solutions, and improve the recognition rates of stratiform and high-level clouds.

Funding

National Natural Science Foundation of China (41775165, 41775039); The Startup Foundation for Introducing Talent of NUIST (2021r034).

Acknowledgments

We would like to thank Editage [www.editage.cn] for English language editing.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. M. Singh and M. Glennen, “Automated ground-based cloud recognition,” Pattern Anal. Appl. 8(3), 258–271 (2005). [CrossRef]  

2. J. Calbó and J. Sabburg, “Feature Extraction from Whole-Sky Ground-Based Images for Cloud-Type Recognition,” J. Atmos. Oceanic Technol. 25(1), 3–14 (2008). [CrossRef]  

3. A. Heinle, A. Macke, and A. Srivastav, “Automatic cloud classification of whole sky images,” Atmos. Meas. Tech. 3(3), 557–567 (2010). [CrossRef]  

4. A. Kazantzidis, P. Tzoumanikas, A. F. Bais, S. Fotopoulos, and G. Economou, “Cloud detection and classification with the use of whole-sky ground-based images,” Atmos. Res. 113, 80–88 (2012). [CrossRef]  

5. S. Liu, C. Wang, B. Xiao, Z. Zhong, and Y. Shao, “Ground-based cloud classification using multiple random projections,” IEEE International Conference on Computer Vision in Remote Sensing (2012).

6. W. Zhuo, Z. Cao, and Y. Xiao, “Cloud Classification of Ground-Based Images Using Texture–Structure Features,” J. Atmos. Oceanic Technol. 31(1), 79–92 (2014). [CrossRef]  

7. T. Kliangsuwan and A. Heednacram, “Feature extraction techniques for ground-based cloud type classification,” Expert Systems with Appl. 42(21), 8294–8303 (2015). [CrossRef]  

8. S. Wacker, J. Gröbner, C. Zysset, L. Diener, P. Tzoumanikas, A. Kazantzidis, L. Vuilleumier, R. Stöckli, S. Nyeki, and N. Kämpfer, “Cloud observations in Switzerland using hemispherical sky cameras,” J. Geophys. Res.: Atmos. 120(2), 695–707 (2015). [CrossRef]  

9. Y. Xiao, Z. Cao, W. Zhuo, L. Ye, and L. Zhu, “mCLOUD: A Multi-view Visual Feature Extraction Mechanism for Ground-based Cloud Image Categorization,” J. Atmos. Oceanic Technol. 33(4), 789–801 (2016). [CrossRef]  

10. Q. Li, Z. Zhang, W. Lu, and J. Yang, “From pixels to patches: a cloud classification method based on bag of micro-structures,” Atmos. Meas. Tech. 8(10), 10213–10247 (2016). [CrossRef]  

11. M. Hussain, J. Bird, and D. Faria, “A Study on CNN Transfer Learning for Image Classification,” Adv. in Computat. Intelligence Systems 11(840), 191–202 (2018). [CrossRef]  

12. F. Jiang, H. Liu, S. Yu, and Y. Xie, “Breast mass lesion classification in mammograms by transfer learning,” Int. Conf. on Bioinformatics & Computat. Biol. 1, 59–62 (2017). [CrossRef]  

13. Y. Xie, S. Su, and S. Li, “A pedestrian classification method based on transfer learning,” International Conference on Image Analysis & Signal Processing, 420–425 (2010).

14. H. Shouno, S. Suzuki, and S. Kido, “A Transfer Learning Method with Deep Convolutional Neural Network for Diffuse Lung Disease Classification,” Neural Information Processing 9489(9489), 199–207 (2015). [CrossRef]  

15. PLA Air Force Headquarters. Aerometeorological cloud Atlas. PLA Air Force Headquarters, Beijing, 1973.

16. China Meterological Administration. China cloud Atlas. Beijing, China Meteorological Press, 2004.

17. World Meteorological Organization. International Cloud Atlas. World Meteorological Organization, 2017. https://cloudatlas.wmo.int/en/home.html

18. J. Deng, W. Dong, R. Socher, L. Li, and F. Li, “ImageNet: A large-scale hierarchical image database,” IEEE Conference on Computer Vision & Pattern Recognition. IEEE (2009).

19. A. Razavian, H. Azizpour, J. Sullivan, and S Carlsson, “CNN Features off-the-shelf: an Astounding Baseline for Recognition,” IEEE conference on computer vision and pattern recognition workshops, 8 pages (2014).

20. O. Penatti, K. Nogueira, and J. Santos, “Do deep features generalize from everyday objects to remote sensing and aerial scenes domains,” IEEE Conf. on Comput. Vis. & Pattern Recognit. Workshops 1, 44–51 (2015). [CrossRef]  

21. E. Maggiori, Y. Tarabalka, G. Charpiat, and P. Alliez, “Convolutional Neural Networks for Large-Scale Remote-Sensing Image Classification,” IEEE Trans. on Geoscience and Remote Sensing 55(2), 645–657 (2017). [CrossRef]  

22. J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, and T. Darrell, “DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition,” International Conference on Machine Learning 32, 10 pages (2013).

23. H. Azizpour, A. Razavian, J. Sullivan, A. Maki, and S. Carlsson, “From generic to specific deep representations for visual recognition,” IEEE Conf. on Comput. Vis. Pattern Recognit. Workshops (CVPRW). IEEE Computer Society 1, 36–45 (2015). [CrossRef]  

24. M. Wang, S. D. Zhou, and Z. H. Liu, “CloudA: A Ground-Based Cloud Classification Method with a Convolutional Neural Network,” J. Atmos. Oceanic Technol. 37(9), 1661–1668 (2020). [CrossRef]  

25. S. Dev, Y. H. Lee, and S. Winkler, “Categorization of cloud image patches using an improved texton-based approach,” in Proc. IEEE Int. Conf. Image Process. (ICIP), 422–426 (2015).

26. J. Yang, Q. Min, W. T. Lu, Y. Ma, W. Yao, T. S. Lu, J. Du, and G. Y. Liu, “A total sky cloud detection method using real clear sky background,” Atmos. Meas. Tech. 9, 587–597 (2016). [CrossRef]  

27. A. Krizhevsky, I. Sutskever, and G. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Int. Conf. on Neural Inform. Process. Systems 60(6), 84–90 (2017). [CrossRef]  

28. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, and A. Rabinovich, “Going Deeper with Convolutions,” IEEE Conf. on Comput. Vis. Pattern Recognit. 1(1), 1 (2015). [CrossRef]  

29. K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” Comput. Sci. 9, 1–14 (2014).

30. K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 1, 770–778 (2016).

