Abstract
Band selection is a kind of dimension reduction method, which tries to remove redundant bands and choose several pivotal bands to represent the entire hyperspectral image (HSI). Supervised band selection algorithms tend to perform well because of the introduction of prior information. However, The traditional methods are based on the entire image, without taking into account the differences in ground categories, and cannot figure out which band is discriminative for a specific category. In this paper, a supervised method is proposed based on the ground category with convolutional neural network (CNN). Firstly, we propose a structure called contribution map which can record discriminative feature location. Secondly, the contribution map is added to CNN to generate a new model called contribution map based CNN (CM-CNN). Thirdly, we apply CM-CNN for HSI classification with the whole bands. Then, we can get the contribution map which records discriminative bands location for each category. Finally, the contribution map guides us to select discriminative bands. We found that CM-CNN model can obtain a satisfactory classification result while preserving the position information of important bands. To verify the superiority of the proposed method, experiments are conducted on HSI classification. The results demonstrated the reliability of the proposed method in HSI classification.
You have full access to this open access chapter, Download conference paper PDF
Similar content being viewed by others
Keywords
1 Introduction
Hyperspectral image contains hundreds of bands which range from the visible to the near infrared wavelength [8]. In other words, each pixel in the image captures hundreds of narrow spectral bands from the same area on the Earth. Hyperspectral remote sensing plays an important role in national economy and national defense, and has been widely used in the field of classification, target detection, agricultural monitoring and mineral mapping [13].
There are many problems in the actual use of hyperspectral data. The most important one is that the band redundancy is too great [4]. A hyperspectral image often contains hundreds of bands, and the rich spectral information can enhance the ability to distinguish objects. However, for specific tasks, such as classifying the types of crops, it may not be necessary to analyze the objects with all the bands. In the contrast, the bands irrelevant to the target object may weaken the classification accuracy [20]. Compared with the high dimensionality of bands, the ground truth of HSI are often very limited under the condition that labeling samples is very time-consuming and expensive [2]. As a well-known conclusion, the generalization ability of the classifier is weakened if the dimensionality of features is too high in the case of limited training samples [4, 8]. Therefore, it is necessary to carry out dimension reduction method for hyperspectral data.
Dimension reduction methods are mostly divided into two categories: feature extraction and band selection. The feature extraction based dimension reduction methods consider that hyperspectral data is projected to a lower dimensional space so as to get an abstract representation of the data [16, 21]. The research direction includes principal component analysis [18], wavelet transformation [10], independent component analysis [6] and linear discriminant analysis [9]. Although these methods can get relatively satisfactory results, they also have some serious problems. First, these method have a high time complexity [2]. Second, the extracted features may lose the physical meaning for interpretation [19].
Band selection based dimension reduction methods aim to select discriminative bands for a specific task. The ultimate selected bands should be able to represent the whole hyperspectral data with little loss of effectiveness [17]. It is obvious that band selection has advantages of preserving original physical information compared with feature extraction methods. Therefore, we try to figure out a band selection method for hyperspectral task such as HSI classification.
Band selection methods can be divided into unsupervised and supervised according to whether there are training samples [17]. Numerous researches have focused on the unsupervised band selection methods. This is a very versatile method that comprehensively considers the distribution of all ground category in the image and retains an optimal subset distribution of original features [3]. However, it is not very effective for a specific task such as vegetation classification. Not all the band selected are used to distinguish vegetation category. Supervised band selection algorithms often performs better because of the introduction of prior information. However, there are still some problems for supervised method. The traditional methods are based on the entire image, without taking into account the differences in ground categories, and cannot figure out which band is discriminative for a specific category. Another disadvantage is that it is not possible to select the band flexibly according to the type of ground taking into account different task. Moreover, the traditional supervised method is not intuitive and interpretable. It cannot explain the specific role of a band.
To alleviate problems mentioned above, we proposed a convolutional neural network that generates contribution map (CM-CNN) in this paper. Due to excellent feature extraction capabilities, convolutional neural networks are widely used in hyperspectral image classification. Recent research on CNN shows the ability to localize discriminative image regions in the field of natural image processing [23]. We apply this idea to hyperspectral images classification and try to locate the position of discriminative bands through simple spectral classification. In this paper the structure of CNN has been changed sightly, and we proposed a contribution map by replacing the fully-connected layer with pooling features in order to record the location of discriminative band while classifying at the same time. The main contributions of this paper are as follows: (1) We apply neural network interpretability methods to hyperspectral task for the first time and have achieved remarkable results. (2) The proposed model is able to locate discriminative bands for a specific category and show the contribution of all the bands. (3) The proposed model can be transfered to other hyperspectral tasks such as anomaly detection.
2 Related Work
Band selection is considered to be an efficient dimension reduction method for HSI. The purpose of this method is to obtain a subset to represent the most discriminative information of HSI. Band selection is able to reduce the amount of data and speed up processing at the same time. Supervised methods and unsupervised methods are the most commonly used approaches [17]. The difference is that supervised methods need training samples with ground truth, while unsupervised methods do not require training and learning.
This paper mainly discusses supervised methods for the reason that they have better performance under conditions of prior knowledge. Existing supervised methods are mostly based on classification. Lots of supervised algorithms are proposed including local spatial information based [22], the hypergraph model based [1], the support vector machine (SVM) based [14], the non homogeneous hidden Markov chain based [7] and the sparse independence criterion based methods [5]. However, there are still some problems for the supervised methods above. The result of selected bands depends on the accuracy and even distribution of the labels. Another shortcoming is that the traditional supervised methods are not intuitive and interpretable. It can not figure out which band is discriminative for a specific type of ground category.
With the development of machine learning and computer vision, methods based on deep learning have become the mainstream of computer science [15, 24]. Therefore, applying deep learning methods to hyperspectral images is an inevitable direction. Recently, convolutional neural networks have been widely applied in the field of HSI [8]. The CNN model is a special type of feedforward neural network which is able to automatically learn high-level semantic features from the samples [11]. A 3D CNN model is proposed for HSI classification in [4]. Spectral-spatial based CNN model also gets a satisfying result [13]. CNN can even be applied to hyperspectral image reconstruction [12]. It is also effective for band selection [19]. As far as we know, deep learning methods for band selection are quite limited. Since the neural network is a black box, the interpretability of this method needs to be improved. Some works find that CNN can locate the features that activate the classification result to some extent [23]. The advantage of the deep learning methods is that it can extract more abstract high-level semantic features that are thought to reflect intrinsic meaning of the input [4, 11]. So, we try to exact location information of input bands with CNN.
3 Proposed Framework
We propose a CNN model that can select bands specifically for each category while classifying. First, we introduce the specific process of 1D CNN classification using spectral information. Then, we put forward the contribution map which tends to retain location information of the bands in classification. Finally, we introduce how to select bands based on the contribution map.
3.1 Spectral Based Classification
The ultimate goal of hyperspectral image classification is to assign a unique category to each pixel in an image which contains hundreds of bands. In this paper, we adopt the typical 1D CNN model for classification that only uses the spectral information.
Since each pixel is a one-dimensional vector containing hundreds of bands, we propose a 1D CNN model. The network structure is shown in the Fig. 1. The 1D CNN consists of 5 convolution layers, 2 pooling layers and 1 fully-connected layer. Then the last layer of convolution is reshaped to an one-dimensional vector. The feature vector goes through a fully-connected layer lately to reduce dimensions. Finally, we use the softmax function to compute the probability of each category for the pixel.
The main units of 1D CNN include convolution layer, pooling layer and nonlinear function. The formula for the convolution operation is
where \(x^{k-1}_{i}\) refers to the i-th feature map in (\(k-1\))-th layer. \(x^{k}_{j}\) refers to the j-th feature map in current k-th layer, and N refers to the number of input channels. \(w^{k}_{ij}\) and \(b^{k}_{j}\) are trainable parameters in convolution operation.
The output value of convolution layer is subjected to a non-linear operation through the activation function. Typical activation functions include sigmoid, rectified linear unit (ReLU) and tanh. ReLU is adopted in our model for it’s excellent performance in training. ReLU works as follows:
Max and average pooling are most widely used downsampling method. Max pooling is thought to be robust to rotation and slight translation. Due to the fact that the spectral dimension of hyperspectral images does not have geometric distortion. Pooling layer adopts the average pooling in our model. To a certain extent, it can reduce the impact of noise. The method of average pooling is
where \(x^{k-1}_{i}\) refers to the output of convolution layer, and n refers to the size of pooling window. \(x^{k}_{i}\) refers to the output of pooling layer.
The result of CNN depends on the weight parameters of the network. So it is very vital to find proper weights. In our model, parameters are randomly initialized and trained by a back-propagation algorithm which is the core training algorithm for all sorts of neural networks [23]. Our model adopts mini-batch update strategy, which is efficient for large data set processing. The cost is computed on a mini-batch of inputs
where m is the size of mini-batch which means the number of samples per optimization in training. c refers to the total error of cost function for a mini-batch. \(y_{i}\) is the output of network for each sample. \(y_{i}^{label}\) refers to label of each sample. Since this is a multi-classification problem, we adopt one-hot encoding for the label.
To maximize accuracy, learning rate decay method is added to our training process. With the increase in the number of training epoch, the learning rate slowly declines. It is shown as follows:
where lrmax is the initial learning rate of the network. lrmin is the final learning rate of the network. i represents the current training epoch which means once complete training of all training data. P is a constant used to control the rate of decline on learning rate.
3.2 Contribution Map
In this section, we describe the procedure of generating contribution map by changing the last convolution layer in CNN. A contribution map indicates the discriminative spectral regions which are used by the CNN to determine the category. Figure 2 shows the procedure of generating these maps for a particular category.
The proposed method takes into account the ability of CNN to localize the discriminative image regions in the field of natural image processing [23]. We apply this idea to hyperspectral images classification and try to locate discriminative bands which play an important role in the classification results. Before the final output layer, our model performs global average pooling on the feature maps of the last convolution layer. Then we use the pooled vector to sort through a softmax function. In this way, the weight of each feature map which represents the contribution of the feature to the final classification can be obtained. Finally, we compute a weighted sum of the feature maps in the last convolution layer to get our contribution map.
The specific operation is as follows. First, we assume that the values of the elements in the k-th channel(feature map) in the last convolution layer are expressed as \(f_{k}(x)\). x refers to the spacial position at the feature map. Then, global average pooling is performed on the k-th channel. Let \(F_{k}=\sum \limits _{i=1}^{n}f_{k}(x)\), and \(S_{c}=\sum \limits _{j=1}^{k}w_{k}^{c}F_{k}\). Therefore, for a given category c, \(w_{k}^{c}\) is the weight corresponding to category c for channel k. Virtually, \(w_{k}^{c}\) reflects the importance of \(F_{k}\) for category c. In the end, the output of softmax is calculated by
A little trick is adopted. We replace \(F_{k}\) in \(S_{c}=\sum \limits _{j=1}^{k}w_{k}^{c}F_{k}\) with \(\sum \limits _{i=1}^{n}f_{k}(x)\). Then \(S_{c}\) can be described as follows:
We define \(CM_{c}\) as the contribution map for category c, and \(CM_{c}\) represents the contribution of each node to the classification in the feature map. It is defined as follows:
We name this changed network as CM-CNN which means a convolutional neural network that can generate contribution map. In the process of training we use \(F_{k}\) to participate in the classification with softmax function. Right after the training step, we calculate \(CM_{c}=\sum \limits _{j=1}^{k}w_{k}^{c}f_{k}(x)\) to get the contribution map. Every time a sample passes through the network we are able to get a contribution map.
3.3 Band Selection
Once the training of our CM-CNN is completed, each training sample through the network can get a contribution map correspondingly. We assume that it roughly reflects the contribution of each band to the final classification result. Our purpose is to prove that this assumption is credible. Figure 3 shows the contribution map.
For a particular category, we average the contribution map of all the training samples to get the final contribution map of that category. The number of bands selected for each category is determined based on the number of bands we need to select in a total. Two algorithms are proposed taking into account the efficiency and accuracy of band selection. The first method is to ignore the non-uniform distribution of samples. Assuming that the number of all the bands is N, the number of category is l, and M bands need to be selected. For each category, we choose the top M/l bands according to the element value in the contribution map which is ranked beforehand. Consider that the hyperspectral bands are highly correlated. If the former band selected and the latter band selected are adjacent bands, we abandon the latter band and select the next band sequentially until the number of bands meets the requirements. The second method takes into account the uneven distribution of the samples. Especially for the classification task, we recommend the second band selection method. The number of bands selected for a category with a large number of samples should be appropriately reduced so as to leaving more bands to those difficult to classify. Assume that the number of samples in different categories is \(H_{c}\), and a maximum and minimum normalization f(x) is applied to all categories of \(\log (H_{c})\). \(f(H_{c})\) is as follows:
and then the number of bands to be selected for each category is determined based on
The method of selecting a fixed number of bands on the contribution map is the same as the first method. For each category, we choose the top \(N_{c}\) bands according to the element value in the contribution map. If the former band selected and the latter band selected are adjacent bands, we abandon the latter band and select the next band sequentially until the number of bands meets the requirements.
4 Experiment
4.1 Visualization
The interpretability of neural network model has always been an important topic in natural image processing. Visualization can helps us to get some intuitive knowledge and improve the understanding of the model. So we try to visualize the contribution map to show whether the location distribution of features is regular. Figure 4 shows the contribution maps of a few samples which are randomly selected in the first category (Alfalfa) of Indian Pines’ training data. As we can see, for the same category of samples, the activation region of the contribution map is almost the same. Then, we summarize all the contribution maps for each category in the Indian Pines as shown in Fig. 5. It can be seen that different categories correspond to different activation region.
4.2 Indian Pines
Indian Pines data set was obtained by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS). The image size is 145 \(\times \) 145 pixels with 220 spectral bands in the range of 0.4–2.5 \(\upmu \)m. The final bands adopted is reduced to 200 by abandoning water absorption bands. There are total 16 categories in Indian Pines data set.
For Indian Pines, Table 1 shows the architecture of the CM-CNN used for the experiment. In the figure, Ci means the i-th convolution layer. There are a total of 10,249 labeled samples in the Indian Pines, of which 7,168 are for training and 3,081 for testing. The training epoch is set to 300. With the increase in the number of training epoch, the learning rate slowly declines. The initial learning rate is set to 0.003 and the final learning rate decade to 0.0001. We have found that average pooling always performs better than max pooling, and smaller kernel size can significantly increase training speed but with lower accuracy. So, we choose 4 as the kernel size for both speed and accuracy. In order to preserve location information, stride is mostly set to 1 to keep output size the same as original input. Once the training of CM-CNN is over, we can get the contribution map to make reasonable band selection according to the method in Sect. 3.2.
We compared two supervised and two unsupervised band selection method with our CM-CNN. As is shown in Fig. 6, CM-CNN outperform all these methods when the number of selected bands is larger than 35. The selected bands are classified by a SVM classifier with RBF kernel to verify the validity of them. Wrapper based SVM (SVMwrapper) method and minimal Redundancy Maximal Relevance (mRMR) method are supervised methods [20]. CEM-BCM and ordinary-clustering-based band selection (OCBBS) are unsupervised methods [17]. When the number of selected bands is small, the performance of our method is not very ideal. We figure that it is because the location information of contribution map is not precise enough. Band correlation makes adjacent bands easily selected at the same time. However, experiment proves that CM-CNN is able to extract location information and has the best performance when the number of selected bands is larger than 35.
4.3 Pavia University
Pavia University data set was obtained by Reflective Optics Spectrographic Imaging System (ROSIS-03) airborne instrument. The image size is 640 \(\times \) 340 pixels with 115 spectral bands in the range of 0.43–0.86 \(\upmu \)m. The number of bands adopted is reduced to 103 by removing noise bands. The spatial resolution is 1.3 m per pixel. There are total 9 categories in Pavia University data set.
For Pavia University, Table 2 shows the architecture of the CM-CNN used for the experiment. Same as Indian Pines, \(C_{i}\) means the i-th convolution layer. There are a total of 42,776 labeled samples in the Indian Pines, of which 29,943 are for training and 12,833 for testing. The training epoch is set to 400. The initial learning rate is set to 0.008 and he final learning rate decade to 0.0005. The following process is the same as Indian Pines.
We compared two supervised and one unsupervised band selection method with our CM-CNN. As is shown in Fig. 7. The selected bands are classified by a SVM classifier with RBF kernel to verify the validity of them. SVMCV is a kind of wrapper based supervised method and minimal redundancy maximal relevance (mRMR) method are also supervised [20]. CEM-BCM is an unsupervised method has been widely compared. CM-CNN performs as good as the two supervised method. Results proved the effectiveness of our method to locate discriminative bands.
5 Conclusion
In this paper, a CNN-based supervised band selection model (CM-CNN) is proposed. The main task is to choose the most discriminative bands so as to effectively represent the original image cube. The bands selected by our method prove to perform well on HSI classification. The contribution map we get proves to be sensitive to discriminative bands. Approach in this paper is instructive for many hyperspectral tasks.
References
Bai, X., Guo, Z., Wang, Y., Zhang, Z., Zhou, J.: Semisupervised hyperspectral band selection via spectral-spatial hypergraph model. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 8(6), 2774–2783 (2015)
Cao, X., Xiong, T., Jiao, L.: Supervised band selection using local spatial information for hyperspectral image. IEEE Geosci. Remote Sens. Lett. 13(3), 329–333 (2016)
Chang, C.I., Wang, S.: Constrained band selection for hyperspectral imagery. IEEE Trans. Geosci. Remote Sens. 44(6), 1575–1585 (2006)
Chen, Y., Jiang, H., Li, C., Jia, X., Ghamisi, P.: Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 54(10), 6232–6251 (2016)
Damodaran, B.B., Courty, N., Lefévre, S.: Sparse Hilbert Schmidt independence criterion and surrogate-kernel-based feature selection for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 55(4), 2385–2398 (2017)
Falco, N., Bruzzone, L., Benediktsson, J.A.: An ICA based approach to hyperspectral image feature reduction. In: Geoscience and Remote Sensing Symposium, pp. 3470–3473 (2014)
Feng, S., Itoh, Y., Parente, M., Duarte, M.F.: Hyperspectral band selection from statistical wavelet models. IEEE Trans. Geosci. Remote Sens. 55(4), 2111–2123 (2017)
Ghamisi, P., Plaza, J., Chen, Y., Li, J., Plaza, A.J.: Advanced spectral classifiers for hyperspectral images: a review. IEEE Geosci. Remote Sens. Mag. 5(1), 8–32 (2017)
Imani, M., Ghassemian, H.: Boundary based discriminant analysis for feature extraction in classification of hyperspectral images. In: International Symposium on Telecommunications, pp. 424–429 (2015)
Kavitha, K., Nivedha, P., Arivazhagan, S., Palniladevi, P.: Wavelet transform based land cover classification of hyperspectral images. In: International Conference on Communication and Network Technologies, pp. 109–112 (2014)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems, pp. 1097–1105 (2012)
Li, Y., Xie, W., Li, H.: Hyperspectral image reconstruction by deep convolutional neural network for classification. Pattern Recognit. 63, 371–383 (2016)
Luo, Y., Zou, J., Yao, C., Li, T., Bai, G.: HSI-CNN: a novel convolution neural network for hyperspectral image (2018)
Patra, S., Modi, P., Bruzzone, L.: Hyperspectral band selection based on rough set. IEEE Trans. Geosci. Remote Sens. 53(10), 5495–5503 (2015)
Saeedan, F., Weber, N., Goesele, M., Roth, S.: Detail-preserving pooling in deep networks (2018)
Wang, Q., Yuan, Y., Yan, P., Li, X.: Saliency detection by multiple-instance learning. IEEE Trans. Cybern. 43(2), 660–672 (2013)
Yuan, Y., Lin, J., Wang, Q.: Dual-clustering-based hyperspectral band selection by contextual analysis. IEEE Trans. Geosci. Remote Sens. 54(3), 1431–1445 (2016)
Zabalza, J., et al.: Novel two-dimensional singular spectrum analysis for effective feature extraction and data classification in hyperspectral imaging. IEEE Trans. Geosci. Remote. Sens. 53(8), 4418–4433 (2015)
Zhan, Y., Hu, D., Xing, H., Yu, X.: Hyperspectral band selection based on deep convolutional neural network and distance density. IEEE Geosci. Remote Sens. Lett. 14(12), 2365–2369 (2017)
Zhang, H., Huang, B., Yu, L.: Kernel function in SVM-RFE based hyperspectral data band selection. Remote Sens. Technol. Appl. 28(5), 747–752 (2013)
Zhang, Q., Zhang, L., Yang, Y., Tian, Y., Weng, L.: Local patch discriminative metric learning for hyperspectral image feature extraction. IEEE Geosci. Remote Sens. Lett. 11(3), 612–616 (2013)
Zheng, X., Lu, X.: Discovering diverse subset for unsupervised hyperspectral band selection. IEEE Press (2017)
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Computer Vision and Pattern Recognition, pp. 2921–2929 (2016)
Zhou, Z., Shin, J., Zhang, L., Gurudu, S., Gotway, M., Liang, J.: Fine-tuning convolutional neural networks for biomedical image analysis: actively and incrementally. In: Computer Vision and Pattern Recognition, pp. 4761–4772 (2017)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Cai, R., Yuan, Y., Lu, X. (2018). Hyperspectral Band Selection with Convolutional Neural Network. In: Lai, JH., et al. Pattern Recognition and Computer Vision. PRCV 2018. Lecture Notes in Computer Science(), vol 11259. Springer, Cham. https://doi.org/10.1007/978-3-030-03341-5_33
Download citation
DOI: https://doi.org/10.1007/978-3-030-03341-5_33
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03340-8
Online ISBN: 978-3-030-03341-5
eBook Packages: Computer ScienceComputer Science (R0)