1. Introduction
Hyperspectral remote sensing, which provides rich information for monitoring human activities and Earth systems, has been widely used in various fields, such as crop analysis, mineral identification, geological research, and environmental mapping [1,2,3]. However, because of the enormous number of spectral bands contained in hyperspectral images (HSIs), their analysis has become a challenging and computationally expensive task [1,4]. Therefore, it is necessary to discard redundant information in HSIs without losing their desirable features [5]. As an effective tool to solve this problem, dimensionality reduction (DR) has become a critical step in HSI processing tasks [6]. It brings three main benefits: (1) alleviating the statistical ill-conditioning problem by discarding redundant features; (2) reducing computational complexity and storage pressure; and (3) avoiding the Hughes phenomenon in HSI classification (that is, for a fixed training sample size, classifier performance first improves as the dimensionality increases but then degrades once the dimensionality exceeds the optimal value) [5,6].
A variety of DR methods have been proposed over the years. They fall into two main categories: feature extraction and band selection. Band selection methods select a subset of spectral bands to remove spectral redundancy while retaining the important information of the entire image [7]. Hyperspectral band selection methods can be divided into six major classes: hybrid-scheme-based methods [8,9,10], embedding-learning-based methods [7], searching-based methods [11,12], ranking-based methods [13,14,15], clustering-based methods [16,17,18], and sparsity-based methods [19,20,21]. However, it is difficult to determine the optimal number of bands and to comprehensively assess the performance of band selection [6,7]. Unlike band selection, feature extraction transforms the original data into an optimized space by mathematical manipulation [22]. Many feature extraction methods have been presented; traditional ones can be categorized as linear or nonlinear. Principal component analysis (PCA) [23], minimum noise fraction (MNF) transformation [24], linear discriminant analysis (LDA) [25], non-negative matrix underapproximation (NMU) [26], and non-negative matrix factorization (NMF) [27] are commonly used linear feature extraction methods. The nonlinear methods can be categorized into kernel-based methods [28,29], manifold-learning-based methods [30,31,32], and graph-theory-based methods [33]. By imposing a non-negativity constraint, NMF retains the non-negativity of HSIs in the lower-dimensional feature space. By introducing a recursive procedure into NMF, NMU gains the advantage of identifying features sequentially. However, because they ignore the inherent geometric structure of the original data, both NMF and NMU struggle to produce the desired results when the data are nonlinear [34]. LDA is a supervised feature extraction method; it projects the initial data into a lower-dimensional space in which the distances between different class centers are maximized [25]. However, as a supervised method it requires prior knowledge of the categories and is unsuitable for complex scenes. Using information content as an assessment index, PCA sorts the components in descending order [23]; however, its performance relies heavily on the noise characteristics of the original data, and it cannot guarantee that the components are ordered by image quality when the noise is distributed unevenly across bands [24]. MNF solves this problem and produces new components arranged by image quality regardless of how the noise is distributed [24].
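To make the PCA/MNF contrast concrete, MNF can be sketched as a generalized eigenproblem that orders components by signal-to-noise ratio rather than by variance. The following is a minimal sketch, assuming a simple horizontal neighbour-difference noise estimator (the cited works discuss more refined estimators):

```python
import numpy as np
from scipy.linalg import eigh

def mnf(cube, n_components=10):
    """Minimal MNF sketch for a (rows, cols, bands) hyperspectral cube."""
    rows, cols, bands = cube.shape
    flat = cube.reshape(-1, bands).astype(float)
    flat -= flat.mean(axis=0)                       # centre the data
    cov = np.cov(flat, rowvar=False)                # total (signal + noise) covariance
    # Crude noise estimate: differences of horizontally adjacent pixels
    diff = (cube[:, 1:, :].astype(float) - cube[:, :-1, :]).reshape(-1, bands)
    ncov = np.cov(diff, rowvar=False) / 2.0 + 1e-8 * np.eye(bands)
    # Generalized symmetric eigenproblem: maximize w' cov w / w' ncov w
    evals, evecs = eigh(cov, ncov)                  # eigenvalues ascending
    weights = evecs[:, ::-1][:, :n_components]      # descending SNR order
    return flat @ weights                           # (pixels, n_components)
```

Unlike PCA, the ordering of these components is invariant to band-wise rescaling of the noise, which is exactly the property attributed to MNF above.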
Because of factors such as the interaction of ground objects within one pixel, atmospheric absorption, and scattering, HSIs have inherent nonlinear characteristics [35]. The aforementioned methods struggle to handle these nonlinear features effectively and efficiently. Recently, deep learning architectures and kernel functions have achieved remarkable success in dealing with nonlinear features. Deep learning methods employ a hierarchical framework to extract high-dimensional features [36]. A convolutional neural network (CNN), which contains multiple hidden layers, can extract features without manual annotation of attributes [37,38]. Chen et al. [39] extracted deep spatial-spectral features of HSIs using a CNN and achieved high classification performance. Although CNNs excel at feature extraction, the high-dimensional data of HSIs place a heavy load on the computational process. To address this issue, Feng et al. [40] combined a multibranch CNN with attention mechanisms (AMs) to obtain features via adaptive region search. Mou et al. [41] employed a squeeze-and-excitation network (SENet) to suppress redundancy and strengthen the vital bands. Adopting a nonlocal neural network (NLNN), Xue et al. [42] developed Gaussian-like distribution weights of feature maps to generate second-order features of HSIs for classification. In addition, Li et al. [43] exploited the manifold-based maximization margin discriminant network (M³DNet) to improve the feature extraction ability of deep learning models. In recent years, various feature extraction methods have been developed on different deep learning frameworks; however, finding a favorable number and size of hidden units for a specific problem remains a major difficulty [36]. Kernel functions, in contrast, ensure linear separability by mapping the original data into a higher-dimensional feature space, in which a linear analysis can then be performed [44]. As a kernel-based method, kernel MNF (KMNF) [28] transformation adopts kernel functions [45] to make up for the weakness of MNF in modelling the nonlinear features within HSIs. While KMNF is a valuable feature extraction method for HSIs, it suffers from high computational complexity and low execution efficiency and is therefore unsuitable for processing large-scale datasets. To address this problem, this paper proposes a novel method for fast KMNF transformation (GNKMNF) based on the Nyström method and graphics processing unit (GPU) parallel computing. The contributions of this paper can be summarized as follows:
(1) The performance of seven feature extraction methods (PCA, MNF, kernel PCA (KPCA), factor analysis (FA), LDA, locality preserving projections (LPP), and KMNF) is evaluated on the Indian Pines, Salinas, and Xiong’an datasets. The experimental results show the generalization and effectiveness of KMNF transformation in feature extraction, providing a reference for further research in feature extraction.
(2) To address the high computational complexity and low execution efficiency of KMNF transformation, the Nyström method is introduced to estimate the eigenvectors of the entire kernel matrix through the decomposition and extrapolation of a sub-kernel matrix. The experimental results demonstrate that the Nyström-method-based KMNF (NKMNF) transformation has lower computational complexity and achieves satisfactory classification results. The proposed framework can also serve as a general model for the real-time implementation of other algorithms.
(3) The sample size of the sub-kernel matrix in NKMNF transformation is an essential factor affecting the result. This paper determines the sample size of the sub-kernel matrix by a proportional gradient selection strategy. Experimental results suggest that when the sub-kernel matrix comprises 20% of the entire kernel matrix, NKMNF improves overall classification accuracy and Kappa by up to 1.94% and 2.04%, respectively, compared with the KMNF transformation. Moreover, with a data size of 64 × 64 × 250, the NKMNF transformation runs about 8× faster.
(4) GPU parallel computing is employed to further improve the execution efficiency of the NKMNF transformation. Experimental results show that with a data size of 64 × 64 × 250, the GNKMNF transformation runs about 10× and 80× faster than the NKMNF and KMNF transformations, respectively.
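The Nyström approximation in contribution (2) can be sketched as follows: eigendecompose an m × m sub-kernel matrix sampled from the n pixels, then extrapolate its eigenvectors to all n points through the n × m cross-kernel matrix. This is a minimal illustration with an RBF kernel, not the paper's implementation:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row-vector sets A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def nystrom_eig(X, m, gamma=1.0, seed=0):
    """Approximate the leading eigenpairs of the full n x n kernel
    matrix from an m-sample sub-kernel matrix (m << n in practice)."""
    n = X.shape[0]
    idx = np.random.default_rng(seed).choice(n, size=m, replace=False)
    K_mm = rbf_kernel(X[idx], X[idx], gamma)    # m x m sub-kernel matrix
    K_nm = rbf_kernel(X, X[idx], gamma)         # n x m cross-kernel matrix
    lam, U = np.linalg.eigh(K_mm)
    lam, U = lam[::-1], U[:, ::-1]              # sort descending
    keep = lam > 1e-10                          # drop numerically null modes
    lam, U = lam[keep], U[:, keep]
    # Extrapolate sub-kernel eigenvectors to all n points
    evecs = np.sqrt(m / n) * (K_nm @ U) / lam
    evals = (n / m) * lam
    return evals, evecs
```

When m = n the procedure reduces to the exact eigendecomposition; the approximation replaces the O(n³) full decomposition with an O(m³) decomposition plus an O(nm) extrapolation per eigenvector, which is the source of the speedups reported in the contributions.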
The remainder of this paper is organized as follows. Section 2 introduces the procedures of the Nyström method and GNKMNF transformation. The experiments and results of the proposed method are presented in Section 3. Section 4 discusses the experimental results, and conclusions are provided in Section 5.
3. Results
The experiments and results of the proposed method are shown in this section. Four experiments were designed to assess the performance of the GNKMNF transformation. Three real HSIs (the Indian Pines, Salinas, and Xiong’an datasets), with different spatial and spectral resolutions over different scenes, are introduced in Section 3.1. The first experiment evaluates the classification performance of seven feature extraction methods (PCA, MNF, KPCA, FA, LDA, LPP, and KMNF). Taking overall accuracy and Kappa as the assessment criteria, each method was evaluated with a support vector machine (SVM) classifier using a radial basis function (RBF) kernel. In this experiment, 25% of the samples were randomly chosen for training, the remaining 75% were used for testing, and ten-fold cross-validation was employed to find the best SVM parameters. The results are described in Section 3.2. To show the computational complexity of each feature extraction method intuitively, the second experiment tests the runtime of each method with a data size of 100 × 100 × 250; the results are shown in Section 3.3. The Nyström method estimates the eigenvectors of the entire kernel matrix by the decomposition and extrapolation of a sub-kernel matrix, and the sample size of the sub-kernel matrix is an essential factor affecting the result. To determine this sample size, the third experiment employed a proportional gradient selection strategy, using 10% as the descending gradient; the results are shown in Section 3.4.1. The last experiment evaluates the execution efficiency of NKMNF and GNKMNF: the computational costs of KMNF, NKMNF, and GNKMNF were tested with increasing data volume, and the execution efficiency of GNKMNF was analyzed in detail with a data size of 64 × 64 × 250. The results are given in Section 3.4.2. To ensure the reliability of the experimental results, each experiment was conducted five times and the average value is reported for comparison.
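The evaluation protocol above (25%/75% split, RBF-kernel SVM, ten-fold cross-validated parameter search) can be sketched with scikit-learn. The digits dataset stands in for the extracted HSI features, and the parameter grid values are illustrative assumptions, not the paper's settings:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Stand-in features: in the paper, X would hold the components
# extracted from an HSI; digits data is used here only for illustration.
X, y = load_digits(return_X_y=True)

# 25% of samples for training, the remaining 75% for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.25, stratify=y, random_state=0)

# Ten-fold cross-validation over a small RBF parameter grid
search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [1, 10, 100], "gamma": ["scale", 0.01]},
    cv=10,
)
search.fit(X_tr, y_tr)
overall_accuracy = search.score(X_te, y_te)  # accuracy on the 75% test split
```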
3.1. Input Data
The three actual HSIs used in the experiments are introduced in this section. The Indian Pines, Salinas, and Xiong’an datasets are described in Section 3.1.1, Section 3.1.2, and Section 3.1.3, respectively.
3.1.1. Indian Pines Dataset
The Indian Pines hyperspectral data were collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) in Indiana, USA. This dataset contains 220 bands and 145 × 145 pixels. The spatial resolution is 20 m, and the spectral range is from 400 nm to 2500 nm. Because of atmospheric vapor absorption and noise, the 104th–108th, 150th–163rd, and 220th bands were excluded, leaving 200 bands for the experiments. The original image and the ground reference map of the Indian Pines dataset are shown in Figure 1.
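Band exclusion of this kind is a simple index deletion along the spectral axis. A sketch, converting the 1-based band numbers in the text to 0-based indices, with a random cube standing in for the real data:

```python
import numpy as np

cube = np.random.default_rng(0).random((145, 145, 220))  # stand-in for Indian Pines
# Bands 104-108, 150-163, and 220 (1-based) -> 0-based indices
bad_bands = np.r_[103:108, 149:163, 219]
clean = np.delete(cube, bad_bands, axis=2)  # 220 - 20 = 200 bands remain
```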
3.1.2. Salinas Dataset
The Salinas hyperspectral data were acquired by AVIRIS in California, USA. The spatial resolution is 3.7 m, and the dataset consists of 224 bands and 512 × 217 pixels. After removing the 108th–112th, 154th–167th, and 224th bands because of atmospheric vapor absorption and noise, 204 bands were used in the experiments. The original image and the ground reference map of the Salinas dataset are shown in Figure 2.
3.1.3. Xiong’an Dataset
The Xiong’an hyperspectral data were obtained by Airborne Multi-Modular Imaging Spectrometer (AMMIS) in New District, Hebei Province, China. This dataset consists of 512 × 512 pixels and 250 spectral bands. The spatial resolution was 0.5 m, and the spectral band ranged from 400 nm to 1000 nm [
50,
51,
52,
53]. All 250 bands were used in the experiments. The original image and the ground reference map of the Xiong’an dataset are shown in
Figure 3.
3.2. Experiments on Feature Extraction Methods
The numbers of training and testing samples from the Indian Pines, Salinas, and Xiong’an datasets are listed in Table 2 and Table 3. The overall accuracies and confidence intervals of each feature extraction method with the SVM classifier are shown in Table 4, and the corresponding Kappa values and confidence intervals are shown in Table 5. To visualize the classification results, the SVM classification maps obtained after applying the different methods to the Indian Pines, Xiong’an, and Salinas datasets are depicted in Figure 4, Figure 5, and Figure 6, respectively.
In this experiment, the SVM classifier with an RBF kernel was utilized for classification, and overall accuracies with confidence intervals were used to evaluate the results. Experiments were conducted on three HSIs with different spatial and spectral resolutions over different scenes. Compared with the other methods, KMNF transformation improves overall accuracy by 2.00%, 1.52%, and 1.06% on the Indian Pines, Salinas, and Xiong’an datasets, respectively, and improves Kappa by 2.17%, 1.65%, and 1.08% on the same datasets. These results demonstrate the excellent classification performance of KMNF transformation.
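The two assessment criteria used throughout, overall accuracy and Kappa, follow directly from the classification confusion matrix; a minimal sketch:

```python
import numpy as np

def oa_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a confusion matrix
    (rows: reference classes, columns: predicted classes)."""
    c = np.asarray(confusion, dtype=float)
    n = c.sum()
    p_o = np.trace(c) / n                                 # observed agreement = OA
    p_e = (c.sum(axis=0) * c.sum(axis=1)).sum() / n ** 2  # chance agreement
    return p_o, (p_o - p_e) / (1.0 - p_e)
```

For example, the matrix [[45, 5], [10, 40]] gives an overall accuracy of 0.85 and a Kappa of 0.70; Kappa discounts the agreement expected by chance, which is why its improvements can differ from those in overall accuracy.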
3.3. Experiments on Runtimes Testing of Each Method
The runtimes of each feature extraction method with a data size of 100 × 100 × 250 are shown in Table 6.
In this experiment, the runtime of each feature extraction method was tested with a data size of 100 × 100 × 250 to evaluate the computational complexity. The results show that the KMNF transformation requires more processing time than the other methods for the same data size, indicating that it has higher computational complexity.
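Runtime comparisons of this kind can be made with a small wall-clock harness that averages several repeats, mirroring the five-run averaging described above. The projection timed here is only a stand-in workload, not any of the paper's methods:

```python
import time
import numpy as np

def mean_runtime(fn, arg, repeats=5):
    """Average wall-clock runtime of fn(arg) over several repeats."""
    elapsed = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / repeats

def pca_like(cube):
    """Stand-in workload: covariance + eigendecomposition of the bands."""
    flat = cube.reshape(-1, cube.shape[-1]).astype(float)
    flat -= flat.mean(axis=0)
    np.linalg.eigh(np.cov(flat, rowvar=False))

cube = np.random.default_rng(0).random((100, 100, 250))  # 100 x 100 x 250
seconds = mean_runtime(pca_like, cube, repeats=2)
```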
3.4. Experiments on GNKMNF Transformation
Two experimental results are described in this section. Section 3.4.1 reports the experiment on sample size selection in NKMNF. Section 3.4.2 reports how the execution efficiency of the KMNF, NKMNF, and GNKMNF transformations was evaluated by increasing the data size; in addition, the execution efficiency of GNKMNF is analyzed in detail with a data size of 64 × 64 × 250.
3.4.1. Experiments on Sample Size Selection
The overall accuracies and confidence intervals of the SVM classifier after NKMNF with different sample-size proportions on the Indian Pines, Xiong’an, and Salinas datasets are shown in Figure 7, Figure 8, and Figure 9, respectively. The corresponding Kappa values and confidence intervals are shown in Figure 10, Figure 11, and Figure 12, respectively. In these figures, 100% represents using all pixels, i.e., the KMNF transformation. To visualize the classification results, the SVM classification maps of the Indian Pines, Xiong’an, and Salinas datasets after NKMNF with different sample-size proportions are depicted in Figure 13, Figure 14, and Figure 15, respectively.
In this experiment, the overall accuracies, Kappa values, and their confidence intervals of the SVM classifier after NKMNF with different sample-size proportions were evaluated on the Indian Pines, Xiong’an, and Salinas datasets. The results show that NKMNF transformation outperforms KMNF transformation in most cases. Considering both the classification performance and the computational complexity of the algorithm, the NKMNF transformation performs best when the sub-kernel matrix comprises 20% of the entire kernel matrix.
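The proportional gradient selection strategy sweeps the sub-kernel sample size from 10% to 100% of the pixels in 10% steps, with 100% reducing to plain KMNF; as a sketch:

```python
def proportion_schedule(n_pixels, step_percent=10):
    """Sample sizes for the proportional gradient strategy:
    10%, 20%, ..., 100% of the n available pixels."""
    return [(p, n_pixels * p // 100)
            for p in range(step_percent, 101, step_percent)]
```

For a 145 × 145 image (21,025 pixels), the 20% entry (the proportion found best above) corresponds to a 4205 × 4205 sub-kernel matrix instead of the full 21,025 × 21,025 one.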
3.4.2. Experiments on GNKMNF Transformation
As seen in Section 3.4.1, the NKMNF transformation performs best when the sub-kernel matrix comprises 20% of the entire kernel matrix; this proportion was therefore used for NKMNF in this experiment. A comparison of the computational costs of the KMNF, NKMNF, and GNKMNF transformations for different data sizes is shown in Figure 16. A detailed analysis of the execution efficiency of GNKMNF with a data size of 64 × 64 × 250 is given in Table 7.
In this section, the computational costs of the KMNF, NKMNF, and GNKMNF transformations were assessed for different data volumes. The results demonstrate that, compared with KMNF and NKMNF, GNKMNF yields a significant improvement in execution efficiency. In addition, the execution efficiency of GNKMNF with a data size of 64 × 64 × 250 was analyzed in detail, providing a reference for further research in parallel computing.
4. Discussion
In this section, the four experimental results presented in Section 3 are discussed.
The first experiment was designed to evaluate the performance of each feature extraction method (including PCA, MNF, KPCA, FA, LDA, LPP, and KMNF) in classification. This experiment was conducted on three real HSIs with different spatial and spectral resolutions over different scenes. Taking overall accuracy as the evaluation criterion, the classification performance of each feature extraction method in terms of the SVM classifier with RBF kernel was assessed. The results show that: (1) compared with other feature extraction methods, KMNF has excellent classification performance in terms of overall accuracy. The improvements of KMNF transformation in overall accuracy on the Indian Pines, Salinas, and Xiong’an datasets are 2.00%, 1.52%, and 1.06%, respectively, and the improvements of KMNF transformation in Kappa are 2.17%, 1.65%, and 1.08% in the Indian Pines, Salinas, and Xiong’an datasets, respectively; (2) in most cases, the more features extracted, the higher the overall classification accuracy; (3) MNF and FA can be considered for dimension reduction in practice. The experimental results suggest KMNF outperforms PCA, KPCA, LDA, and LPP and is relatively equivalent to MNF and FA.
The second experiment tested the computational costs of each feature extraction method with a data size of 100 × 100 × 250. This experiment was designed to show the computational complexity of each method intuitively. The results suggest that: (1) with the same data size, the KMNF transformation requires more processing time, which indicates that KMNF has higher computational complexity compared with other algorithms; (2) compared with LPP, KPCA, and KMNF, PCA, MNF, FA, and LDA have lower computational complexity and are more applicable to the processing of large-scale datasets.
The Nyström method estimates the eigenvectors of the entire kernel matrix by the decomposition and extrapolation of a sub-kernel matrix, and the sample size of the sub-kernel matrix is an essential factor affecting the result of NKMNF. The third experiment was conducted to determine the optimal sample size. The results show that: (1) considering both the performance of NKMNF and the execution speed of the algorithm, the NKMNF transformation performs best when the sub-kernel matrix comprises 20% of the entire kernel matrix; (2) in most cases, NKMNF transformation outperforms KMNF in Kappa and overall classification accuracy; (3) across the proportional gradient of sample sizes, the variation of NKMNF in overall classification accuracy is 1.71%, 0.98%, and 0.99% on the Indian Pines, Salinas, and Xiong’an datasets, respectively, and the variation in Kappa is 5.53%, 1.50%, and 1.09% on the same datasets.
The last experiment evaluated the execution efficiency of KMNF, NKMNF, and GNKMNF by increasing the data volume in processing. The results suggest that: (1) the larger the data size, the more significant the acceleration effect; (2) with a data size of 64 × 64 × 250, NKMNF speeds up by about 8× compared with the KMNF transformation; (3) with the same data size, GNKMNF speeds up by about 10× and 80× compared with the NKMNF and KMNF transformations, respectively.
From the above analysis, it can be seen that compared with other feature extraction methods, KMNF has excellent classification performance in terms of overall accuracy. Although KMNF is a valuable feature extraction method for HSIs, it is found that the KMNF transformation presents the problem of high computational complexity and low execution efficiency. It is not applicable to the processing of large-scale datasets. The Nyström method is an efficient method to solve this problem, and the NKMNF is presented in this paper. Comprehensively considering the performance of NKMNF and the execution speed of the algorithm, the NKMNF transformation demonstrates the best performance when the proportion of sub-kernel matrix in the entire kernel matrix is 20%. Compared with the KMNF transformation, when the data size is 64 × 64 × 250, the execution efficiency of NKMNF and GNKMNF speed up by about 8× and 80×, respectively. The outcome demonstrates the significant performance of GNKMNF in feature extraction and execution efficiency.
5. Conclusions
The KMNF transformation suffers from high computational complexity and low execution efficiency, making it unsuitable for processing large-scale datasets; it is therefore valuable to develop a real-time implementation of it. Considering both feature extraction performance and execution speed, this paper presents GNKMNF, a fast KMNF transformation based on the Nyström method and GPU parallel computing. The experimental results demonstrate the significant performance of GNKMNF in execution efficiency and classification. The results can be summarized as follows:
(1) The performance of different feature extraction methods (PCA, MNF, KPCA, FA, LDA, LPP, and KMNF) is evaluated on the Indian Pines, Salinas, and Xiong’an datasets. The experimental results show the generalization and effectiveness of KMNF transformation in feature extraction, providing a reference for further research in feature extraction.
(2) To address the high computational complexity and low execution efficiency of KMNF transformation, the Nyström method is employed. The experimental results demonstrate that the NKMNF transformation has lower computational complexity and achieves satisfactory classification results. The proposed framework can also serve as a general model for the real-time implementation of other algorithms.
(3) Considering both the performance of NKMNF and the execution speed of the algorithm, the NKMNF transformation performs best when the sub-kernel matrix comprises 20% of the entire kernel matrix.
(4) GPU parallel computing is employed to further improve the execution efficiency of NKMNF. Experimental results show that with a data size of 64 × 64 × 250, GNKMNF speeds up by about 80× compared with KMNF.
In summary, the results show that the GNKMNF demonstrates a significant performance in classification and execution efficiency. The realization of this method can provide a reference for fast algorithm design of other feature extraction methods. In recent years, deep learning architectures have become an exciting research topic in feature extraction. In the future, we are interested in exploring a lightweight deep learning framework to extract the nonlinear feature structures in HSIs.