1. Introduction
Hyperspectral remote sensing has recently attracted considerable interest for a variety of Earth observation applications [1,2,3,4]. Mapping the physical, biological, or geographical dimensions of ecosystems is necessary to monitor the temporal and spatial patterns of Earth surface processes and to understand how they function. Because each pixel contains a wealth of spectral information, hyperspectral imaging (HSI) has been applied extensively in real-world applications, including precision agriculture [5], military object detection [6], and land use/land cover mapping [7,8]. Because it offers precise and detailed information about the physical and chemical properties of the imaged objects, HSI has become an essential tool in the field. Notably, these detailed features encode relationships that are too intricate for conventional methods to exploit effectively, i.e., a nonlinear correlation between the acquired spectra and the corresponding objects, such as buildings [3].
Unlike the standard panchromatic and multispectral imagery captured by satellites, HSI supplies hundreds of contiguous narrow spectral bands, offering a more detailed and accurate means of discerning Earth objects [9]. HSI is especially useful for fine-grained classification because of its capacity to identify subtle spectral characteristics that standard imagery cannot detect [10]. Most techniques in the early stages of HSI classification research concentrated on handcrafted feature extraction, such as extended morphological profiles (EMPs) [11] and extended extinction profiles (EEPs). However, these conventional classification techniques, which rely on “shallow” models, are limited in their ability to extract high-level image features; as a result, they typically fall short of higher accuracy. Recently, deep learning (DL) has been established as a powerful feature extractor that effectively handles the nonlinear problems arising in a variety of computer vision tasks, which has encouraged the use of DL for HSI data classification [12,13,14].
Convolutional neural networks (CNNs), owing to their strong local contextual modeling capabilities, are widely used in spectral–spatial HSI classification. While CNN-based methods are well suited to spatial–contextual identification, they struggle with spectral sequential data because long-range dependencies are often difficult for CNNs to capture correctly [15]. Although existing CNN-based techniques have shown promising results [16], they still face several difficulties: the receptive field is constrained, information is lost during the downsampling phase, and deep networks require substantial processing power [17]. In contrast, vision transformers (ViTs) have recently demonstrated significant promise in computer vision [18,19,20,21,22]. Through the combination of a multi-layer perceptron (MLP) and a multi-headed self-attention (MHSA) module, ViTs can capture global long-range interactions in the input sequence. Because of this capability, the application of transformers to HSI classification is expanding rapidly [23,24,25,26].
However, due to their quadratic computational complexity, transformers need substantially more training data than CNNs and have a relatively high computational cost [27]. Meanwhile, modern MLP algorithms, such as MLP-Mixer [27] and ResMLP [28], have surpassed ViTs and CNNs in image classification tasks, demonstrating excellent classification capability. These modern MLP models require significantly less training data than CNNs and ViTs while achieving state-of-the-art classification accuracy [29]. In addition, SpectralMamba [30] has been proposed for hyperspectral image classification to further reduce computational complexity while effectively improving classification performance; this work is notable as the first to introduce the Mamba framework into the hyperspectral remote sensing field.
Recently, Kolmogorov–Arnold Networks (KANs), which are inspired by the Kolmogorov–Arnold representation theorem, were proposed as viable alternatives to MLPs [31]. In contrast to MLPs, which have fixed activation functions on nodes (“neurons”), KANs employ learnable activation functions on edges (“weights”). KANs do not use any linear weights at all; instead, each weight parameter is replaced by a univariate function parameterized as a spline. Thus, in this research, we assess and evaluate the capability and effectiveness of KAN models for complex HSI data classification against several other CNN- and vision-transformer-based models. The contributions of this paper can be summarized as follows:
We introduce a hybrid KAN-based architecture that achieves competitive or better HSI classification accuracy compared with several well-known CNN- and ViT-based algorithms.
We incorporate 1D, 2D, and 3D KAN modules to enhance the ability of linear KANs in image classification tasks. This hybrid design increases the discriminative capability of the KAN architecture.
We conduct extensive experiments on a new, complex HSI dataset called Qingdao UAV-borne HSI (QUH), including QUH-Tangdaowan, QUH-Qingyun, and QUH-Pingan [32]. These experiments demonstrate the effectiveness of the proposed KAN architecture.
The remainder of the paper is structured as follows. Section 2 examines the structure and the various modules developed in the proposed KAN-model-based architecture. Section 4 presents comprehensive experiments, including a thorough discussion of the obtained HSI data classification results. Section 5 concludes the paper with a summary.
2. Proposed Methodology
Multilayer perceptrons (MLPs) are the foundation of many modern deep learning models. KANs were recently presented as an alternative to MLPs [31]. KANs are motivated by the Kolmogorov–Arnold representation theorem [33], whereas MLPs are inspired by the universal approximation theorem. Like MLPs, KANs have fully connected structures; however, MLPs employ fixed activation functions on nodes (“neurons”), while KANs place learnable activation functions on edges (“weights”). Instead of linear weight matrices, KANs use a learnable 1D function parameterized as a spline for each weight parameter. Nodes in KANs simply sum incoming signals without applying any non-linearities. This straightforward modification, placing the activation functions on the edges, allows KANs to surpass MLPs in accuracy as well as interpretability on small-scale machine learning problems. In function-fitting tasks, smaller KANs can attain accuracy comparable to or higher than that of larger MLPs, and KANs exhibit faster neural scaling laws than MLPs, both in theory and in practice [31]. Splines can be adjusted locally, are precise for low-dimensional functions, and can transition between different resolutions; however, because of their limited ability to exploit compositional structure, splines suffer greatly from the curse of dimensionality (COD). In contrast, MLPs are less prone to the COD thanks to their feature learning capabilities, but in low dimensions their accuracy is inferior to splines because they cannot optimize univariate functions as effectively. Despite their sophisticated mathematical interpretation, KANs are essentially combinations of splines and MLPs, exploiting their respective advantages while avoiding their respective disadvantages. To learn a function correctly, a model must be able to approximate the univariate functions (internal degrees of freedom) as well as learn the compositional structure (external degrees of freedom). Because KANs resemble splines internally and MLPs externally, they can both learn new features and optimize the learned features with remarkable accuracy.
Are KANs similar to MLPs? An MLP can be expressed as a stack of $N$ layers, where each layer is a linear transformation by a weight matrix $W$ followed by a non-linear operation $\sigma$ applied to the input $x$:

$$\mathrm{MLP}(x) = \left(W_N \circ \sigma \circ W_{N-1} \circ \sigma \circ \cdots \circ \sigma \circ W_1\right)(x).$$

On the other hand, a general KAN model consists of $N$ nested layers, and the output map can be defined as follows:

$$\mathrm{KAN}(x) = \left(\Phi_N \circ \Phi_{N-1} \circ \cdots \circ \Phi_1\right)(x),$$

where $\Phi_i$ represents the $i$-th layer of the entire KAN model. Let $n_{\mathrm{in}}$ and $n_{\mathrm{out}}$ be the input and output dimensions of each KAN layer; then $\Phi$ consists of $n_{\mathrm{in}} \times n_{\mathrm{out}}$ 1D learnable activation functions $\phi_{q,p}$:

$$\Phi = \{\phi_{q,p}\}, \qquad p = 1, \ldots, n_{\mathrm{in}}, \quad q = 1, \ldots, n_{\mathrm{out}}.$$
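To make the structural difference concrete, the following toy NumPy sketch (our own illustration, not the authors' implementation) contrasts one MLP layer, which applies a fixed non-linearity after a linear map, with one KAN layer, which applies a learnable 1D function per edge and then sums; a simple linear-plus-tanh edge function stands in for the spline parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                      # input with n_in = 4

# MLP layer: linear map W, then a fixed non-linearity on the nodes
W = rng.normal(size=(3, 4))                 # n_out = 3
mlp_out = np.tanh(W @ x)                    # sigma(W x)

# KAN layer: one learnable 1D function per edge. Toy parameterization:
# phi_qp(t) = a_qp * tanh(t) + b_qp * t stands in for the spline.
a = rng.normal(size=(3, 4))
b = rng.normal(size=(3, 4))
kan_out = (a * np.tanh(x) + b * x).sum(axis=1)   # sum_p phi_qp(x_p)

print(mlp_out.shape, kan_out.shape)  # (3,) (3,)
```

Note that the KAN layer has one learnable function per input–output pair ($n_{\mathrm{in}} \times n_{\mathrm{out}}$ of them), while the MLP layer has a single fixed non-linearity shared by all nodes.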
The output of a KAN when computing from layer $n$ to layer $n+1$ may be written in matrix form as follows:

$$x_{n+1} = \begin{pmatrix} \phi_{1,1}(\cdot) & \phi_{1,2}(\cdot) & \cdots & \phi_{1,n_{\mathrm{in}}}(\cdot) \\ \vdots & \vdots & & \vdots \\ \phi_{n_{\mathrm{out}},1}(\cdot) & \phi_{n_{\mathrm{out}},2}(\cdot) & \cdots & \phi_{n_{\mathrm{out}},n_{\mathrm{in}}}(\cdot) \end{pmatrix} x_n.$$

It is evident that KANs treat non-linearities and linear transformations jointly in $\Phi$, whereas MLPs treat them separately as $W$ and $\sigma$. To ensure the representation power of the activation functions, as shown in Figure 1, the KAN models include a basis function $b(x)$ (similar to a residual connection), such that the activation function $\phi(x)$ is the sum of the spline function and the basis function $b(x)$, defined by:

$$\phi(x) = w_b\, b(x) + w_s\, \mathrm{spline}(x),$$

where $b(x) = \mathrm{silu}(x) = x/(1+e^{-x})$, $\mathrm{spline}(x) = \sum_i c_i B_i(x)$ is a linear combination of B-splines $B_i$, and $w_b$, $w_s$, and the coefficients $c_i$ are trainable. For more details, refer to Liu et al. [31].
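The edge parameterization above can be sketched in plain NumPy. The following illustrative code (our own sketch; the knot grid and sizes are arbitrary choices) evaluates a single KAN edge activation $\phi(x) = w_b\,\mathrm{silu}(x) + w_s \sum_i c_i B_i(x)$, building the B-spline basis with the standard Cox–de Boor recursion:

```python
import numpy as np

def bspline_basis(x, grid, k=3):
    """Evaluate degree-k B-spline basis functions B_i(x) on a knot grid
    via the Cox-de Boor recursion. Returns shape (len(x), n_bases)."""
    x = np.asarray(x, dtype=float)[:, None]          # (N, 1)
    t = np.asarray(grid, dtype=float)[None, :]       # (1, G)
    # degree-0 bases: indicator of each knot interval
    B = ((x >= t[:, :-1]) & (x < t[:, 1:])).astype(float)
    for d in range(1, k + 1):
        left = (x - t[:, :-(d + 1)]) / (t[:, d:-1] - t[:, :-(d + 1)]) * B[:, :-1]
        right = (t[:, d + 1:] - x) / (t[:, d + 1:] - t[:, 1:-d]) * B[:, 1:]
        B = left + right
    return B

def silu(x):
    return x / (1.0 + np.exp(-x))

def kan_edge(x, w_b, w_s, coeffs, grid, k=3):
    """phi(x) = w_b * silu(x) + w_s * sum_i c_i B_i(x) for one edge."""
    return w_b * silu(x) + w_s * (bspline_basis(x, grid, k) @ coeffs)

# toy usage: one learnable edge on a uniform knot grid over [-1, 1]
grid = np.linspace(-1, 1, 8)                     # knot vector (8 knots)
rng = np.random.default_rng(0)
coeffs = rng.normal(size=len(grid) - 1 - 3)      # one c_i per cubic basis
x = np.array([-0.5, 0.0, 0.5])
y = kan_edge(x, w_b=1.0, w_s=0.5, coeffs=coeffs, grid=grid)
print(y.shape)  # (3,)
```

In a full KAN layer, every edge carries its own $(w_b, w_s, c_i)$ set, and training adjusts the spline coefficients locally, which is the source of the local adjustability mentioned above.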
Classical vs. KAN Convolution: KAN convolutions are similar to the traditional convolution operation, except that each kernel element is a learnable non-linear activation function applied to the associated pixel in the image patch, with the results summed, rather than a dot product between kernel and patch. The kernel of a KAN convolution is therefore equivalent to a KAN linear layer with 9 inputs and 1 output neuron (shown in Figure 2). The output pixel of that convolution step is the sum of $\phi_i(x_i)$ over each input $i$, where $\phi_i$ is the learnable function applied to it. To visualize the difference between classical and KAN convolution, consider an input image patch $X \in \mathbb{R}^{3 \times 3}$ and an output $Y$; the classical kernel $K$ and the KAN kernel $\Phi$ are defined in Equation (7), respectively:

$$K = \begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{pmatrix}, \qquad \Phi = \begin{pmatrix} \phi_{11} & \phi_{12} & \phi_{13} \\ \phi_{21} & \phi_{22} & \phi_{23} \\ \phi_{31} & \phi_{32} & \phi_{33} \end{pmatrix}. \tag{7}$$

The output of the classical convolutional operation ($*$) can be obtained as follows:

$$Y = X * K, \qquad y = \sum_{i=1}^{3}\sum_{j=1}^{3} w_{ij}\, x_{ij}. \tag{8}$$

In the case of KAN convolution, the inner function $\Phi$ may be represented as a matrix containing several activation functions, as shown in Equation (7), and each element of the input matrix $X$ is passed through its corresponding activation function. It should be noted that here $\phi_{ij}$ denotes an activation function rather than a weight; these activation functions are parameterized as B-splines, i.e., sums of basic polynomial curves whose values depend on the input $X$. The output of the KAN convolutional operation ($\circ$) can be obtained as follows:

$$Y = X \circ \Phi, \qquad y = \sum_{i=1}^{3}\sum_{j=1}^{3} \phi_{ij}(x_{ij}). \tag{9}$$

Similarly, Equation (9) can easily be extended to an input image $X \in \mathbb{R}^{H \times W \times C}$ with $C$ channels by applying a set of KAN kernels $\{\Phi_c\}_{c=1}^{C}$, which produces the output $Y$ as follows:

$$Y = \sum_{c=1}^{C} X_c \circ \Phi_c. \tag{10}$$
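The sliding-window form of Equation (9) can be sketched as follows. This is our own simplified NumPy illustration, not the authors' implementation: each kernel element applies its own learnable non-linearity $\phi_{ij}(x) = w_b^{ij}\,\mathrm{silu}(x) + w_s^{ij}\,x$ to the corresponding patch pixel (a linear term stands in for the full B-spline part to keep the sketch short), and the results are summed per window:

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def kan_conv2d(image, w_b, w_s):
    """Simplified 2D KAN convolution: per-element non-linearity
    phi_ij(x) = w_b[i,j]*silu(x) + w_s[i,j]*x, summed over each
    kernel-sized window (valid padding, stride 1)."""
    kH, kW = w_b.shape
    H, W = image.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + kH, c:c + kW]
            out[r, c] = np.sum(w_b * silu(patch) + w_s * patch)
    return out

rng = np.random.default_rng(1)
img = rng.normal(size=(8, 8))
w_b = rng.normal(size=(3, 3))   # per-element basis-function weights
w_s = rng.normal(size=(3, 3))   # per-element "spline" weights (linear stand-in)
y = kan_conv2d(img, w_b, w_s)
print(y.shape)  # (6, 6)
```

Setting `w_b` to zero recovers a classical linear convolution, which makes the contrast between Equations (8) and (9) explicit: the only structural change is the element-wise non-linearity inside the window sum.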
HybridSN as an Embedding via KAN Layers: We experimentally selected a KAN architecture similar to the hybrid spectral network (HybridSN) [34], as seen in Figure 3. The hybrid spectral network was proposed in 2020 and is considered a successful architecture for hyperspectral feature extraction and classification. Consider an input hyperspectral image $X \in \mathbb{R}^{H \times W \times B}$, where $H$, $W$, and $B$ indicate the height, width, and number of spectral bands, respectively. We first utilize a principal component analysis (PCA) algorithm to reduce the number of input channels/bands in all HSI datasets to $D$, expressed as follows:

$$X_{\mathrm{pca}} = \mathrm{PCA}(X) \in \mathbb{R}^{H \times W \times D}.$$
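The spectral reduction step can be sketched as follows; this is a generic NumPy illustration of per-pixel PCA over the band axis (the cube dimensions and component count are arbitrary toy values, not the paper's settings):

```python
import numpy as np

def pca_reduce(cube, n_components):
    """Reduce the spectral dimension of an HSI cube (H, W, B) -> (H, W, D)
    by projecting each pixel's spectrum onto the top principal components."""
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(float)        # pixels as rows
    X -= X.mean(axis=0)                          # center each band
    # eigen-decomposition of the band-covariance matrix
    cov = X.T @ X / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # keep the eigenvectors with the largest eigenvalues
    top = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]
    return (X @ top).reshape(H, W, n_components)

cube = np.random.default_rng(2).normal(size=(10, 10, 30))  # toy HSI cube
reduced = pca_reduce(cube, n_components=15)
print(reduced.shape)  # (10, 10, 15)
```

The projection keeps the $D$ directions of maximum spectral variance, which is why the subsequent 3D KAN layers can operate on a much thinner cube without discarding most of the spectral information.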
To enhance the HSI classification accuracy obtained by the KAN models, we developed and proposed a hybrid KAN-network-based architecture consisting of three consecutive 3D KAN layers with 8, 16, and 32 output channels (feature maps), expressed as follows:

$$F_1 = \mathrm{KAN3D}_{8}(X_{\mathrm{pca}}), \qquad F_2 = \mathrm{KAN3D}_{16}(F_1), \qquad F_3 = \mathrm{KAN3D}_{32}(F_2).$$

Then, one 2D KAN layer with 64 output channels (output maps) is employed immediately after the third 3D KAN layer. The resulting feature maps are then flattened and sent to a 1D KAN layer with a hidden layer of 32 units and an output dimension equal to the number of classes in the HSI data, expressed as:

$$F_4 = \mathrm{KAN2D}_{64}(F_3), \qquad \hat{y} = \mathrm{KAN1D}\left(\mathrm{flatten}(F_4)\right).$$

The layer-wise architecture of the proposed KAN-based model is presented in Table 1.
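The shape bookkeeping through this stack can be traced with a few lines of Python. The patch size ($25 \times 25$), $D = 15$ PCA bands, and $3 \times 3\,(\times 3)$ kernels below are assumed toy values for illustration only, not the settings reported in Table 1:

```python
def conv_out(size, kernel=3, stride=1, pad=0):
    """Spatial/spectral extent after one (KAN) convolution step."""
    return (size + 2 * pad - kernel) // stride + 1

# assumed toy settings: 25x25 patches, D=15 PCA bands, valid padding
h = w = 25
d = 15
channels = 1
for out_ch in (8, 16, 32):            # three 3D KAN layers
    h, w, d = conv_out(h), conv_out(w), conv_out(d)
    channels = out_ch
# 2D KAN layer: the channel and spectral axes are merged beforehand
feat_2d_in = channels * d             # 32 feature maps x remaining bands
h, w = conv_out(h), conv_out(w)
flat = 64 * h * w                     # 64 output maps, flattened for the 1D KAN
print(h, w, d, flat)                  # 17 17 9 18496
```

With these assumptions the 1D KAN head would receive an 18,496-dimensional vector, pass it through its 32-unit hidden layer, and emit one score per land-cover class.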
5. Conclusions
This research proposed and discussed a KAN-model-based architecture for complex land use/land cover mapping using HSI data, which employs 1D, 2D, and 3D KAN models. The classification results on three highly complex HSI benchmark datasets, QUH-Pingan, QUH-Tangdaowan, and QUH-Qingyun, demonstrate that the developed classification model, HybridKAN, was statistically and visually competitive with or better than several state-of-the-art CNN- and ViT-based algorithms, including 1D-CNN, 2D-CNN, 3D-CNN, VGG-16, ResNet-50, EfficientNet, RNN, and ViT. These results underscore the significant potential of KAN models in complex remote sensing tasks.