1. Introduction
From the release of the 5G white paper in 2014 to the official launch of the world’s first 5G heterogeneous network roaming trial in 2023, the communications industry has undergone a significant and rapid evolution [1]. 5G networks offer significant advantages, such as high bandwidth, low latency, and wide coverage, which have made people’s lives more convenient. However, these advancements have also resulted in an exponential growth of generated data, driving the rapid development of technologies such as autonomous driving [2] and connected cars [3]. With the emergence of new industries and services such as telemedicine, current 5G technologies are no longer sufficient to meet their communication performance requirements. This realization has led academia and industry to propose a vision for 6G networks. In this context, unmanned aerial vehicles (UAVs) have emerged as an indispensable technology to complement 6G mobile networks due to their numerous advantages [4].
An unmanned aerial vehicle (UAV) system is operated through essential components, including the aircraft itself, a ground controller, and a communication platform connecting the two, and is equipped with specialized sensors and communication equipment featuring high mobility, flexibility, low cost, and large service coverage [5]. With the continuous development and innovation of science and technology, drones have found a wide range of applications, such as smart cities, post-disaster damage assessment, rescue missions, military operations, and communications assistance. UAVs provide a solution for timely communication in emergency scenarios, such as network reconstruction after major natural disasters and temporary communication in remote areas [6]. During sudden forest fires, mudslides, earthquakes, and other disasters, drones play a crucial role in capturing images or videos that can be transmitted to ground stations promptly; these real-time visuals provide essential information for emergency decision-making. However, the transmission capacity of UAVs is constrained by limited bandwidth, so the real-time images or videos may lack high-definition quality. As a result, researchers have focused on transmitting higher-quality compressed data within these bandwidth limitations.
UAVs can be remotely operated to access areas that are challenging for humans to reach, such as rivers, glaciers, and large forests, allowing for the collection of diverse and rich data samples. Based on this capability, various compression methods have been proposed in academia. Deep learning has been widely used in data analysis, image processing, image detection, and other fields. In the field of automatic damage detection, to address the limited feature-extraction ability of current damage detection models in complex environments, a high-performance real-time deep-learning damage detection model called DenseSPH-YOLOv5 was proposed in [7]; experiments showed a detection rate of 4.5 FPS, an average accuracy of 85.25%, and an F1 score of 81.18%, outperforming the then state-of-the-art models. For graph data representation and semi-supervised classification tasks, ref. [8] proposed a multi-graph learning neural network (MGLNN) framework to address the fact that existing GLNNs cannot handle multi-graph data representation; it was experimentally verified to outperform related methods on semi-supervised classification tasks. The latest image compression algorithms combined with neural networks achieve better compression than traditional methods and can reach higher compression ratios. Ref. [9] proposed a squirrel search algorithm (SSA) combined with the Linde–Buzo–Gray (LBG) image compression technique, called SSA-LBG [10,11], for UAVs; the LBG model initializes the vector quantization (VQ) codebook, and the overall algorithm achieves a high peak signal-to-noise ratio. To reduce block artifacts during compression, ref. [12] proposed a two-step framework based on inter-block correlation increments that divides the coded image into planar and edge regions; it was experimentally verified to suppress blocking artifacts and to outperform existing methods in visual quality. Ref. [13] proposed a hierarchical image compression framework based on deep semantic segmentation and experimentally verified that it outperforms better portable graphics (BPG) and other encoders in both PSNR and MS-SSIM metrics in the RGB domain. Ref. [14] proposed a high-fidelity compression algorithm for UAV images under complex disaster conditions based on an improved generative adversarial network [15]; the experimental results showed that, while guaranteeing image quality, the method achieves a higher compression ratio for disaster-area images than traditional image compression algorithms. The authors in [16] proposed a neural network compression algorithm based on Kohonen self-organizing feature maps (SOFM), which combines SOFM with artificial neural networks, and experimentally verified that better compression ratios and PSNR values can be obtained. In [17], the authors proposed a unified end-to-end learning framework that utilizes deep neural network (DNN) models and feature-compressed contexts with fewer model assumptions, significantly simplifying the training process. Ref. [18] trained a deep convolutional neural network (CNN) capable of performing lossy image compression (LIC) at multiple bpp rates and proposed a Tucker decomposition network (TDNet) that adjusts the bpp of the latent image representation within a single CNN; extensive experiments verified its good performance under PSNR and MS-SSIM metrics. In [19], autoregressive and hierarchical priors are combined in a compression model; the two components produce complementary results and surpass BPG in rate-distortion performance. Ref. [20] proposed a JPEG2000-compatible CNN image compression method that uses a JPEG2000 encoder to compress the bitstream and two CNNs to recompress the bitstream and post-process it on the decoder side, respectively; validation on the Kodak dataset showed that the two CNN modules significantly improve compression efficiency. Ref. [21] proposed a CNN-based quadratic transform aimed at improving the coding efficiency of JPEG2000, and the proposed algorithm was experimentally shown to improve upon conventional JPEG2000 at high code rates. In [22], the authors proposed an adaptive multi-resolution (AMID) image compression algorithm; AMID can be effectively used as an alternative to the wavelet transform and achieves high-quality image compression and a high compression ratio.
Since its introduction in 1960, the Kalman filter has found extensive application in various fields, including autonomous driving, robotics, and more. The fundamental Kalman filter utilizes linear equations to model system states. However, in practical scenarios characterized by nonlinearity, the extended Kalman filter (EKF) is better suited to address such challenges. The EKF extends the capabilities of the Kalman filter to nonlinear equations, making it a valuable tool with a wide range of applications across diverse fields. A state-dependent extended Kalman filter was established in [23] and applied to the optimal attitude estimation alignment model of a strapdown inertial navigation system during carrier motion, reducing the influence of state-dependent noise on the estimation results and effectively improving estimation accuracy. An onboard soft short-circuit fault diagnosis method for electric vehicles based on the extended Kalman filter was proposed in [24], and the method’s effectiveness in rapid fault detection and its robustness in accurate estimation were experimentally demonstrated. Two extended Kalman filters are connected in parallel in [25] for direct torque sensorless control of permanent magnet synchronous motors (PMSM) with better estimation accuracy. Ref. [26] combined deep neural networks with extended Kalman filtering to solve the state estimation problem of a bimanual tendon-driven aerial continuum manipulation system (ACMS), and the performance of the proposed method was demonstrated by simulation results. Ref. [27] improved tracking performance in spatially informed millimeter-wave beam tracking by using an extended Kalman filtering algorithm for the beams at both ends of the link.
Building upon the broad utilization of deep learning in image compression and the extended Kalman filter’s ability to estimate the states of nonlinear models, this paper proposes a spliced-image compression algorithm based on a neural network with an extended Kalman filter (SEKF-UC). Images are spliced before encoding according to the structural similarity (SSIM) criterion, the spliced images are compressed together, and the extended Kalman filter greatly reduces the time the neural network needs to adjust its parameters, ensuring the timeliness of the images returned by the UAV. Experimental verification has been conducted to validate the effectiveness of the proposed algorithm: the reconstructed spliced image is split and compared with the original images to assess compression performance. The experimental results demonstrate that, within a predefined error range, the algorithm achieves a higher compression ratio than single-image compression. The main contributions of this paper are as follows:
We propose an algorithm (SEKF-UC) that combines image splicing with a neural network compression algorithm based on the extended Kalman filter. SEKF-UC aims to comprehensively address the quality and efficiency aspects of image compression in UAVs, with the ultimate objective of improving speed and ensuring high-quality results.
The images returned by UAVs tend to contain repetitive information or similar pixel-value distributions. To ensure the timeliness of subsequent processing and screening, we considered how to process the returned image dataset while guaranteeing compression quality: the dataset is classified before being input into the compression algorithm, and a splicing method for identical or similar images is proposed based on this analysis. The image compression ratio is thereby improved with guaranteed quality.
When the input dimension is large, training a deep neural network is slow, and image splicing multiplies the amount of input data. To maintain training speed without compromising image quality, we introduce the extended Kalman filter when training the network, which significantly decreases the number of training iterations.
The rest of this paper is organized as follows.
Section 2 describes the proposed algorithm SEKF-UC in detail.
Section 3 verifies the effectiveness and feasibility of the algorithm based on a real UAV-captured dataset and compares its performance with other compression algorithms.
Section 4 concludes this work and outlines future work.
2. The Proposed Algorithm SEKF-UC
Figure 1 shows the block diagram of the framework of the proposed image compression algorithm SEKF-UC. The framework consists of four main parts: splicing, compression, decompression, and de-splicing of images. The images returned from the UAVs are first spliced according to their similarity and then fed into the neural network for uniform compression and decompression. This not only improves the compression speed and compression ratio but also helps with subsequently filtering the large amount of duplicate information that appears in the returned images. Note that the colors of the blocks in the figures are only used to distinguish different images or layers in the neural network; thick arrows indicate data flow, and thin arrows indicate explanatory notes.
As shown in Figure 1, when a large number of UAV images need to be compressed and transmitted, the images to be compressed are first spliced according to SSIM so that similar images are compressed together by the DNN, which yields a better compression ratio while maintaining decompression quality [28]. Splicing multiplies the amount of data input to the DNN; to preserve compression efficiency, each layer of the DNN introduces an extended Kalman filter. These filters replace the usual back propagation (BP) algorithm to accelerate DNN training. Finally, the compressed data are decompressed and de-spliced by the inverse algorithms to obtain the restored images.
2.1. Image Splicing and De-Splicing
The image to be compressed can be represented by a two-dimensional matrix X of dimension M × N. To further improve compression efficiency and save transmission bandwidth, before the image data are input into the encoder, images with a structure similar to the image to be compressed (e.g., large areas of similar color) or with much repetitive information (e.g., the same area shot continuously) are found and spliced based on their structural similarity (SSIM). The number of splices and the shape of the splice can be set according to the actual situation and target requirements.
Structural similarity (SSIM) is a measure of the similarity of two images. The structural similarity of two images, p1 and p2, can be calculated by the following equation:

$$\mathrm{SSIM}(p_1, p_2) = \frac{(2\mu_{p_1}\mu_{p_2} + c_1)(2\sigma_{p_1 p_2} + c_2)}{(\mu_{p_1}^2 + \mu_{p_2}^2 + c_1)(\sigma_{p_1}^2 + \sigma_{p_2}^2 + c_2)}$$

where $\mu_{p_1}$ is the average of p1, $\mu_{p_2}$ is the average of p2, $\sigma_{p_1}^2$ is the variance of p1, $\sigma_{p_2}^2$ is the variance of p2, and $\sigma_{p_1 p_2}$ is the covariance of p1 and p2. $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ are constants used to maintain stability, L is the dynamic range of the pixel values, and k1 and k2 are constants. SSIM takes values in the range [0, 1], and the closer the value is to 1, the more similar the two images are. An SSIM threshold is set to select similar images for splicing, and the spliced images are then compressed together.
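As a concrete illustration, the global form of the SSIM formula above can be sketched in NumPy. This is an illustrative implementation, not the authors' code: the statistics are computed over the whole image, whereas practical SSIM implementations average the statistic over local sliding windows.

```python
import numpy as np

def ssim_global(p1, p2, L=255, k1=0.01, k2=0.03):
    """Global SSIM between two equally sized grayscale images,
    computing means, variances, and covariance over the whole image."""
    p1 = p1.astype(np.float64)
    p2 = p2.astype(np.float64)
    mu1, mu2 = p1.mean(), p2.mean()
    var1, var2 = p1.var(), p2.var()
    cov = ((p1 - mu1) * (p2 - mu2)).mean()
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2  # stability constants
    return ((2 * mu1 * mu2 + c1) * (2 * cov + c2)) / \
           ((mu1 ** 2 + mu2 ** 2 + c1) * (var1 + var2 + c2))

# Identical images have SSIM = 1
img = np.arange(64, dtype=np.float64).reshape(8, 8)
print(ssim_global(img, img))  # → 1.0
```

In the splicing stage, such a score would be compared against the chosen threshold to decide which images are grouped together.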
Take four pictures, for example. They can be spliced in a 2 × 2 arrangement, giving a spliced size of 2M × 2N, or in a 1 × 4 arrangement, giving a spliced size of M × 4N. Figure 2a shows the two different splicing methods for four images with similar structural properties; the four images spliced in Figure 2b share a large amount of repeated information.
When the spliced image needs to be decomposed, segmentation is performed according to the original splicing method.
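The splicing and de-splicing operations can be sketched as a pair of inverse array manipulations in NumPy; the 2 × 2 example below uses toy 4 × 4 images (sizes and helper names are illustrative, not from the paper).

```python
import numpy as np

def splice(images, rows, cols):
    """Tile equally sized images into a rows x cols mosaic."""
    assert len(images) == rows * cols
    return np.block([[images[r * cols + c] for c in range(cols)]
                     for r in range(rows)])

def desplice(mosaic, rows, cols):
    """Invert splice(): cut the mosaic back into its original tiles."""
    m, n = mosaic.shape[0] // rows, mosaic.shape[1] // cols
    return [mosaic[r * m:(r + 1) * m, c * n:(c + 1) * n]
            for r in range(rows) for c in range(cols)]

imgs = [np.full((4, 4), i) for i in range(4)]  # four toy M x N images
big = splice(imgs, 2, 2)                       # 2M x 2N mosaic
parts = desplice(big, 2, 2)
print(big.shape)                                          # → (8, 8)
print(all((a == b).all() for a, b in zip(imgs, parts)))   # → True
```

De-splicing with the same (rows, cols) used for splicing recovers the original images exactly, which is why the splicing method must be recorded alongside the compressed data.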
2.2. Deep Neural Network for Image Compression
2.2.1. Encoder
Figure 3 shows a detailed block diagram of the coding layer of the proposed algorithm.
First, the pixel matrix of the stitched image P is extracted and input into the input layer of the neural network; the input and hidden layers are connected by the weights W1 = (w1, w2) and the biases B1 = (b1, b2). The output of the first encoding layer is C, of dimension j (j < M × N), which serves as the input to the second layer; the output of the second encoding layer is D, of dimension i (i < j). C and D are calculated by the following equations:

$$C = f(w_1 P + b_1), \qquad D = f(w_2 C + b_2)$$

where f(·) is the activation function of the hidden layer.
The output of the coding layer is normalized by the activation function in the hidden layer. Commonly used activation functions include sigmoid, Tanh, and ReLU; in this paper, the sigmoid function is used. Sigmoid is a common S-shaped function, often used as a threshold function in neural networks; it normalizes a variable and maps it into [0, 1]. The calculation formula is as follows:

$$f(x) = \frac{1}{1 + e^{-x}}$$
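A minimal NumPy sketch of the encoder forward pass follows; the layer sizes (MN = 16, j = 8, i = 4) and the random initialization are toy assumptions for illustration only.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
MN = 16      # flattened length of the spliced image (toy size)
j, i = 8, 4  # encoding layer sizes, with i < j < MN

# Encoder weights W1 = (w1, w2) and biases B1 = (b1, b2), randomly initialized
w1, b1 = rng.normal(size=(j, MN)), np.zeros(j)
w2, b2 = rng.normal(size=(i, j)), np.zeros(i)

P = rng.random(MN)           # pixel vector of the spliced image
C = sigmoid(w1 @ P + b1)     # first encoding layer output, dimension j
D = sigmoid(w2 @ C + b2)     # compressed code, dimension i
print(D.shape)  # → (4,)
```

Because i < j < MN, the code D is a lower-dimensional representation of the input, which is what realizes the compression.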
2.2.2. Decoder
Figure 4 shows a detailed block diagram of the decoder part of the proposed algorithm.
The output D of the encoding layer is input to the decoding layer for decompression; the layers are connected by the weights W2 = (w21, w22) and the biases B2 = (b21, b22). The output of the first decoding layer is E, and the output of the second decoding layer is Y; E and Y can be calculated from the following equations:

$$E = f(w_{21} D + b_{21}), \qquad Y = f(w_{22} E + b_{22})$$
Y is the pixel matrix of the final decompressed output. To measure its error with respect to the original input, the mean square error (MSE) loss function is introduced. The reconstruction error L is obtained by inputting the decompressed matrix Y and the original input matrix P into the loss function:

$$L = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} \left( Y_{mn} - P_{mn} \right)^2$$
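Putting the encoder and decoder together, the full forward pass and the MSE reconstruction error can be sketched as below; all sizes and the random initialization are toy assumptions, and the vectors play the role of the flattened pixel matrices.

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
rng = np.random.default_rng(1)

MN, j, i = 16, 8, 4                 # toy sizes, i < j < MN
P = rng.random(MN)                  # original (spliced) pixel vector

# Encoder W1 = (w1, w2), B1 = (b1, b2); decoder W2 = (w21, w22), B2 = (b21, b22)
w1, w2 = rng.normal(size=(j, MN)), rng.normal(size=(i, j))
w21, w22 = rng.normal(size=(j, i)), rng.normal(size=(MN, j))
b1, b2, b21, b22 = np.zeros(j), np.zeros(i), np.zeros(j), np.zeros(MN)

D = sigmoid(w2 @ sigmoid(w1 @ P + b1) + b2)   # code produced by the encoder
E = sigmoid(w21 @ D + b21)                    # first decoding layer output
Y = sigmoid(w22 @ E + b22)                    # reconstructed pixel vector

L = np.mean((Y - P) ** 2)                     # MSE reconstruction error
print(Y.shape)  # → (16,)
```

Training then consists of adjusting the weights and biases to drive L toward zero, which is the role of the extended Kalman filter in the next subsection.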
2.3. Extended Kalman Filter Training Network
To make the error between the output decompressed image and the original image as small as possible, the neural network parameters generated by initialization need to be retrained. At the same time, to improve the speed of parameter optimization, increase the compression speed, and ensure the timeliness of the compressed returned images, the extended Kalman filter is introduced to replace the gradient descent method of back propagation for tuning the network parameters.
A deep neural network can be described as a composition of continuous nonlinear functions, so the extended Kalman filter can estimate the parameters of the neural network as a nonlinear model, allowing the error to converge to the optimum faster.
The weights and biases in the neural network can be expressed as the state vector of the extended Kalman filter:

$$x = \left[ w_1, b_1, w_2, b_2, w_{21}, b_{21}, w_{22}, b_{22} \right]^{T}$$
The neural network can be formulated as a nonlinear discrete-time system:

$$x_k = x_{k-1} + \omega_{k-1}, \qquad y_k = h(x_k, P) + \nu_k$$

where h(·) denotes the forward propagation of the network, ω is the process noise, and ν is the measurement noise.
2.3.1. Predicted Status
The predicted state and the covariance matrix of the prediction error of the extended Kalman filter are expressed by the following equations:

$$\hat{x}_{k|k-1} = \hat{x}_{k-1|k-1}, \qquad P_{k|k-1} = P_{k-1|k-1} + Q_k$$

where $Q_k$ is the covariance matrix of the process noise, L is the mean square error between the decompressed matrix produced by the initialized weights and biases and the original input image matrix, and $\eta$ is the learning rate, generally set to 0.01.
2.3.2. Update Status
The extended Kalman filter training begins by determining the Kalman gain, which is expressed as follows:

$$K_k = P_{k|k-1} H_k^{T} \left( H_k P_{k|k-1} H_k^{T} + R_k \right)^{-1}$$

where $R_k$ is the measurement noise covariance and $P_{k|k-1}$ is the error covariance matrix of the predicted state; here P denotes the pixel matrix of the spliced image input to the neural network compression algorithm, and $\hat{Y}$ denotes the decompressed pixel matrix. $H_k$ can be calculated by the following equation, which represents the partial derivative of the decompressed pixel matrix with respect to the neural network parameters:

$$H_k = \left. \frac{\partial \hat{Y}}{\partial x} \right|_{x = \hat{x}_{k|k-1}}$$

In summary, the update formulas for the predicted state and its error covariance matrix are as follows:

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \left( P - \hat{Y} \right), \qquad P_{k|k} = \left( I - K_k H_k \right) P_{k|k-1}$$
The network parameters are updated at each iteration, and the iteration can be stopped either by reaching a set number of iterations or by reaching a set error size.
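The predict–update cycle above can be sketched in NumPy on a toy one-layer network. This is an illustrative sketch under stated assumptions, not the paper's implementation: the network has only 8 parameters, the Jacobian H_k is approximated numerically rather than derived analytically, and the noise covariances Q and R are arbitrary small values.

```python
import numpy as np

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

def forward(x, inp):
    """Toy 'network': one sigmoid layer whose 6 weights and 2 biases
    are packed into the EKF state vector x (8 parameters, 2 outputs)."""
    w, b = x[:6].reshape(2, 3), x[6:]
    return sigmoid(w @ inp + b)

def ekf_step(x, Pcov, inp, z, Q, R, eps=1e-6):
    """One EKF iteration: predict (random-walk state model), then
    update the parameters against the measurement z (target pixels)."""
    x_pred, P_pred = x, Pcov + Q                 # prediction step
    y = forward(x_pred, inp)
    # Numerical Jacobian H = d forward / d x (finite differences)
    H = np.zeros((len(z), len(x)))
    for k in range(len(x)):
        dx = np.zeros_like(x); dx[k] = eps
        H[:, k] = (forward(x_pred + dx, inp) - y) / eps
    S = H @ P_pred @ H.T + R                     # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
    x_new = x_pred + K @ (z - y)                 # state update
    P_new = (np.eye(len(x)) - K @ H) @ P_pred    # covariance update
    return x_new, P_new

rng = np.random.default_rng(2)
inp = rng.random(3)                # toy input pixels
z = np.array([0.2, 0.9])           # toy target output
x = rng.normal(scale=0.1, size=8)  # packed initial weights and biases
Pcov = np.eye(8)
Q, R = 1e-4 * np.eye(8), 1e-2 * np.eye(2)

err0 = np.mean((forward(x, inp) - z) ** 2)
for _ in range(20):
    x, Pcov = ekf_step(x, Pcov, inp, z, Q, R)
err1 = np.mean((forward(x, inp) - z) ** 2)
print(err1 < err0)  # → True
```

Each iteration corrects the packed parameter vector in proportion to the Kalman gain, so the reconstruction error typically drops in far fewer iterations than gradient-descent back propagation would need.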
2.4. Evaluation Parameters
Performance indicators: the compression performance is measured by the structural similarity (SSIM) and peak signal-to-noise ratio (PSNR), and the reconstructed images are split and compared with the original images one by one.
The compression ratio is a further measurement metric, and the compression ratio CR is calculated as follows:

$$CR = \frac{\text{size of the original image data}}{\text{size of the compressed data}}$$
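For instance, the compression ratio defined above can be computed directly from the data sizes (the image dimensions below are an illustrative assumption):

```python
def compression_ratio(original_bits, compressed_bits):
    """CR = size of the uncompressed data / size of the compressed data."""
    return original_bits / compressed_bits

# e.g., a 256 x 256 8-bit grayscale image coded down to 65,536 bits
print(compression_ratio(256 * 256 * 8, 65536))  # → 8.0
```

A higher CR means stronger compression; SSIM and PSNR are then checked to confirm that the quality loss stays within the predefined error range.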
4. Conclusions
In this paper, we propose a compression algorithm (SEKF-UC) for images transmitted back from UAVs. We first splice the image dataset based on SSIM, which leads to a higher compression ratio. Then, the spliced image is fed to a DNN for compression, which guarantees compression quality. Finally, extended Kalman filters replace the BP algorithm in each layer to train the DNN, improving its iteration speed when processing large amounts of data. We performed ablation experiments and comprehensive comparison experiments, and the results demonstrate the following conclusions. Firstly, the splicing process achieves higher compression ratios with guaranteed image compression quality. Secondly, the introduction of the extended Kalman filter module significantly enhances the compression speed. Thirdly, the number of iterations required for the DNN to reach a steady state is reduced significantly. Hence, SEKF-UC successfully enhances the image compression ratio and speed while maintaining high quality.
UAV communication plays a crucial role in supporting 6G communication. However, in the proposed algorithm, spliced images are put directly into the DNN for compression. A module could be added after splicing to distinguish the differing and similar parts of the images, so that the differing parts are compressed with emphasis while the similar parts are compressed only once, yielding a higher compression ratio. We will make improvements along these lines and continue to improve the compression ratio based on the characteristics of the UAV images themselves, while ensuring communication quality, in order to adapt to narrower bandwidths or achieve a larger effective transmission rate.