Image Splicing Compression Algorithm Based on the Extended Kalman Filter for Unmanned Aerial Vehicles Communication

Liang, Yanxia; Zhao, Meng; Liu, Xin; Jiang, Jing; Lu, Guangyue; Jia, Tong

doi:10.3390/drones7080488

Open AccessArticle

Image Splicing Compression Algorithm Based on the Extended Kalman Filter for Unmanned Aerial Vehicles Communication

by

Yanxia Liang

^1,*,

Meng Zhao

¹

,

Xin Liu

²,

Jing Jiang

¹,

Guangyue Lu

¹ and

Tong Jia

¹

Shaanxi Key Laboratory of Information Communication Network and Security, Xi’an University of Posts and Telecommunications, Xi’an 710121, China

²

School of Information Engineering, Xi’an Eurasia University, Xi’an 710065, China

^*

Author to whom correspondence should be addressed.

Drones 2023, 7(8), 488; https://doi.org/10.3390/drones7080488

Submission received: 5 July 2023 / Revised: 22 July 2023 / Accepted: 23 July 2023 / Published: 25 July 2023

(This article belongs to the Section Drone Communications)

Download

Browse Figures

Versions Notes

Abstract

:

The technology of unmanned aerial vehicles (UAVs) is extensively utilized across various domains due to its exceptional mobility, cost-effectiveness, rapid deployment, and flexible configuration. However, its limited storage space and the contradiction between the large amount of image data returned and the limited bandwidth in emergency scenarios necessitate the development of a suitable UAVs image compression technology. In this research, we propose an image splicing compression algorithm based on the extended Kalman filter for UAV communication (SEKF-UC). Firstly, we introduce the concept of image splicing to enhance the compression ratio while ensuring image quality, leveraging the characteristics of deep learning-based compression, which are relatively unaffected by the size of the compressed data. Secondly, to address the issue of slow processing speed of deep neural networks (DNN) caused by the substantial volume of data involved in image splicing, we incorporate an extended Kalman filter module to expedite the process. Simulation experiments show that our proposed algorithm outperforms existing methods in multiple ways, achieving a significant compression ratio improvement, from 2:1 to 25:1, with a marginal reduction of 6.5% in structural similarity (SSIM) compared to the non-spliced approach. Moreover, for deep neural networks (DNN), our method incorporating the extended Kalman filter module achieves the same error level with only 30 iterations—a significant reduction compared to the traditional BP algorithm’s requirement of over 4000 iterations—while improving the average network operation speed by an impressive 89.76%. Additionally, our algorithm excels in image quality, with peak signal-to-noise ratio (PSNR) improving by 92.7% and SSIM by 42.1%, at most, compared to other algorithms. These results establish our algorithm as a highly efficient and effective solution, suitable for various image processing and data compression applications.

Keywords:

image compression; neural network; the extended Kalman filter; splicing; deep learning

1. Introduction

From the release of the 5G white paper in 2014 to the official launch of the world’s first 5G heterogeneous network roaming trial in 2023, the communications industry has undergone a significant and rapid evolution [1]. 5G networks offer significant advantages, such as high bandwidth, low latency, and wide coverage, which have made people’s lives more convenient. However, these advancements have also resulted in an exponential growth of generated data, driving the rapid development of various technologies, such as autonomous driving [2], connected cars [3], etc. With the emergence of new industries and services such as telemedicine, the current 5G technologies are no longer sufficient to meet their communication performance requirements. This realization has led academia and industry to propose a vision for 6G networks. In this context, unmanned aerial vehicles (UAVs) have emerged as an indispensable technology to complement 6G mobile networks due to their numerous advantages [4].

Unmanned aerial vehicles (UAVs) are operated by essential components, including the actual UAV, a ground controller, and a communication platform that connects the two, and are equipped with specialized sensors and communication equipment featuring high mobility, flexibility, low cost, and large service coverage [5]. With the continuous development and innovation of science and technology, drones have found a wide range of applications, such as smart cities, post-disaster damage assessment, rescue missions, military operations, and communications assistance. UAV-based analysis provides a solution for timely communication in emergency scenarios, such as network reconstruction in major natural disasters, temporary communication in remote areas, etc. [6]. During sudden forest fires, mudslides, earthquakes, and other disasters, drones play a crucial role in capturing images or videos, which can be transmitted to ground stations promptly. These real-time visuals provide essential information for emergency decision-making. However, the transmission capacity of UAVs is constrained by limited bandwidth, resulting in real-time images or videos that may lack high-definition quality. As a result, researchers have focused on finding solutions to transmit higher-quality compressed data within these bandwidth limitations.

UAVs can be remotely operated to access areas that are challenging for humans to reach, such as rivers, glaciers, and large forests, allowing for the collection of diverse and rich data samples. Based on this feature, various compression methods have been proposed in academia. Deep learning has been widely used in data analysis, image processing, image detection, and other fields. In the field of automatic damage detection, in order to solve the lack of superior feature extraction ability of current damage detection models in complex environments, a high-performance damage detection model based on real-time deep learning, called DenseSPH-YOLOv5, was proposed in [7], and experiments proved that the detection rate of the model was 4.5 FPS, the average accuracy was 85.25%, and the F1 score was 81.18%, which is better than the current state-of-the-art model. In graph data representation and semi-supervised classification tasks, ref. [8] proposed a learning framework of a multi-graph learning neural network (MGLNN) to address the fact that existing GLNNs cannot be used for multi-graph data representation, which is experimentally verified to outperform the related methods on semi-supervised classification tasks. The latest image compression algorithm combined with a neural network has a better compression effect compared with the traditional compression and can achieve a higher compression ratio. Ref. [9] proposed a novel squirrel search algorithm (SSA) with the Linde–Buzo–Gray (LBG)-based image compression technique called SSA-LBG [10,11] for UAVs. The LBG model initializes the vector quantization (VQ) codebook, and the whole algorithm has a high peak signal-to-noise ratio. In order to reduce block artifacts in the compression process, ref. [12] proposed a two-step framework based on the inter-block correlation increment to divide the coded image into planar and edge regions, which is experimentally verified to be able to successfully suppress as camps and outperform existing methods in terms of visual quality. Ref. [13] proposed a hierarchical image compression framework based on deep semantic segmentation and experimentally verified its performance, which outperforms better portable graphics (BPG) and other encoders in both PSNR and MS-SSIM metrics in the RGB domain. Ref. [14] proposed a high-fidelity compression algorithm for UAV images under complex disaster conditions based on improved generative adversarial network [15], and the experimental results proved that the method has a higher compression ratio than the traditional image compression algorithm for disaster areas, under the premise of guaranteeing image quality. The authors in [16] proposed a neural network compression algorithm based on Ko-Honen Self-Organizing Feature Mapping (SOFM), which combines SOFM with artificial neural networks, and experimentally verified that better compression ratios and PSNR values can be obtained. In [17], the authors propose a unified end-to-end learning framework that utilizes deep-learning neural network (DNN) models and feature-compressed contexts with fewer model assumptions and significantly simplifies the training process. Ref. [18] proposed to train a deep convolutional neural network (CNN) capable of performing lossy image compression (LIC) at multiple bpp rates and proposed a tucker decomposition network (TDNet) that can be used to adjust the bpp of potential image representations within a single CNN, which was verified through extensive experiments to have a good performance under PSNR and MS-SSIM metrics. In [19], the autoregressive and hierarchical prior are combined as a compression model, and the two components produce complementary results in terms of performance, surpassing the BPG in terms of rate distortion performance. Ref. [20] proposed a JPEG2000-compatible CNN image compression method that utilizes a JPEG2000 encoder to compress the bitstream and two CNNs to recompress the bitstream and post-process it on the decoder side, respectively. It is validated on the Kodak dataset and the two CNN modules help to improve the compression efficiency significantly. Ref. [21] proposed a CNN-based quadratic transform with the aim of improving the coding efficiency of JPEG2000, and the proposed algorithm was experimentally shown to have an improvement over the conventional JPEG2000 at high code rates. In [22], the authors proposed an adaptive multi-resolution (AMID) image compression algorithm; AMID can be effectively used as an alternative to wavelet transform and can achieve high-quality image compression and high compression ratio.

Since its introduction in 1960, the Kalman filter has found extensive application in various fields, including autonomous driving, robotics, and more. The fundamental Kalman filter utilizes linear equations to model system states. However, in practical scenarios characterized by nonlinearity, the extended Kalman filter (EKF) is better suited to address such challenges. The EKF extends the capabilities of the Kalman filter by employing nonlinear equations, making it a valuable tool with a wide range of applications across diverse fields. A state-dependent extended Kalman filter was established in [23] and used on the optimal attitude estimation alignment model of the Jetlink inertial guidance system during carrier motion to reduce the influence of state-dependent noise on the estimation results and effectively improve the estimation accuracy. An onboard soft short circuit fault diagnosis method for electric vehicles based on the extended Kalman filter is proposed in [24], and the effectiveness of the method in rapid fault detection and robustness in accurate estimation are experimentally demonstrated. Two extended Kalman filters are connected in parallel in [25] for direct torque-sensor-free control of permanent magnet synchronous motors (PMSM) with better estimation accuracy. Ref. [26] combined deep neural networks with extended Kalman filtering to solve the state estimation problem of a bimanual tendon-driven aerial continuum operating system (ACMS), and the performance of the proposed method was demonstrated by simulation results. Ref. [27] improved the tracking progress in spatially informed millimeter wave beam tracking using an extended Kalman filtering algorithm for the beams at both ends of the model.

Building upon the broad utilization of deep learning in image compression and the extended Kalman filter’s ability to estimate nonlinear model states, this paper proposes a splicing image compression algorithm based on a neural network with an extended Kalman filter (SEKF-UC). The images are spliced before encoding, the spliced images are obtained by the standard of structural similarity (SSIM), the images are compressed uniformly after stitching, and the extended Kalman filter is used to greatly reduce the time it takes the neural network to adjust the parameters, ensuring the timeliness of the images returned by the UAV. Experimental verification has been conducted to validate the effectiveness of the proposed algorithm. The reconstructed spliced image is split and compared with the original image to assess the compression performance. The experimental results demonstrate that, within a predefined error range, the algorithm achieves a higher compression ratio compared to single compression methods. The main contributions of this paper are as follows:

We propose an algorithm (SEKF-UC) that combines image splicing with a neural network compression algorithm based on the extended Kalman filter. SEKF-UC aims to comprehensively address the quality and efficiency aspects of image compression in UAVs, with the ultimate objective of improving speed and ensuring high-quality results.
The images returned by the UAVs will show more repetitive information or more consistent pixel value distribution, ensuring the timeliness of subsequent processing and screening of images. We have considered the processing of the returned image dataset under the condition that the compression quality is guaranteed and the data set is classified before it is input into the compression algorithm; the same or similar image splicing method is proposed by analysis. The image compression ratio is improved with guaranteed quality.
When the input image dimension is large, training of the deep neural network is slower. An exponential increase in the amount of input data after image stitching will ensure the speed of training without compromising image quality. We introduced the extended Kalman filter when training the network, and the number of training network iterations decreased significantly.

The work of this paper is organized as follows. Section 2 describes the proposed algorithm SEKF-UC in detail. Section 3 verifies the effectiveness and feasibility of the algorithm based on a real UAV-captured dataset and compares its performance with other compression algorithms. Section 4 concludes this work and gives the next work schedule.

2. The Proposed Algorithm SEKF-UC

Figure 1 shows the block diagram of the framework of the proposed image compression algorithm SEKF-UC. The framework consists of four main parts: splicing, compression, decompression, and unsplicing of images. The images returned from the UAVs are first spliced for similarity and then fed into the neural network algorithm for uniform compression and decomposition. This not only improves the compression rate and compression ratio but also helps to subsequently filter the large amount of duplicate information that appears in the returned images. Especially, the colors of the block diagrams in the figures are only used to distinguish different images or layers in the neural network. Thick arrows indicate data flow and thin arrows indicate explanatory notes.

As shown in Figure 1, when a large number of UAV images need to be compressed and transmitted, the images to be compressed are firstly spliced according to SSIM, so that similar images are compressed together by DNN, which results in a lower compression rate while maintaining the quality of decompression [28]. The splice of images leads to a multiplied amount of input data into DNN. In order to ensure the compression efficiency, each layer of the DNN introduces an extended Kalman filter. These filters replace the usual back propagation (BP) algorithm to accelerate the training speed of DNN. Finally, the compressed data is decompressed and decomposed by inverse algorithms to obtain the restored images.

2.1. Image Splicing and De-Splicing

Research The image to be compressed can be represented by a two-dimensional matrix as

X = [x_{1} x_{2} \dots x_{N}]

, where the dimension of X is M × N. To further improve compression efficiency and save transmission bandwidth, before inputting the image data into the encoder, images with similar structure to the image to be compressed (e.g., large areas in the picture with similar colors) or with more repetitive information (e.g., the same area is shot continuously) are found to be spliced based on their structural similarity (SSIM). The number of splices and the shape of the splice can be set according to the actual situation and target requirements.

Structural similarity (SSIM) is a measure of the similarity of two images. The structural similarity of two images, p1 and p2, can be calculated by the following equation:

SSIM (p_{1}, p_{2}) = \frac{(2 μ_{p_{1}} μ_{p_{2}} + c_{1}) + (2 σ_{p_{1} p_{2}} + c_{2})}{(μ_{p_{1}}^{2} + μ_{p_{2}}^{2} + c_{1}) (σ_{p_{1}}^{2} + σ_{p_{2}}^{2} + c_{2})}

(1)

where

μ_{p_{1}}

is the average of p₁,

μ_{p_{2}}

is the average of p₂,

σ_{p_{1}}^{2}

is the variance of p₁,

σ_{p_{2}}^{2}

is the variance of p₂, and

σ_{p_{1} p_{2}}

is the covariance of p₁, p₂.

c_{1} = {(k_{1} L)}^{2}

,

c_{2} = {(k_{2} L)}^{2}

are the constants used to maintain stability. L is the dynamic range of the pixel values.

k_{1}

and k₂ are constants. SSIM takes values in the range of [0, 1], and the closer the magnitude of the value is to 1, the more similar the two graphs are proven to be. Set SSIM threshold to splice similar images and then compress them uniformly.

Take 4 pictures, for example. They can be spliced as 2 × 2 dimensions, and the size after splicing is 2M × 2N. They can also be spliced as 1 × 4 dimensions, and the size after splicing is M × 4N. Figure 2a represents two different splicing methods; when the four diagrams are spliced together, and the four diagrams have similar structural properties, the four diagrams in Figure 2b have a large amount of repeated information.

When the spliced image needs to be decomposed, segmentation is performed according to the original splicing method.

2.2. Deep Neural Network for Image Compression

2.2.1. Encoder

Figure 3 shows a detailed block diagram of the coding layer of the proposed algorithm.

First, extract the pixel matrix of the stitched image P as

P = [p_{1} p_{2} \dots p_{\tilde{N}}]

, input P into the input layer of the neural network; the input and hidden layers are connected by the weights W₁ (w₁, w₂) and the bias B₁ (b₁, b₂). At this point, the output of the first input layer is

C = [c_{1} c_{2} \dots c_{j}]

, where j <

\tilde{N}

, and as input to the second layer; the output of the second input layer is

D = [d_{1} d_{2} \dots d_{i}]

, where i < j. C and D are calculated by the following equation:

\begin{array}{l} C = sigmoid (\sum_{a = 1}^{\tilde{N}} (w_{1} p_{a} + b_{1})) \\ D = sigmoid (\sum_{a = 1}^{j} (w_{2} c_{a} + b_{2})) \end{array}

(2)

The output of the coding layer needs to be normalized by the activation function in the hidden layer. The commonly used activation functions are sigmoid, Tanh, ReLU, etc. In this paper, the sigmoid function is used. Sigmoid is a common S-type function, often used as a threshold function for neural networks; it normalizes the variables and maps them to between [0, 1]. The calculation formula is as follows:

f (x) = \frac{1}{1 + e^{- x}}

(3)

2.2.2. Decoder

Figure 4 shows a detailed block diagram of the decoder part of the proposed algorithm.

Input the output D of the encoding layer to the decoding layer for decompression; layers are connected to each other by the weights W₂ (w₂₁, w₂₂) and the bias B₂ (b₂₁, b₂₂). The output of the first decoding layer is

E = [e_{1} e_{2} \dots e_{j}]

, and the output of the second decoding layer is

Y = [y_{1} y_{2} \dots y_{\tilde{N}}]

; E and Y can be calculated from following equation:

\begin{array}{l} E = sigmoid (\sum_{a = 1}^{i} (w_{21} D_{a} + b_{21})) \\ Y = sigmoid (\sum_{a = 1}^{j} (w_{22} e_{a} + b_{22})) \end{array}

(4)

Y is the pixel matrix of the final decompressed output, and to measure the magnitude of its error with respect to the original input, the loss function mean square error (MSE) is introduced. The reconstruction error L is obtained by inputting the decompression matrix Y with the original input matrix P into the loss function, and L is obtained by the following equation:

L = \frac{1}{2} \frac{\sum_{a = 1}^{\tilde{N}} {(y_{a} - p_{a})}^{2}}{\tilde{N}}

(5)

2.3. Extended Kalman Filter Training Network

To make the error between the output decompressed image and the original image as small as possible, the neural network parameters generated by the initialization need to be retrained. At the same time, in order to improve the speed of parameter optimization, increase the compression rate, and ensure the timeliness of the compressed return image, the extended Kalman filter is introduced to replace the gradient descent method in back propagation for tuning the network parameters.

Deep neural networks can be described as continuous nonlinear combinations, and the extended Kalman filter can be used to estimate the parameters of the neural network as a nonlinear model, allowing the error to converge to the optimal value faster.

The weights and biases in the neural network can be expressed in terms of the state vector of the extended Kalman filter, as

W_{k} = {[{\{W_{l}, B_{l}\}}_{1 \leq l \leq 2}]}^{T}

(6)

The neural network can be formulated as a nonlinear discrete-time system and can be calculated by the following equation:

W_{k + 1} = f (W_{k})

(7)

Y_{k + 1} = sigmoid [W_{2} (sigmoid (W_{1} P) + B_{1}) + B_{2}]

(8)

2.3.1. Predicted Status

The covariance matrix of the prediction state and the prediction error of the extended Kalman filter is expressed by the following equation:

W_{k + 1} = W_{k} + α \frac{\partial L}{\partial W_{k}}

(9)

P_{k + 1} = P_{k} + Q_{k}

(10)

where L is the magnitude of the mean square error between the decompressed matrix of the deep neural network initialized by the weights and bias to the original input image matrix,

α

is the learning rate, generally set to 0.01,

Q_{k}

is the covariance matrix of the process noise error.

2.3.2. Update Status

The extended Kalman filter training parameters begin with determining the Kalman gain, which is expressed as follows:

K_{k} = P_{k} H_{k} {[R_{k} + H_{k}^{T} P_{k} H_{k}]}^{- 1}

(11)

where

R_{k}

, the measurement covariance, and

P_{k}

, the error covariance matrix of the state, are calculated by the following equation, where P is the pixel matrix of the image input to the neural network compression algorithm after stitching, and

Y_{k + 1}

is the decompressed pixel matrix.

R_{k + 1} = R_{k} + \frac{1}{k} [(P - Y_{k + 1}) {(P - Y_{k + 1})}^{T} - R_{k}]

(12)

P_{k} = E [(W_{k + 1} - W_{k}) {(W_{k + 1} - W_{k})}^{T}]

(13)

H_{k}

can be calculated by the following equation, which represents the decompressed pixel matrix biased against the neural network parameters.

H_{k} = \frac{\partial Y_{k + 1}}{\partial W_{k + 1}}

(14)

In summary, the updated formula for the predicted state and its error covariance matrix is as follows:

\begin{array}{l} {\hat{W}}_{k + 1} = W_{k + 1} + K_{k} [P - Y_{k + 1}] \\ {\hat{P}}_{k + 1} = P_{k + 1} - K_{k} H_{k}^{T} P_{k + 1} \end{array}

(15)

The network parameters are updated at each iteration, and the iteration can be stopped either by reaching a set number of iterations or by reaching a set error size.

2.4. Evaluation Parameters

Performance indicators: the compression performance is measured by the structural similarity (SSIM) and peak signal-to-noise ratio (PSNR), and the reconstructed images are split and compared with the original images, one by one.

Compression ratio is a further measurement metric used for compression measurements, and the compression ratio C_R is calculated as follows:

C_{R} = \frac{U n c o m p r e s s e d (o r i g i n a l) i m a g e s i z e}{C o m p r e s s e d i m a g e s i z e}

(16)

3. Experimental Results and Analysis

To verify the performance of the proposed algorithm, ablation experiments and comprehensive comparison experiments are conducted on the real UAV dataset, VisDrone. First, the dataset images are preprocessed, the size is reset to 256 × 512, and the dataset is divided into three scenarios, which are Building, Lane, and Street view. The following Figure 5 are examples of the three scene images:

3.1. Ablation Experiment

3.1.1. Splicing Module

Grayscale map experiment

Firstly, we perform grayscale processing on an image and compress the single image with a compression ratio of 2:1, then compare the average SSIM and PSNR between the reconstructed images and the original one. Secondly, in this experiment, the images selected according to SSIM standard are stitched into a matrix of two rows and two columns, size 512 × 1024. The maximum compression ratio that can be achieved by splicing compression is obtained experimentally under the condition of a maximum difference of −0.1 from the average SSIM value (i.e., relative error Re(SSIM) ≥ −0.1) at a single compression ratio of 2:1.

Relative error, Re(SSIM), is obtained by subtracting the average SSIM value of the spliced and compressed split image from the average SSIM value of the single compression. Re(PSNR) is similar to Re(SSIM), and is calculated by Equations (17) and (18).

R e (SSIM) = a v e . (M_{1}) - a v e . (M_{2})

(17)

R e (PSNR) = a v e . (R_{1}) - a v e . (R_{2})

(18)

where ave.(M₁) denotes the average SSIM value of the stitched compressed split image, and ave.(M₂) denotes the average SSIM value of the single compressed. ave.(R₁) denotes the average PSNR value of the stitched compressed split image, and ave.(R₂) denotes the average PSNR value of the single compressed.

For comparison of the average SSIM and average PSNR values for spliced image compression with different compression ratios in different scenes and single compression with a compression ratio of 2:1, the experiment seeks to obtain the maximum compression ratio that can be achieved by splicing compression at Re(SSIM) ≥ −0.1. Single 2:1 means the single compression ratio is 2:1. Splice 2:1 means the compression ratio of 2:1 for splicing compression. Re(SSIM) > 0, indicating that the spliced compression mass is better than that of a single sheet with a compression ratio of 2:1; the opposite is true when Re(SSIM) < 0. Under the condition of Re(SSIM) ≥ −0.1, the maximum compression ratio that can be achieved by splicing compression and the magnitude of the Re(SSIM) and Re(PSNR) values at that point are shown in the Table 1 below.

From the above table, it can be seen that the proposed splicing compression algorithm can achieve higher compression ratios in all three scenarios with the preset relative error of the Re(SSIM) ≥ −0.1. The compression ratio of 8:1 can be realized in all three scenes, and the average SSIM value difference of stitching compression over single compression is −0.032 in the Lane scene and −0.025 in the Building scene. The splicing compression in the Street view scene is better than the single compression at higher compression ratios.

2.: Color chart experiment

The following experiments are conducted on the color map to verify the feasibility of the algorithm. The experimental procedure is similar to the gray-scale graph experiment: extract and compress the R, G, and B channel pixel matrices of color images separately. The maximum compression ratio that can be achieved by splicing and compressing color images and the difference between the Re(SSIM) and Re(PSNR) values at this time are shown in the following Table 2.

From the above table, it can be seen that under the set threshold of Re(PSNR) ≥ −0.1, Street view scenarios can be further compressed—the compression ratio can be achieved up to 25:1 compared to 8:1 for grayscale images. First, setting the compression ratio to 32:1 in the street view gives an average SSIM value size of −0.16, which is larger than the set condition. Then, the compression ratio is dichotomized to 24:1, which does not exceed the set condition size, and the compression ratio size at 25:1 is −0.057— setting a larger compression ratio will exceed the set threshold. Therefore, the maximum achievable compression ratio in the street scene is 25:1.

The following Figure 6 shows the effect of the original, single compression, and splicing compression split under the three scenes; the first line is Street view, the second line is Lane, and the third line is Building.

From the above table and the above chart, it can be seen that, when compressing a single image with a compression ratio of 2:1, splicing compression can achieve a higher compression ratio at Re(PSNR) ≥ −0.1. The maximum compression ratio that can be achieved varies from scene to scene because the image characteristics are different, but the compression ratios will all be higher than the compression ratio of a single sheet. When the two parts of grayscale image compression and color image compression experiments are combined, splicing compression can achieve higher compression ratios than single compression while maintaining compression quality; it can further improve compression efficiency and save transmission bandwidth and storage space.

3.1.2. Extended Kalman Filter Module

The proposed algorithm uses the extended Kalman filter module instead of the back propagation module when training the neural network. Compared with back propagation, training the network with the extended Kalman filter will reach the optimal value faster, and the mean square error will be more stable.

The following figure is a comparison of the number of cycles when the extended Kalman filter and the BP algorithm train the network during network operation. The large image is the result of the BP algorithm, and the small image is the result of the extended Kalman filter.

From Figure 7, it can be seen that when we use the extended Kalman filter to train the network, a smaller error is obtained compared to the BP algorithm when the number of iterations is the same (X = 30). The number of iterations is around 30 by using the extended Kalman filter, while it is more than 4000 for the BP algorithm, when the two errors are basically the same (Y = 0.000986). The network incorporating the extended Kalman filter module runs 89.76% faster than that using the conventional BP algorithm. Therefore, training the neural network based on the extended Kalman filter is faster than the BP algorithm and does not affect the quality of the image compression.

3.2. Comprehensive Comparison Experiment

To verify the performance of the proposed algorithm, we compared it with other algorithms. The images selected for the experiment are from the VisDrone dataset. The SEKF-UC algorithm in the experiment chooses the splicing method of one row and two columns, and the size of the spliced image is 256 × 1024. The criteria for evaluating compression algorithms are SSIM and PSNR. Under the condition of a compression ratio of 32:1, the average SSIM results under different compression algorithms are shown in Table 3, and the average PSNR results are shown in Table 4.

Table 3 and Table 4 show the average SSIM values and average PSNR values of the proposed algorithm and the comparison algorithm in the test images at a compression ratio of 32:1. The average SSIM value of the proposed algorithm improved by 0.078 (9.03%) over [29], 0.041 (4.6%) over [30], and 0.276 (42.1%) over [31]. the average PSNR value improved by 6.26 dB (24.4%) over [29], 5.1 dB (19%) over [30], and 15.36 dB (92.7%) over [31]. From the data in the table, it can be seen that the proposed algorithms in all three scenarios outperform the comparison algorithms in both measures.

To verify the applicability of the algorithm, experiments with a compression ratio of 32:1 were performed on the Kodak dataset. We compared these results with other algorithms. The metric is the average PSNR value. In the comparison algorithm, the baseline model is the JPEG algorithm, and its benchmark result under the Kodak dataset, i.e., the average PSNR value, is 26.52 dB. The SEKF-UC algorithm chooses the splicing method of two rows and two columns, and the size of the spliced image is 512 × 1024. The experimental results are shown in Table 5.

As can be seen from Table 5, the proposed algorithm also shows good performance with the Kodak dataset at the same compression ratio. The highest average PSNR value among the compared algorithms is Algorithm [18] and the lowest is Algorithm JPEG. The proposed algorithm improves 1.37 dB (4.67%) over [18] and 4.15 dB (15.6%) over JPEG.

4. Conclusions

In this paper, we propose a compression algorithm (SEKF-UC) for images transmitted back from UAVs. We firstly splice the image dataset based on SSIM, which leads to a higher compression ratio. Then, the spliced image is fed to DNN for compression, which can guarantee the compression quality. Finally, extended Kalman filters replace a BP algorithm in each layer to train DNN, which can improve its iteration speed in the case of large amount of processing data. We perform ablation experiments and comprehensive comparison experiments. Experimental results demonstrate the following conclusions. Firstly, the splicing process can achieve higher compression ratios with guaranteed image compression quality. Secondly, the introduction of the extended Kalman filter module in the algorithm significantly enhances the compression rate. Thirdly, the number of iterations required for DNN to reach the steady state is reduced significantly. Hence, SEKF-UC successfully enhances the image compression ratio and speed while maintaining high quality.

UAV communication plays a crucial role in supporting 6G communication. However, spliced images are directly put into DNN for compression in this proposed algorithm. We can add a module after splice to distinguish the different and similar parts of the images. So, the different parts in the spliced images are compressed as a focus, and the similar parts are compressed in a one-time compression. Therefore, higher compression ratio can be obtained. We will make improvements according to the above thoughts and continue to improve the compression ratio based on the characteristics of the UAV images themselves, while ensuring the communication quality, in order to adapt to a narrower bandwidth or in exchange for a larger actual transmission rate.

Author Contributions

Conceptualization, Y.L. and M.Z.; methodology and data curation, M.Z.; writing—original draft preparation, M.Z. and T.J.; writing—review and editing, Y.L. and X.L.; supervision, Y.L., X.L., J.J. and G.L.; project administration, Y.L., X.L., J.J. and G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by Shaanxi provincial special fund for Technological innovation guidance under Grant 2022CGBX-29, in part by Basic Scientific Research Program in the 14th Five-Year Plan under Grant JCKY2020203XXXX, in part by the National Natural Science Foundation of China under Grant 6187012068 and 61901367, and in part by Natural Science Foundation of Shaanxi Province under Grant 2020JQ-844.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors are very grateful to the editors and anonymous reviewers for their critical comments and suggestions to improve the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

Long, X.H.; Pan, Z.W.; Gao, X.Q.; Cao, S.M.; Wu, H.Q. The 5G mobile communication: The development trends and its emerging key techniques. Sci. China Inf. Sci. 2014, 44, 551–563. [Google Scholar]
Guan, Y.; Wang, Y.; Bian, Q.; Hu, X.; Wang, W.; Xu, D. High-Efficiency Self-Driven Circuit With Parallel Branch For High Frequency Converters. IEEE Trans. Power. Electron. 2018, 33, 926–931. [Google Scholar] [CrossRef]
Lyu, F.; Cheng, N.; Zhu, H.Z. Intelligent Context-Aware Communication Paradigm Design for IoVs Based on Data Analytics. IEEE Netw. 2018, 32, 74–82. [Google Scholar] [CrossRef]
Chen, X.Y.; Sheng, M.; Li, B.; Zhao, N. Survey on Unmanned Aerial Vehicle Communications for 6G. J. Electron. Inf. Technol. 2022, 44, 781–789. [Google Scholar]
Jin, Y.; Zhang, H.; Zhang, S.; Han, Z.; Song, L. Sense-Store-Send: Trajectory Optimization for a Buffer-Aided Internet of UAVs. IEEE Commun. Lett. 2020, 44, 2888–2892. [Google Scholar] [CrossRef]
Sun, Y.; Xu, D.; Ng, D.W.K.; Dai, L.; Schober, R. Optimal 3D-Trajectory Design and Resource Allocation for Solar-Powered UAV Communication Systems. IEEE Trans. Commun. 2019, 67, 4281–4298. [Google Scholar] [CrossRef] [Green Version]
Roy, A.M.; Bhaduri, J. DenseSPH-YOLOv5: An automated damage detection model based on DenseNet and Swin-Transformer prediction head-enabled YOLOv5 with attention mechanism. Adv. Eng. Inform. 2023, 56, 102007. [Google Scholar] [CrossRef]
Jiang, B.; Chen, S.; Wang, B.; Luo, B. MGLNN: Semi-supervised learning via Multiple Graph Cooperative Learning Neural Networks. Neural Netw. 2022, 153, 204–214. [Google Scholar] [CrossRef]
Minu, M.S.; Canessane, R.A. An Efficient Squirrel Search Algorithm based Vector Quantization for Image Compression in Unmanned Aerial Vehicles. In Proceedings of the 2021 International Conference on Artificial Intelligence and Smart Systems, Dublin, Ireland, 25–27 March 2021. [Google Scholar]
Linde, Y.; Buzo, A.; Gray, R.M. An algorithm for vector quantizer design. IEEE Trans. Commun. 1980, 28, 84–95. [Google Scholar] [CrossRef] [Green Version]
Jain, M.; Singh, V.; Rani, A. A novel nature-inspired algorithm for optimization: Squirrel search algorithm. Swarm Evol. Comput. 2019, 44, 148–175. [Google Scholar] [CrossRef]
Yoo, S.B.; Choi, K.; Ra, J.B. Post-Processing for Blocking Artifact Reduction Based on Inter-Block Correlation. IEEE Trans. Multimed. 2014, 16, 1536–1548. [Google Scholar] [CrossRef]
Akbari, M.; Liang, J.; Han, J. DSSLIC: Deep Semantic Segmentation-based Layered Image Compression. In Proceedings of the 2019 IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, UK, 12–17 May 2019. [Google Scholar]
Hu, Q.; Wu, C.; Wu, Y.; Xiong, N. UAV Image High Fidelity Compression Algorithm Based on Generative Adversarial Networks Under Complex Disaster Conditions. IEEE Access 2019, 7, 91980–91991. [Google Scholar] [CrossRef]
Chen, X.; Xu, C.; Yang, X.; Song, L.; Tao, D. Gated-GAN: Adversarial Gated Networks for Multi-Collection Style Transfer. IEEE Trans. Image Process. 2019, 28, 546–560. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Savant, Y.; Admuthe, L.S. Compression of grayscale images using KSOFM neural network. Int. J. Sci. Eng. Res. 2013, 4, 2229–5518. [Google Scholar]
Agustsson, E.; Mentzer, F.; Tschannen, M. Soft–to–hard vector quantization for end–to–end learning compressible representations. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
Cai, J.; Cao, Z.; Zhang, L. Learning a Single Tucker Decomposition Network for Lossy Image Compression With Multiple Bits-per-Pixel Rates. IEEE Trans. Image Process. 2020, 29, 3612–3625. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Minnen, D.; Balle, J.; Toderici, G. Joint autoregressive and hierarchical priors for learned image compression. In Proceedings of the 32st Conference on Neural Information Processing Systems, Montreal, QC, Canada, 2–8 December 2018. [Google Scholar]
Ma, H.; Liu, D.; Xiong, R.; Wu, F. A CNN-Based Image Compression Scheme Compatible with JPEG-2000. In Proceedings of the 2019 IEEE International Conference on Image Processing, Taiwan, China, 22–25 September 2019. [Google Scholar]
Li, X.; Naman, A.; Taubman, D. Machine-Learning Based Secondary Transform for Improved Image Compression in JPEG2000. In Proceedings of the 2021 IEEE International Conference on Image Processing, Anchorage, AK, USA, 19–22 September 2021. [Google Scholar]
Alkishriwo, O.A.S. Image compression using adaptive multiresolution image decomposition algorithm. IET Image Process. 2020, 14, 3572–3578. [Google Scholar] [CrossRef]
Pei, F.; Yang, S.; Yin, S. In-Motion Initial Alignment Using State-Dependent Extended Kalman Filter for Strapdown Inertial Navigation System. IEEE Trans. Instrum. Meas. 2021, 70, 1–12. [Google Scholar] [CrossRef]
Yang, R.; Xiong, R.; Shen, W. On-board diagnosis of soft short circuit fault in lithium-ion battery packs for electric vehicles using an extended Kalman filter. CSEE J. Power Energy Syst. 2022, 8, 258–270. [Google Scholar]
Zhang, H.-W.; Jiang, D.; Wang, X.-H.; Wang, M.-R. Direct Torque Sensorless Control of PMSM Based on Dual Extended Kalman Filter. In Proceedings of the 2021 33rd Chinese Control and Decision Conference, Kunming, China, 22–24 May 2021. [Google Scholar]
Ghorbani, S.; Janabi-Sharifi, F. Extended Kalman Filter State Estimation for Aerial Continuum Manipulation Systems. IEEE Sens. Lett. 2022, 6, 1–4. [Google Scholar] [CrossRef]
Chen, L.; Zhou, S.; Wang, W. MmWave Beam Tracking with Spatial Information Based on Extended Kalman Filter. IEEE Wirel. Commun. Lett. 2023, 12, 615–619. [Google Scholar] [CrossRef]
Liang, Y.X.; Zhao, M.; Liu, X.; Jiang, J.; Lu, G.Y.; Jia, T. An Adaptive Image Compression Algorithm Based on Joint Clustering Algorithm and Deep Learning. IET Image Process. 2023; under review. [Google Scholar]
Omar, H.M.; Morsli, M.; Yaichi, S. Image Compression using Principal Component Analysis. In Proceedings of the 2020 2nd International Conference on Mathematics and Information Technology, Algeria, North Africa, 18–19 February 2020. [Google Scholar]
Chen, S.X.; Zhang, Y.Q. Hyperspectral Image Compression Based on Adaptive Band Clustering Principal Component Analysis and Back Propagation Neural Network. J. Electron. Inf. Technol. 2018, 40, 2478–2483. [Google Scholar]
ARTUĞER, F.; ÖZKAYNAK, F. I Fractal Image Compression Method for Lossy Data Compression. In Proceedings of the 2018 International Conference on Artificial Intelligence and Data Processing, Malatya, Turkey, 28–30 September 2018. [Google Scholar]
Cheng, Z.; Sun, H.; Takeuchi, M.; Katto, J. Energy Compaction-Based Image Compression Using Convolutional AutoEncoder. IEEE Trans. Multimed. 2020, 22, 860–873. [Google Scholar] [CrossRef]

Figure 1. Block diagram of the proposed compression algorithm SEKF-UC.

Figure 2. (a) Two different splicing methods (2 × 2, 1 × 4) when stitching 4 structurally similar images; (b) Splicing of 4 images with a lot of similar information.

Figure 3. Detailed block diagram of the encoder part of the proposed algorithm.

Figure 4. Block diagram of the decoder part of the proposed algorithm.

Figure 5. Legend of the three scene images.

Figure 6. (a) Original image in three scenarios. (b) The image obtained by decompressing a single image with a compression ratio of 2:1. (c) After splicing and compressing the split image, the compression ratio is 8:1 in Lane scene and Building scene and 25:1 in Street view scene.

Figure 7. Comparison of the number of iterations when training the network with the extended Kalman filter and the BP algorithm.

Table 1. The maximum compression ratio that can be achieved by splicing compression.

	Lane	Street View	Building
Single compression	2:1	2:1	2:1
Splice compression	8:1	8:1	8:1
Re(SSIM)	−0.032 (−3.7%)	+0.035 (+4.3%)	−0.025 (−3.1%)
Re(PSNR)	−0.809	+2.644	−0.863

Table 2. The maximum compression ratio that can be achieved by splicing compression.

	Lane	Street View	Building
Single compression	2:1	2:1	2:1
Splice compression	8:1	25:1	8:1
Re(SSIM)	−0.050 (−5.6%)	−0.057 (−6.1%)	−0.049 (−4.7%)
Re(PSNR)	−0.867	−1.075	−0.903

Table 3. Different algorithms compress the SSIM values of different images.

Method	Street View 1	Street View 2	Building 1	Building 2	Lane 1	Lane 2	Average
Ours	0.955	0.941	0.953	0.955	0.883	0.907	0.932
Algorithm in [29]	0.93	0.878	0.904	0.902	0.772	0.797	0.864
Algorithm in [30]	0.937	0.901	0.929	0.917	0.823	0.84	0.891
Algorithm in [31]	0.778	0.446	0.618	0.666	0.682	0.748	0.656

Table 4. Different algorithms compress the PSNR values of different images.

Method	Average
Ours	31.93
Algorithm in [29]	25.67
Algorithm in [30]	26.83
Algorithm in [31]	16.57

Table 5. Average PSNR values of different compression algorithms with Kodak dataset.

Method	Average
JPEG	26.52
JPEG2000	28.89
Cheng’s [32]	29.00
Agustsson’s [17]	28.40
Cai’s [18]	29.36
Ma’s [20]	29.30
Ours	30.67

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liang, Y.; Zhao, M.; Liu, X.; Jiang, J.; Lu, G.; Jia, T. Image Splicing Compression Algorithm Based on the Extended Kalman Filter for Unmanned Aerial Vehicles Communication. Drones 2023, 7, 488. https://doi.org/10.3390/drones7080488

AMA Style

Liang Y, Zhao M, Liu X, Jiang J, Lu G, Jia T. Image Splicing Compression Algorithm Based on the Extended Kalman Filter for Unmanned Aerial Vehicles Communication. Drones. 2023; 7(8):488. https://doi.org/10.3390/drones7080488

Chicago/Turabian Style

Liang, Yanxia, Meng Zhao, Xin Liu, Jing Jiang, Guangyue Lu, and Tong Jia. 2023. "Image Splicing Compression Algorithm Based on the Extended Kalman Filter for Unmanned Aerial Vehicles Communication" Drones 7, no. 8: 488. https://doi.org/10.3390/drones7080488

Article Menu

Image Splicing Compression Algorithm Based on the Extended Kalman Filter for Unmanned Aerial Vehicles Communication

Abstract

1. Introduction

2. The Proposed Algorithm SEKF-UC

2.1. Image Splicing and De-Splicing

2.2. Deep Neural Network for Image Compression

2.2.1. Encoder

2.2.2. Decoder

2.3. Extended Kalman Filter Training Network

2.3.1. Predicted Status

2.3.2. Update Status

2.4. Evaluation Parameters

3. Experimental Results and Analysis

3.1. Ablation Experiment

3.1.1. Splicing Module

3.1.2. Extended Kalman Filter Module

3.2. Comprehensive Comparison Experiment

4. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI