A Heterogeneous Dynamic Convolutional Neural Network for Image Super-resolution
Abstract
Convolutional neural networks can automatically learn features via deep network architectures and given input samples. However, the robustness of the obtained models may be challenged in varying scenes. Larger differences within a network architecture help extract more complementary structural information, which enhances the robustness of the resulting super-resolution model. In this paper, we present a heterogeneous dynamic convolutional network for image super-resolution (HDSRNet). To capture more information, HDSRNet is implemented as a heterogeneous parallel network. The upper network gathers more contextual information via stacked heterogeneous blocks to improve image super-resolution. Each heterogeneous block combines dilated, dynamic and common convolutional layers with ReLU and a residual learning operation. It can not only adaptively adjust parameters according to different inputs, but also prevent the long-term dependency problem. The lower network utilizes a symmetric architecture to enhance relations between different layers and mine more structural information, which is complementary to the upper network for image super-resolution. Experimental results show that the proposed HDSRNet is effective for image super-resolution. The code of HDSRNet can be obtained at https://github.com/hellloxiaotian/HDSRNet.
Index Terms:
Dynamic convolutions, dilated convolutions, heterogeneous networks, image super-resolution.
I Introduction
Single image super-resolution (SISR) techniques recover high-quality images from given low-resolution (LR) images by solving an ill-posed inverse problem [58, 59]. In recent years, machine learning techniques have achieved great success in many applications, e.g., disease diagnosis [64], classification [70], object recognition [65], multimodal data fusion [66, 68], Metaverse [67, 69] and SISR [2]. Specifically, deep learning techniques with end-to-end architectures have obtained higher performance in image super-resolution. Deep convolutional neural networks (CNNs) use end-to-end architectures rather than manually set parameters to obtain strong learning abilities and improve the visual effects of image super-resolution (SR) [55]. For instance, Dong et al. designed a 3-layer CNN that maps the pixels of an LR image to a high-resolution (HR) image [3]. Although it can improve the resolution of predicted images, it cannot make a tradeoff between network depth and performance in image super-resolution. To address this issue, scholars have increased network depth to pursue improved performance in image super-resolution [4]. Stacking small filters is used to achieve a very deep SR network [4]. Also, a residual learning technique is used on a deep layer of a deep network to balance SR performance and computational cost. A deeply-recursive convolutional network (DRCN) [5] and a deep recursive residual network (DRRN) [6] exploit recursive networks to decrease the number of parameters of an SR model. To reduce computational complexity, up-sampling operations, e.g., a transposed convolution [7], deconvolution and a sub-pixel convolution [8], are set on a deep layer to amplify low-frequency features for constructing HR images.
For instance, a fast SR convolutional neural network (FSRCNN) [9] directly fed low-resolution images into a deep CNN to obtain low-frequency features and applied a sub-pixel convolutional layer to transform them into high-frequency features and obtain HR images. To further improve SR performance, an enhanced deep SR network (EDSR) used enhanced residual blocks to extract more structural information for image super-resolution [10]. Also, removing unnecessary batch normalization layers can improve the training efficiency of an SR model [10]. Although these algorithms are effective for SISR, they may still struggle in complex scenes.
In this work, we propose a heterogeneous dynamic convolutional network for SISR (HDSRNet). HDSRNet uses a heterogeneous parallel network to capture more information and improve the performance of image SR. The upper network uses stacked heterogeneous blocks to extract more contextual information in image SR. Each heterogeneous block includes dilated, dynamic and common convolutional layers, ReLU and a residual learning operation to adaptively adjust parameters according to different inputs. Also, it can prevent the long-term dependency problem. The lower network uses a symmetric architecture to enhance relationships between different layers and obtain more complementary structural information. Quantitative and qualitative analyses show that the proposed method is effective for image super-resolution.
Our main contributions can be summarized as follows.
(1) A heterogeneous parallel network is used to extract complementary structural information, in terms of contextual and hierarchical information, to improve the performance of image super-resolution.
(2) Dynamic convolutions are embedded into a convolutional neural network to enhance the robustness of the obtained super-resolution model for complex scenes.
(3) An enhanced residual architecture is designed to address long-term dependency in image super-resolution.
The remainder of this paper is organized as follows. Section II describes related work. Section III illustrates the proposed method. Section IV presents extensive comparative experimental results. Section V summarizes our work.
II Related Work
II-A Deep CNNs for Image Super-resolution
Due to shooting distance, images captured by cameras may be unclear. Also, traditional image super-resolution methods require manually set parameters and complex optimization. To overcome these issues, deep learning techniques have been extended to SISR [39, 40]. For instance, a deep recursive residual network used local residual connections and units to improve image super-resolution [6]. Alternatively, Yang et al. used recursive and gate units to transfer information from shallow layers to deep layers to address the long-term dependency problem [29]. To improve the efficiency of image SR, given LR images are directly used as the input of a convolutional neural network and high-quality images are constructed via an up-sampling operation in a deep layer [9]. For instance, Ahn et al. used group convolutions to implement a cascading residual CNN that extracts more robust low-frequency information and decreases the number of parameters without significant performance loss in SISR [11]. Tian et al. exploited symmetric group convolutional blocks to enhance relations between different channels and mine more accurate low-frequency information for image super-resolution [33]. Zhang et al. extended the depth of the network and fully used hierarchical information to obtain more low-frequency information for image SR [41]. These methods show that deep CNNs are good tools for obtaining clearer images. Motivated by that, we propose a deep CNN for image super-resolution.
II-B Dynamic convolution
Existing CNNs share parameters to extract useful features that better represent images. However, they may struggle with varying input images in complex scenes [13]. To address this problem, dynamic convolutions were presented [13]. That is, they dynamically adjust parameters according to different inputs to adaptively learn models for image applications. For instance, Wan et al. achieved a graph convolution by arbitrarily structuring non-Euclidean data for hyperspectral image classification [42]. Alternatively, Ding et al. automatically found the valuable receptive field of each target node to adaptively capture neighbor information in hyperspectral image classification [43]. To deal with the negative effects of artifacts in textureless and edge regions, Duan et al. dynamically generated kernels from the region context of input images to better achieve image fusion [44]. To adaptively adjust the brightness of enhanced images, Dai et al. fused Taylor expansion and dynamic convolution into a Retinex model to flexibly improve the clarity and brightness of low-light images [45]. To extract more context information, Hou et al. proposed dynamic hybrid gradient convolution and coordinate sensitive attention to enhance boundary information extraction for remote sensing image segmentation [46]. To find more high-frequency information, Shen et al. used spatially enhanced kernel generation to dynamically learn high-frequency and multi-scale features for better denoising [47]. Additionally, Tian et al. combined dynamic convolution and wavelet transform to train an adaptive denoiser according to different given noisy images [48]. These works show that dynamic convolutions are effective for image applications. Motivated by that, we use dynamic convolutions in a CNN for image super-resolution in this work.
III Proposed Method
III-A Network Architecture
The proposed 18-layer HDSRNet contains two 16-layer parallel heterogeneous networks and a 2-layer construction block. The 16-layer parallel heterogeneous network is composed of a 16-layer heterogeneous upper network and a symmetrical lower network. The heterogeneous upper network is composed of a Conv+ReLU and five stacked heterogeneous blocks, which can extract more contextual information for image super-resolution. A Conv+ReLU is a composite of a convolutional layer and a ReLU operation, which is used to extract non-linear information from given low-resolution images. Also, its input and output channel numbers are 3 and 64, respectively. Its kernel size is . These stacked heterogeneous blocks utilize different convolutional layers (i.e., dynamic and common convolutional layers) and ReLU to dynamically adjust parameters and obtain robust low-frequency information according to different input low-resolution images. To obtain complementary low-frequency information, a 16-layer symmetrical lower network is designed. It relies on a symmetrical architecture to enhance inter-layer hierarchical connections and obtain more complementary structural information. Also, the two sub-networks exchange information via a multiplication operation. A 2-layer construction block is used to transform low-frequency information to high-frequency information and construct predicted high-quality images. The process can be illustrated via the following formula.
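$$I_{SR} = f_{HDSRNet}(I_{LR}) = f_{CB}\big(f_{HUNet}(I_{LR}) \times f_{SLNet}(I_{LR})\big) = f_{CB}\big(f_{5HB}(f_{CR}(I_{LR})) \times f_{SLNet}(I_{LR})\big) = f_{CB}(O_{HUNet} \times O_{SLNet}) \tag{1}$$
where $I_{LR}$ and $I_{SR}$ are the input low-resolution image and the predicted high-quality image, respectively. $f_{HDSRNet}$, $f_{HUNet}$ and $f_{SLNet}$ denote functions of HDSRNet, the heterogeneous upper network and the symmetrical lower network, respectively. $f_{CR}$ is a Conv+ReLU. $f_{5HB}$ is five stacked heterogeneous blocks. $f_{CB}$ denotes a construction block. $\times$ represents a multiplication operation. Parameters of HDSRNet can be obtained via the loss function in Section III-B. $O_{HUNet}$ and $O_{SLNet}$ are outputs of the HUNet and SLNet, respectively.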
III-B Loss Function
HDSRNet chooses the mean absolute error (MAE) [12] as the loss function to obtain its parameters. The use of MAE in HDSRNet can be represented as below.
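$$l(\theta) = \frac{1}{N}\sum_{i=1}^{N} \left\| f_{HDSRNet}(I_{LR}^{i}) - I_{HR}^{i} \right\|_{1} \tag{2}$$
where $I_{LR}^{i}$ and $I_{HR}^{i}$ denote the $i$-th low- and high-resolution images, respectively, and $f_{HDSRNet}$ is the function of HDSRNet. $N$ represents the number of low-resolution images. $l$ is the loss function of HDSRNet. Also, $\theta$ stands for parameters of the trained HDSRNet, which can be optimized by the Adam optimizer [49].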
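The training objective described above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the authors' training script: the single convolution `model` is a hypothetical stand-in for HDSRNet, the random tensors replace real image pairs, and the Adam settings are the values listed in Section IV-C.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for HDSRNet: one conv layer keeps the sketch short.
model = nn.Conv2d(3, 3, kernel_size=3, padding=1)

# nn.L1Loss averages the absolute error over all elements, i.e. the MAE.
criterion = nn.L1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8)

# Random tensors stand in for a batch of (network input, HR target) pairs.
inputs = torch.rand(4, 3, 16, 16)
targets = torch.rand(4, 3, 16, 16)


def train_step(x, y):
    """Run one optimization step and return the scalar MAE loss."""
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()


losses = [train_step(inputs, targets) for _ in range(5)]
```

Repeating `train_step` on the same batch should lower the loss slightly at each step, which is a quick sanity check that the loss and optimizer are wired together correctly.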
III-C Heterogeneous Block
To train a robust SR model, heterogeneous blocks are designed to dynamically adjust parameters and obtain robust low-frequency information according to different input low-resolution images. Each heterogeneous block is composed of a dilated Conv+ReLU, a dynamic Conv+ReLU and a Conv+ReLU, where a dilated Conv+ReLU is a combination of a dilated convolution [50] and a ReLU [51], which is used to capture more context information. A dynamic Conv+ReLU is a combination of a dynamic convolution [13] and a ReLU, which can adaptively learn parameters according to different input information. To prevent long-term dependency, a residual operation acts between the input of the dilated Conv+ReLU and the output of the Conv+ReLU. All the convolutional kernels are . The input and output channel numbers of the dilated, dynamic and common convolutional layers are 64. Also, the dilation factor is 2 in the dilated convolutional layers. The mentioned process can be expressed by the following equation.
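$$O_{HB} = f_{CR}\big(f_{DyCR}(f_{DCR}(I_{HB}))\big) + I_{HB} \tag{3}$$
where $I_{HB}$ and $O_{HB}$ are used to define the input and output of a heterogeneous block. $f_{DCR}$ stands for a dilated Conv+ReLU. $f_{DyCR}$ represents a dynamic Conv+ReLU. $f_{CR}$ is a Conv+ReLU. $+$ is used to express a residual learning operation, as also marked in Fig. 1.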
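A heterogeneous block of this kind might be sketched in PyTorch as below. Several choices are assumptions made only for illustration: the 3×3 kernel size, the number of candidate kernels (`num_kernels=4`), and the dynamic convolution itself, which is written here in a simplified attention-over-kernels form that may differ from the variant in [13]. The class names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicConv2d(nn.Module):
    """Simplified dynamic convolution: K kernels mixed by input-dependent attention."""

    def __init__(self, channels=64, kernel_size=3, num_kernels=4):
        super().__init__()
        self.channels = channels
        self.padding = kernel_size // 2
        # K candidate kernels and biases (assumed initialization scale).
        self.weight = nn.Parameter(
            torch.randn(num_kernels, channels, channels, kernel_size, kernel_size) * 0.02)
        self.bias = nn.Parameter(torch.zeros(num_kernels, channels))
        # Attention branch: global average pooling -> linear -> softmax over K.
        self.attention = nn.Linear(channels, num_kernels)

    def forward(self, x):
        b, c, h, w = x.shape
        attn = F.softmax(self.attention(x.mean(dim=(2, 3))), dim=1)  # (B, K)
        # Per-sample kernel and bias: attention-weighted sums of the K candidates.
        weight = torch.einsum("bk,koihw->boihw", attn, self.weight)
        bias = attn @ self.bias  # (B, C_out)
        # Grouped-conv trick: fold the batch into channels so that each
        # sample is convolved with its own aggregated kernel.
        out = F.conv2d(
            x.reshape(1, b * c, h, w),
            weight.reshape(b * self.channels, c, *weight.shape[-2:]),
            bias.reshape(-1),
            padding=self.padding,
            groups=b,
        )
        return out.reshape(b, self.channels, h, w)


class HeterogeneousBlock(nn.Module):
    """Dilated Conv+ReLU -> dynamic Conv+ReLU -> Conv+ReLU, plus a residual sum."""

    def __init__(self, channels=64, kernel_size=3, dilation=2):
        super().__init__()
        pad = dilation * (kernel_size // 2)  # keeps the spatial size unchanged
        self.dilated = nn.Conv2d(channels, channels, kernel_size,
                                 padding=pad, dilation=dilation)
        self.dynamic = DynamicConv2d(channels, kernel_size)
        self.conv = nn.Conv2d(channels, channels, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, x):
        out = F.relu(self.dilated(x))
        out = F.relu(self.dynamic(out))
        out = F.relu(self.conv(out))
        return out + x  # residual link from the block input to the block output


block = HeterogeneousBlock()
features = torch.rand(2, 64, 24, 24)
output = block(features)  # same shape as the input: (2, 64, 24, 24)
```

Because the aggregated kernel is computed per sample, two different inputs in the same batch are filtered with different weights, which is the adaptive behaviour the text attributes to the dynamic Conv+ReLU.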
III-D Symmetrical Lower Sub-network
To obtain complementary low-frequency information, a 16-layer symmetrical lower network is designed. Each layer is a Conv+ReLU whose input and output channel numbers are 64, except for the first layer. Its kernel is . The input and output channel numbers of the first layer are 3 and 64, respectively. To enhance relations between different layers, residual learning operations act between the 1st and 16th, 2nd and 15th, 3rd and 14th, 4th and 13th, 5th and 12th, 6th and 11th, 7th and 10th, and 8th and 9th layers to transfer information from shallow layers to deep layers, which prevents long-term dependency and yields robust information for image super-resolution. The procedure can be summarized as Eq. (4).
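$$O_{SLNet} = O_{16}, \qquad O_{i} = f_{CR}(O_{i-1}) + O_{17-i}, \quad i = 9, 10, \ldots, 16 \tag{4}$$
where $O_{i}$ denotes an output of the $i$-th layer and $i = 1, 2, \ldots, 16$. Also, $O_{8} = f_{8CR}(I_{LR})$, where $f_{8CR}$ stands for eight stacked Conv+ReLU, so that $O_{i} = f_{CR}(O_{i-1})$ for $i \le 8$ with $O_{0} = I_{LR}$.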
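The symmetric pairing of layers can be sketched in PyTorch with a stack that stores the outputs of the first eight layers and pops them in reverse order for the last eight. This is a sketch under assumptions: the 3×3 kernel size is assumed, the class name is hypothetical, and the sum is assumed to be passed on to the next layer, which the text does not state explicitly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SymmetricLowerNet(nn.Module):
    """16 Conv+ReLU layers with residual links between layers i and 17 - i."""

    def __init__(self, in_channels=3, channels=64, kernel_size=3, depth=16):
        super().__init__()
        pad = kernel_size // 2
        layers = [nn.Conv2d(in_channels, channels, kernel_size, padding=pad)]
        layers += [nn.Conv2d(channels, channels, kernel_size, padding=pad)
                   for _ in range(depth - 1)]
        self.layers = nn.ModuleList(layers)
        self.half = depth // 2

    def forward(self, x):
        skips = []
        out = x
        for index, layer in enumerate(self.layers):
            out = F.relu(layer(out))
            if index < self.half:
                skips.append(out)        # keep the outputs of layers 1..8
            else:
                out = out + skips.pop()  # layer 9 pairs with 8, ..., 16 with 1
        return out


lower_net = SymmetricLowerNet()
lr_batch = torch.rand(1, 3, 20, 20)
lower_features = lower_net(lr_batch)  # (1, 64, 20, 20)
```

A last-in, first-out stack gives the mirrored pairing for free: the most recently stored shallow output (layer 8) is consumed first (by layer 9), and the earliest one (layer 1) is consumed last (by layer 16).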
III-E Construction Block
A 2-layer construction block is used to construct predicted HR images. It contains two phases. The first phase uses a sub-pixel convolutional layer to convert low-frequency information to high-frequency information, where its input and output channel numbers are 128 and 64, respectively. The second phase utilizes only a convolutional layer (Conv) to construct predicted high-resolution images, where its input and output channel numbers are 64 and 3, respectively. Their kernel sizes are . These steps can be expressed as Eq. (5).
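$$I_{SR} = f_{CB}(O_{HUNet} \times O_{SLNet}) = f_{C}\big(f_{Sub}(O_{HUNet} \times O_{SLNet})\big) \tag{5}$$
where $f_{Sub}$ and $f_{C}$ are functions of a sub-pixel convolutional layer and a convolutional layer, respectively. $f_{CB}$ denotes the construction block, and $O_{HUNet}$ and $O_{SLNet}$ are the outputs of the upper and lower networks as in Eq. (1).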
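One common way to realize a sub-pixel convolutional layer in PyTorch is a convolution that expands the channels by the square of the scale factor, followed by `nn.PixelShuffle`. The sketch below follows that pattern with the channel numbers given in the text (128, 64 and 3). The 3×3 kernel size and the class name are assumptions, and a random tensor stands in for the fused features, since how the two branches yield 128 channels is not modeled here.

```python
import torch
import torch.nn as nn


class ConstructionBlock(nn.Module):
    """Sub-pixel convolution followed by a plain convolution."""

    def __init__(self, in_channels=128, mid_channels=64, out_channels=3,
                 scale=2, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        # The conv produces mid_channels * scale**2 channels; PixelShuffle then
        # rearranges them into mid_channels channels on a scale-times larger grid.
        self.expand = nn.Conv2d(in_channels, mid_channels * scale ** 2,
                                kernel_size, padding=pad)
        self.shuffle = nn.PixelShuffle(scale)
        self.reconstruct = nn.Conv2d(mid_channels, out_channels,
                                     kernel_size, padding=pad)

    def forward(self, x):
        return self.reconstruct(self.shuffle(self.expand(x)))


fused = torch.rand(1, 128, 12, 12)  # placeholder for the fused feature map
shapes = {s: tuple(ConstructionBlock(scale=s)(fused).shape) for s in (2, 3, 4)}
```

For a 12×12 feature map, the outputs are 24×24, 36×36 and 48×48 three-channel images for the ×2, ×3 and ×4 settings, respectively.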
IV Experiments
IV-A Training Dataset
We utilize the public DIV2K dataset [14] as a training set to develop HDSRNet. Specifically, DIV2K contains three parts, i.e., a training dataset of 800 images, a validation dataset of 100 images and a test dataset of 100 images. To enlarge the diversity of the training data, we merge the original training dataset and the validation dataset to form a new training dataset, which includes 900 high-quality images and corresponding low-resolution images of x2, x3 and x4. All training images are saved in ‘.png’ format.
IV-B Test Datasets
To fairly test the SR performance of our HDSRNet, four public SR datasets, i.e., Set5 containing 5 natural images [15], Set14 containing 14 natural images [16], BSD100 (B100) containing 100 natural images [17] and Urban100 (U100) containing 100 images [18], are used to conduct experiments. These datasets include three different scales, i.e., x2, x3 and x4. They are saved in ‘.png’ format.
IV-C Implementation Details
Original parameters of training HDSRNet are $\beta_1$ of 0.9, $\beta_2$ of 0.999, epsilon of 1e-8, a batch size of 64 and an initial learning rate of 1e-4, which is halved every 300,000 steps. Other parameters are the same as in Ref. [10].
The proposed HDSRNet is implemented with PyTorch 1.8.0 and Python 3.8. All the experiments run on a workstation with Ubuntu 20.04, which is equipped with an AMD EPYC 7502P CPU and four Nvidia GeForce RTX 3090 GPUs with Nvidia CUDA 11.1 and cuDNN 8005. All models are trained on one RTX 3090 GPU.
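The step-decay learning rate described above can be sketched in a few lines of plain Python. The function name and its arguments are illustrative, and the sketch assumes that the halving happens at exact multiples of 300,000 steps.

```python
def learning_rate(step, initial_lr=1e-4, halve_every=300_000):
    """Step-decay schedule: the rate is halved once every `halve_every` steps."""
    return initial_lr * 0.5 ** (step // halve_every)


# Learning rates at a few training steps.
schedule = {step: learning_rate(step) for step in (0, 299_999, 300_000, 600_000)}
```

Under these assumptions the rate stays at 1e-4 up to step 299,999, drops to 5e-5 at step 300,000 and to 2.5e-5 at step 600,000.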
IV-D Network Analysis
We analyze the rationality and validity of the designed HDSRNet architecture, which contains a heterogeneous upper sub-network, a symmetrical lower sub-network and a construction block for image super-resolution.
Heterogeneous upper sub-network: Most existing SR models cannot adaptively learn parameters according to different given low-resolution images [52]. Dynamic convolutions can automatically adjust parameters according to different inputs [13]. Motivated by this, we develop a heterogeneous upper sub-network for image super-resolution. We design the main network according to VGG [53]. That is, six stacked Conv+ReLU layers are used as a basic network to extract structural low-frequency information. A combination of six stacked Conv+ReLU and a CB obtains a peak signal-to-noise ratio (PSNR) [54] of 25.13dB and a structural similarity (SSIM) [54] of 0.7525 on U100 for x4, which shows the effectiveness of six stacked Conv+ReLU. A CB is used to build HR images, as discussed later in this section. To prevent the long-term dependency problem, the deep fusion idea [57] is used in this paper. That is, a residual learning operation acts between the input and output of each Conv+ReLU except the first one to transfer low-frequency information from shallow layers to deep layers and pursue better super-resolution performance. Its effectiveness is shown by comparing ‘Heterogeneous upper sub-network without Dilated Conv+ReLU, Dynamic Conv+ReLU and a CB’ and ‘A combination of six stacked Conv+ReLU and a CB’ in terms of PSNR and SSIM in TABLE I. To extract more context, a dilated Conv+ReLU with a dilation factor of 2 is placed before each Conv+ReLU except the first one in the heterogeneous upper sub-network. Its effectiveness is shown by comparing ‘Heterogeneous upper sub-network without Dilated Conv+ReLU, Dynamic Conv+ReLU and a CB’ and ‘Heterogeneous upper sub-network without dynamic Conv+ReLU and a CB’ in terms of PSNR and SSIM on U100 for x4. Specifically, the five stacked blocks after the first Conv+ReLU in the heterogeneous upper sub-network (HUNet) are five heterogeneous blocks (HBs), whose rationality and validity are shown below.
To make HUNet robust, a Dynamic Conv+ReLU is set behind the Dilated Conv+ReLU in each HB to dynamically adjust parameters according to different inputs. Its better SR results can be seen by comparing ‘Heterogeneous upper sub-network and a CB’ and ‘Heterogeneous upper sub-network without dynamic Conv+ReLU and a CB’ in TABLE I, which verifies the effectiveness of dynamic convolutions. It is known that the bigger the differences within a network architecture, the better its performance [41]. Motivated by that, we replace the Dynamic Conv+ReLU and the Dilated Conv+ReLU with a Conv+ReLU, respectively, to conduct experiments. As shown in TABLE I, ‘Heterogeneous upper sub-network and a CB’ obtains higher PSNR and SSIM values than ‘Heterogeneous upper sub-network with Conv+ReLU rather than dilated Conv+ReLU’ and ‘Heterogeneous upper sub-network with Conv+ReLU rather than dynamic Conv+ReLU’. This comparison demonstrates the effectiveness of the dynamic Conv+ReLU and the dilated Conv+ReLU in an HB, and highlights the advantage of combining different network architectures within an HB for image super-resolution.
Symmetrical lower sub-network: To avoid the limited representation ability of a single network for image super-resolution, a symmetrical lower sub-network is designed, which assists the upper network in extracting more complementary structural information to improve recovered high-resolution images. It is implemented in the following phases.
TABLE I: PSNR and SSIM of different methods on U100 for ×4.
| Scale | Methods | PSNR(dB)/SSIM |
|---|---|---|
| ×4 | A combination of six stacked Conv+ReLU and a CB | 25.13/0.7525 |
| | Heterogeneous upper sub-network without Dilated Conv+ReLU, Dynamic Conv+ReLU and a CB | 25.22/0.7559 |
| | Heterogeneous upper sub-network without dynamic Conv+ReLU and a CB | 25.46/0.7649 |
| | Heterogeneous upper sub-network without dilated Conv+ReLU and a CB | 25.61/0.7697 |
| | Heterogeneous upper sub-network with Conv+ReLU rather than dilated Conv+ReLU | 25.74/0.7739 |
| | Heterogeneous upper sub-network with Conv+ReLU rather than dynamic Conv+ReLU | 25.68/0.7729 |
| | Heterogeneous upper sub-network and a CB | 25.75/0.7740 |
| | HDSRNet without residual learning operations in symmetrical lower sub-network | 25.99/0.7825 |
| | HDSRNet | 26.01/0.7827 |
TABLE II: PSNR and SSIM of different methods on Set5 for ×2, ×3 and ×4.
| Dataset | Methods | ×2 PSNR(dB)/SSIM | ×3 PSNR(dB)/SSIM | ×4 PSNR(dB)/SSIM |
|---|---|---|---|---|
| Set5 | Bicubic | 33.66/0.9299 | 30.39/0.8682 | 28.42/0.8104 |
| | SRCNN [3] | 36.66/0.9542 | 32.75/0.9090 | 30.48/0.8628 |
| | VDSR [4] | 37.53/0.9587 | 33.66/0.9213 | 31.35/0.8838 |
| | DRRN [6] | 37.74/0.9591 | 34.03/0.9244 | 31.68/0.8888 |
| | FSRCNN [9] | 37.00/0.9558 | 33.16/0.9140 | 30.71/0.8657 |
| | CARN-M [11] | 37.53/0.9583 | 33.99/0.9236 | 31.92/0.8903 |
| | IDN [12] | 37.83/0.9600 | 34.11/0.9253 | 31.82/0.8903 |
| | A+ [19] | 36.54/0.9544 | 32.58/0.9088 | 30.28/0.8603 |
| | JOR [20] | 36.58/0.9543 | 32.55/0.9067 | 30.19/0.8563 |
| | RFL [21] | 36.54/0.9537 | 32.43/0.9057 | 30.14/0.8548 |
| | SelfEx [18] | 36.49/0.9537 | 32.58/0.9093 | 30.31/0.8619 |
| | CSCN [22] | 36.93/0.9552 | 33.10/0.9144 | 30.86/0.8732 |
| | RED [23] | 37.56/0.9595 | 33.70/0.9222 | 31.33/0.8847 |
| | DnCNN [24] | 37.58/0.9590 | 33.75/0.9222 | 31.40/0.8845 |
| | TNRD [25] | 36.86/0.9556 | 33.18/0.9152 | 30.85/0.8732 |
| | FDSR [26] | 37.40/0.9513 | 33.68/0.9096 | 31.28/0.8658 |
| | RCN [27] | 37.17/0.9583 | 33.45/0.9175 | 31.11/0.8736 |
| | DRCN [5] | 37.63/0.9588 | 33.82/0.9226 | 31.53/0.8854 |
| | LapSRN [60] | 37.52/0.9590 | - | 31.54/0.8850 |
| | NDRCN [63] | 37.73/0.9596 | 33.90/0.9235 | 31.50/0.8859 |
| | MemNet [29] | 37.78/0.9597 | 34.09/0.9248 | 31.74/0.8893 |
| | LESRCNN [30] | 37.65/0.9586 | 33.93/0.9231 | 31.88/0.8903 |
| | LESRCNN-S [30] | 37.57/0.9582 | 34.05/0.9238 | 31.88/0.8907 |
| | ScSR [31] | 35.78/0.9485 | 31.34/0.8869 | 29.07/0.8263 |
| | DSRCNN [32] | 37.73/0.9588 | 34.17/0.9247 | 31.89/0.8909 |
| | DAN [34] | 37.34/0.9526 | 34.04/0.9199 | 31.89/0.8864 |
| | PGAN [37] | - | - | 31.03/0.8798 |
| | HDSRNet (Ours) | 37.95/0.9604 | 34.31/0.9263 | 32.15/0.8927 |
TABLE III: PSNR and SSIM of different methods on Set14 for ×2, ×3 and ×4.
| Dataset | Methods | ×2 PSNR(dB)/SSIM | ×3 PSNR(dB)/SSIM | ×4 PSNR(dB)/SSIM |
|---|---|---|---|---|
| Set14 | Bicubic | 30.24/0.8688 | 27.55/0.7742 | 26.00/0.7027 |
| | SRCNN [3] | 32.42/0.9063 | 29.28/0.8209 | 27.49/0.7503 |
| | VDSR [4] | 33.03/0.9124 | 29.77/0.8314 | 28.01/0.7674 |
| | DRRN [6] | 33.23/0.9136 | 29.96/0.8349 | 28.21/0.7720 |
| | FSRCNN [9] | 32.63/0.9088 | 29.43/0.8242 | 27.59/0.7535 |
| | CARN-M [11] | 33.26/0.9141 | 30.08/0.8367 | 28.42/0.7762 |
| | IDN [12] | 33.30/0.9148 | 29.99/0.8354 | 28.25/0.7730 |
| | A+ [19] | 32.28/0.9056 | 29.13/0.8188 | 27.32/0.7491 |
| | JOR [20] | 32.38/0.9063 | 29.19/0.8204 | 27.27/0.7479 |
| | RFL [21] | 32.26/0.9040 | 29.05/0.8164 | 27.24/0.7451 |
| | SelfEx [18] | 32.22/0.9034 | 29.16/0.8196 | 27.40/0.7518 |
| | CSCN [22] | 32.56/0.9074 | 29.41/0.8238 | 27.64/0.7578 |
| | RED [23] | 32.81/0.9135 | 29.50/0.8334 | 27.72/0.7698 |
| | DnCNN [24] | 33.03/0.9128 | 29.81/0.8321 | 28.04/0.7672 |
| | TNRD [25] | 32.51/0.9069 | 29.43/0.8232 | 27.66/0.7563 |
| | FDSR [26] | 33.00/0.9042 | 29.61/0.8179 | 27.86/0.7500 |
| | RCN [27] | 32.77/0.9109 | 29.63/0.8269 | 27.79/0.7594 |
| | DRCN [5] | 33.04/0.9118 | 29.76/0.8311 | 28.02/0.7670 |
| | LapSRN [60] | 33.08/0.9130 | 29.63/0.8269 | 28.19/0.7720 |
| | NDRCN [63] | 33.20/0.9141 | 29.88/0.8333 | 28.10/0.7697 |
| | MemNet [29] | 33.28/0.9142 | 30.00/0.8350 | 28.26/0.7723 |
| | LESRCNN [30] | 33.32/0.9148 | 30.12/0.8380 | 28.44/0.7772 |
| | LESRCNN-S [30] | 33.30/0.9145 | 30.16/0.8384 | 28.43/0.7776 |
| | ScSR [31] | 31.64/0.8940 | 28.19/0.7977 | 26.40/0.7218 |
| | DSRCNN [32] | 33.43/0.9157 | 30.24/0.8402 | 28.46/0.7796 |
| | DAN [34] | 33.08/0.9041 | 30.09/0.8287 | 28.42/0.7687 |
| | PGAN [37] | - | - | 27.75/0.8164 |
| | HDSRNet (Ours) | 33.52/0.9170 | 30.27/0.8412 | 28.56/0.7799 |
TABLE IV: PSNR and SSIM of different methods on B100 for ×2, ×3 and ×4.
| Dataset | Methods | ×2 PSNR(dB)/SSIM | ×3 PSNR(dB)/SSIM | ×4 PSNR(dB)/SSIM |
|---|---|---|---|---|
| B100 | Bicubic | 29.56/0.8431 | 27.21/0.7385 | 25.96/0.6675 |
| | SRCNN [3] | 31.36/0.8879 | 28.41/0.7863 | 26.90/0.7101 |
| | VDSR [4] | 31.90/0.8960 | 28.82/0.7976 | 27.29/0.7251 |
| | DRRN [6] | 32.05/0.8973 | 28.95/0.8004 | 27.38/0.7284 |
| | FSRCNN [9] | 31.53/0.8920 | 28.53/0.7910 | 26.98/0.7150 |
| | CARN-M [11] | 31.92/0.8960 | 28.91/0.8000 | 27.44/0.7304 |
| | IDN [12] | 32.08/0.8985 | 28.95/0.8013 | 27.41/0.7297 |
| | A+ [19] | 31.21/0.8863 | 28.29/0.7835 | 26.82/0.7087 |
| | JOR [20] | 31.22/0.8867 | 28.27/0.7837 | 26.79/0.7083 |
| | RFL [21] | 31.16/0.8840 | 28.22/0.7806 | 26.75/0.7054 |
| | SelfEx [18] | 31.18/0.8855 | 28.29/0.7840 | 26.84/0.7106 |
| | CSCN [22] | 31.40/0.8884 | 28.50/0.7885 | 27.03/0.7161 |
| | RED [23] | 31.96/0.8972 | 28.88/0.7993 | 27.35/0.7276 |
| | DnCNN [24] | 31.90/0.8961 | 28.85/0.7981 | 27.29/0.7253 |
| | TNRD [25] | 31.40/0.8878 | 28.50/0.7881 | 27.00/0.7140 |
| | FDSR [26] | 31.87/0.8847 | 28.82/0.7797 | 27.31/0.7031 |
| | DRCN [5] | 31.85/0.8942 | 28.80/0.7963 | 27.23/0.7233 |
| | LapSRN [60] | 31.80/0.8950 | - | 27.32/0.7280 |
| | NDRCN [63] | 32.00/0.8975 | 28.86/0.7991 | 27.30/0.7263 |
| | MemNet [29] | 32.08/0.8978 | 28.96/0.8001 | 27.40/0.7281 |
| | LESRCNN [30] | 31.95/0.8964 | 28.91/0.8005 | 27.45/0.7313 |
| | LESRCNN-S [30] | 31.95/0.8965 | 28.94/0.8012 | 27.47/0.7321 |
| | ScSR [31] | 30.77/0.8744 | 27.72/0.7647 | 26.61/0.6983 |
| | DSRCNN [32] | 32.05/0.8978 | 29.01/0.8029 | 27.50/0.7341 |
| | DAN [34] | 31.76/0.8858 | 28.94/0.7919 | 27.51/0.7248 |
| | PGAN [37] | - | - | 26.35/0.6926 |
| | HDSRNet (Ours) | 32.14/0.8994 | 29.06/0.8048 | 27.55/0.7357 |
TABLE V: PSNR and SSIM of different methods on U100 for ×2, ×3 and ×4.
| Dataset | Methods | ×2 PSNR(dB)/SSIM | ×3 PSNR(dB)/SSIM | ×4 PSNR(dB)/SSIM |
|---|---|---|---|---|
| U100 | Bicubic | 26.88/0.8403 | 24.46/0.7349 | 23.14/0.6577 |
| | SRCNN [3] | 29.50/0.8946 | 26.24/0.7989 | 24.52/0.7221 |
| | VDSR [4] | 30.76/0.9140 | 27.14/0.8279 | 25.18/0.7524 |
| | DRRN [6] | 31.23/0.9188 | 27.53/0.8378 | 25.44/0.7638 |
| | FSRCNN [9] | 29.88/0.9020 | 26.43/0.8080 | 24.62/0.7280 |
| | CARN-M [11] | 31.23/0.9193 | 27.55/0.8385 | 25.62/0.7694 |
| | IDN [12] | 31.27/0.9196 | 27.42/0.8359 | 25.41/0.7632 |
| | A+ [19] | 29.20/0.8938 | 26.03/0.7973 | 24.32/0.7183 |
| | JOR [20] | 29.25/0.8951 | 25.97/0.7972 | 24.29/0.7181 |
| | RFL [21] | 29.11/0.8904 | 25.86/0.7900 | 24.19/0.7096 |
| | SelfEx [18] | 29.54/0.8967 | 26.44/0.8088 | 24.79/0.7374 |
| | RED [23] | 30.91/0.9159 | 27.31/0.8303 | 25.35/0.7587 |
| | DnCNN [24] | 30.74/0.9139 | 27.15/0.8276 | 25.20/0.7521 |
| | TNRD [25] | 29.70/0.8994 | 26.42/0.8076 | 24.61/0.7291 |
| | FDSR [26] | 30.91/0.9088 | 27.23/0.8190 | 25.27/0.7417 |
| | DRCN [5] | 30.75/0.9133 | 27.15/0.8276 | 25.14/0.7510 |
| | LapSRN [60] | 30.41/0.9100 | - | 25.21/0.7560 |
| | WaveResNet [61] | 30.96/0.9169 | 27.28/0.8334 | 25.36/0.7614 |
| | CPCA [62] | 28.17/0.8990 | 25.61/0.8123 | 23.62/0.7257 |
| | NDRCN [63] | 31.06/0.9175 | 27.23/0.8312 | 25.16/0.7546 |
| | MemNet [29] | 31.31/0.9195 | 27.56/0.8376 | 25.50/0.7630 |
| | LESRCNN [30] | 31.45/0.9206 | 27.70/0.8415 | 25.77/0.7732 |
| | LESRCNN-S [30] | 31.45/0.9207 | 27.76/0.8424 | 25.78/0.7739 |
| | ScSR [31] | 28.26/0.8828 | - | 24.02/0.7024 |
| | DSRCNN [32] | 31.83/0.9252 | 27.99/0.8483 | 25.94/0.7815 |
| | DAN [34] | 30.60/0.9060 | 27.65/0.8352 | 25.86/0.7721 |
| | LDRAN [36] | - | - | 25.91/0.7786 |
| | PGAN [37] | - | - | 25.47/0.9574 |
| | HDSRNet (Ours) | 32.00/0.9267 | 28.02/0.8493 | 26.01/0.7827 |
The first phase designs a 16-layer stacked Conv+ReLU to extract low-frequency structural information, according to VGG [53]. Its effectiveness is shown by comparing ‘Heterogeneous upper sub-network and a CB’ and ‘HDSRNet without residual learning operations in symmetrical lower sub-network’ in TABLE I. To enhance the relationship between hierarchical features, the second phase is conducted. It uses residual learning operations to merge information from shallow and deep layers, which yields a symmetrical lower sub-network that extracts more accurate information, as shown in Section III-D. As illustrated in TABLE I, ‘HDSRNet’ obtains higher PSNR and SSIM than ‘HDSRNet without residual learning operations in symmetrical lower sub-network’, which shows the effectiveness of residual learning operations in the symmetrical lower sub-network for image SR. Besides, ‘HDSRNet’ obtains a higher PSNR value than ‘Heterogeneous upper sub-network and a CB’ in TABLE I, which shows the effectiveness of heterogeneous parallel networks for image super-resolution.
Construction block: To construct HR images, we design a 2-layer construction block. It consists of a sub-pixel convolutional layer and a convolutional layer. The sub-pixel convolutional layer is used to convert low-frequency information to high-frequency information. The convolutional layer is utilized to construct predicted HR images.
IV-E Comparisons with Popular Methods for SR
In this section, we use both quantitative and qualitative analysis to evaluate the results of our HDSRNet on SISR. Quantitative analysis includes PSNR, SSIM, running time and complexity of our HDSRNet. Specifically, PSNR and SSIM are used to measure the quality of predicted HR images. Also, running time and parameters are used to test the feasibility of our HDSRNet for real applications, e.g., phones and cameras. We use A+ [19], jointly optimized regressors (JOR) [20], image upscaling with super-resolution forests (RFL) [21], self-exemplars SR method (SelfEx) [18], cascade of sparse coding based network (CSCN) [22], image restoration using encoder-decoder networks (RED) [23], denoising convolutional neural network (DnCNN) [24], trainable non-linear reaction diffusion (TNRD) [25], fast dilated residual SR convolutional network (FDSR) [26], SRCNN [3], FSRCNN [9], residue context sub-network (RCN) [27], VDSR [4], deeply-recursive convolutional network (DRCN) [5], IDN [12], DRRN [6], Laplacian SR network (LapSRN) [60], new architecture of deep recursive convolution networks for SR (NDRCN) [63], a persistent memory network (MemNet) [29], CARN-M [11], light-weight image super-resolution with enhanced CNN (LESRCNN) [30], deep alternating network (DAN) [34], Pixel-Level Generative Adversarial Network (PGAN) [37] and our HDSRNet on four public datasets, i.e., Set5, Set14, B100 and U100 for x2, x3 and x4 to conduct experiments. As shown in TABLEs II and III, our HDSRNet has obtained the best performance in terms of PSNR and SSIM for x2, x3 and x4. For instance, our HDSRNet improves on IDN by 0.12dB in PSNR and 0.0004 in SSIM on Set5 for x2 in TABLE II. Our HDSRNet improves on DSRCNN by 0.10dB in PSNR and 0.0003 in SSIM on Set14 for x4 in TABLE III. That also shows that our HDSRNet is effective on small datasets for image super-resolution.
For big datasets, our HDSRNet still has an advantage for image super-resolution. As shown in TABLEs IV and V, our HDSRNet has obtained the best SR results for all the scale factors, i.e., x2, x3 and x4. For instance, our HDSRNet exceeds DSRCNN by 0.09dB in PSNR and 0.0016 in SSIM for x2 on B100 in TABLE IV. Our HDSRNet exceeds DSRCNN by 0.07dB in PSNR and 0.0012 in SSIM for x4 on U100 in TABLE V. That shows that our HDSRNet is suitable for big datasets in image super-resolution. Specifically, red and blue denote the best and second-best SR results from TABLE II to TABLE V, respectively.
To test the practicality of our HDSRNet, we use running time and complexity to evaluate its performance for image SR. As shown in TABLE VI, our HDSRNet has obtained competitive running time for restoring low-resolution images with sizes of and . As illustrated in TABLE VII, although our HDSRNet has more parameters than ACNet, it requires fewer FLOPs than ACNet. Thus, it is competitive in complexity and suitable for application in consumer electronic products. According to these experimental results, our HDSRNet is effective in terms of quantitative analysis.
For qualitative analysis, we use Bicubic, SRCNN, LESRCNN, DCLS, VDSR, DRCN and HDSRNet to recover high-quality images from low-resolution images of B100 and U100 for x3 and x4, which are compared with the given HR images. That is, we enlarge one area of the high-quality images predicted by different methods as an observation area. The clearer an observation area is, the more effective the corresponding method is for image super-resolution. As shown in Figs. 2-7, our HDSRNet has obtained clearer areas, which shows that our HDSRNet is effective in terms of qualitative analysis. In summary, our HDSRNet is a good tool for image super-resolution, according to quantitative and qualitative analysis.
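For readers unfamiliar with the PSNR metric used in these comparisons, a minimal NumPy sketch is given below. It assumes 8-bit images with a peak value of 255 and compares whole arrays directly. Evaluation protocols in SR papers often add steps such as luminance-only evaluation or border cropping, which are not specified in this section and are omitted here. The function and variable names are illustrative.

```python
import numpy as np


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


# Toy example: a constant error of 5 gray levels gives an MSE of 25.
hr_image = np.full((8, 8), 100.0)
sr_image = hr_image + 5.0
value = psnr(hr_image, sr_image)  # 10 * log10(255**2 / 25), roughly 34.15 dB
```

Since PSNR is logarithmic in the mean squared error, gains of around 0.1dB such as those quoted above correspond to a reduction of roughly 2% in mean squared error.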
V Conclusion
In this paper, we propose a heterogeneous dynamic convolutional network for SISR (HDSRNet), which is designed to capture more structural information. The upper network relies on stacked heterogeneous blocks to obtain more contextual information and improve the effects of image super-resolution. Each heterogeneous block combines dilated, dynamic and common convolutional layers with ReLU and a residual learning operation, so that it can adjust parameters for different inputs and prevent the long-term dependency problem. The lower network uses a symmetric architecture to enhance relations among different layers and extract more structural information for SISR. In the future, we will deal with SISR with non-reference images.
References
- [1] K. Zhang, W. Zuo, and L. Zhang, “Deep plug-and-play super-resolution for arbitrary blur kernels,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 1671–1681.
- [2] X. Zhou, H. Huang, R. He, Z. Wang, J. Hu, and T. Tan, “Msra-sr: Image super-resolution transformer with multi-scale shared representation acquisition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 12 665–12 676.
- [3] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE transactions on pattern analysis and machine intelligence, vol. 38, no. 2, pp. 295–307, 2015.
- [4] J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1646–1654.
- [5] J. Kim, J. K. Lee, and K. M. Lee, “Deeply-recursive convolutional network for image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1637–1645.
- [6] Y. Tai, J. Yang, and X. Liu, “Image super-resolution via deep recursive residual network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 3147–3155.
- [7] V. Dumoulin and F. Visin, “A guide to convolution arithmetic for deep learning (2016),” arXiv preprint arXiv:1603.07285, 2016.
- [8] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1874–1883.
- [9] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14. Springer, 2016, pp. 391–407.
- [10] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 136–144.
- [11] N. Ahn, B. Kang, and K.-A. Sohn, “Fast, accurate, and lightweight super-resolution with cascading residual network,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 252–268.
- [12] Z. Hui, X. Wang, and X. Gao, “Fast and accurate single image super-resolution via information distillation network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 723–731.
- [13] Y. Chen, X. Dai, M. Liu, D. Chen, L. Yuan, and Z. Liu, “Dynamic convolution: Attention over convolution kernels,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11 030–11 039.
- [14] E. Agustsson and R. Timofte, “Ntire 2017 challenge on single image super-resolution: Dataset and study,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 126–135.
- [15] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, “Low-complexity single-image super-resolution based on nonnegative neighbor embedding,” 2012.
- [16] R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” in Curves and Surfaces: 7th International Conference, Avignon, France, June 24-30, 2010, Revised Selected Papers 7. Springer, 2012, pp. 711–730.
- [17] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, vol. 2. IEEE, 2001, pp. 416–423.
- [18] J.-B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-exemplars,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 5197–5206.
- [19] R. Timofte, V. De Smet, and L. Van Gool, “A+: Adjusted anchored neighborhood regression for fast super-resolution,” in Computer Vision–ACCV 2014: 12th Asian Conference on Computer Vision, Singapore, Singapore, November 1-5, 2014, Revised Selected Papers, Part IV 12. Springer, 2015, pp. 111–126.
- [20] D. Dai, R. Timofte, and L. Van Gool, “Jointly optimized regressors for image super-resolution,” in Computer Graphics Forum, vol. 34, no. 2. Wiley Online Library, 2015, pp. 95–104.
- [21] S. Schulter, C. Leistner, and H. Bischof, “Fast and accurate image upscaling with super-resolution forests,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3791–3799.
- [22] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, “Deep networks for image super-resolution with sparse prior,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 370–378.
- [23] X. Mao, C. Shen, and Y.-B. Yang, “Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections,” Advances in neural information processing systems, vol. 29, 2016.
- [24] K. Zhang, X. Gao, D. Tao, and X. Li, “Single image super-resolution with non-local means and steering kernel regression,” IEEE Transactions on Image Processing, vol. 21, no. 11, pp. 4544–4556, 2012.
- [25] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 6, pp. 1256–1272, 2016.
- [26] Z. Lu, Z. Yu, P. Yali, L. Shigang, W. Xiaojun, L. Gang, and R. Yuan, “Fast single image super-resolution via dilated residual networks,” IEEE Access, vol. 7, pp. 109 729–109 738, 2018.
- [27] Y. Shi, K. Wang, C. Chen, L. Xu, and L. Lin, “Structure-preserving image super-resolution via contextualized multitask learning,” IEEE transactions on multimedia, vol. 19, no. 12, pp. 2804–2815, 2017.
- [28] H. Ren, M. El-Khamy, and J. Lee, “Image super resolution based on fusing multiple convolution neural networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 54–61.
- [29] Y. Tai, J. Yang, X. Liu, and C. Xu, “Memnet: A persistent memory network for image restoration,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 4539–4547.
- [30] C. Tian, R. Zhuge, Z. Wu, Y. Xu, W. Zuo, C. Chen, and C.-W. Lin, “Lightweight image super-resolution with enhanced cnn,” Knowledge-Based Systems, vol. 205, p. 106235, 2020.
- [31] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE transactions on image processing, vol. 19, no. 11, pp. 2861–2873, 2010.
- [32] J. Song, J. Xiao, C. Tian, Y. Hu, L. You, and S. Zhang, “A dual cnn for image super-resolution,” Electronics, vol. 11, no. 5, p. 757, 2022.
- [33] C. Tian, Y. Zhang, W. Zuo, C.-W. Lin, D. Zhang, and Y. Yuan, “A heterogeneous group cnn for image super-resolution,” IEEE transactions on neural networks and learning systems, 2022.
- [34] Y. Huang, S. Li, L. Wang, T. Tan et al., “Unfolding the alternating optimization for blind super resolution,” Advances in Neural Information Processing Systems, vol. 33, pp. 5632–5643, 2020.
- [35] Z. Luo, H. Huang, L. Yu, Y. Li, H. Fan, and S. Liu, “Deep constrained least squares for blind image super-resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 17 642–17 652.
- [36] J. Sahambi et al., “A lightweight deep residual attention network for single image super resolution,” in 2023 National Conference on Communications (NCC). IEEE, 2023, pp. 1–6.
- [37] W. Shi, F. Tao, and Y. Wen, “Structure-aware deep networks and pixel-level generative adversarial training for single image super-resolution,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–14, 2023.
- [38] C. Tian, Y. Xu, W. Zuo, C.-W. Lin, and D. Zhang, “Asymmetric cnn for image superresolution,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 6, pp. 3718–3730, 2021.
- [39] C. Tian, X. Zhang, J. C.-W. Lin, W. Zuo, Y. Zhang, and C.-W. Lin, “Generative adversarial networks for image super-resolution: A survey,” arXiv preprint arXiv:2204.13620, 2022.
- [40] F. Yu, X. Wang, M. Cao, G. Li, Y. Shan, and C. Dong, “Osrt: Omnidirectional image super-resolution with distortion-aware transformer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 13 283–13 292.
- [41] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, “Residual dense network for image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 2472–2481.
- [42] S. Wan, C. Gong, P. Zhong, B. Du, L. Zhang, and J. Yang, “Multiscale dynamic graph convolutional network for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 58, no. 5, pp. 3162–3177, 2019.
- [43] Y. Ding, J. Feng, Y. Chong, S. Pan, and X. Sun, “Adaptive sampling toward a dynamic graph convolutional network for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–17, 2021.
- [44] Z. Duan, T. Zhang, X. Luo, and J. Tan, “Dckn: multi-focus image fusion via dynamic convolutional kernel network,” Signal Processing, vol. 189, p. 108282, 2021.
- [45] C. Dai, Z. Guan, and M. Lin, “Single low-light image enhancer using taylor expansion and fully dynamic convolution,” Signal Processing, vol. 189, p. 108280, 2021.
- [46] J. Hou, Z. Guo, Y. Wu, W. Diao, and T. Xu, “Bsnet: Dynamic hybrid gradient convolution based boundary-sensitive network for remote sensing image segmentation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–22, 2022.
- [47] H. Shen, Z.-Q. Zhao, and W. Zhang, “Adaptive dynamic filtering network for image denoising,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 2, 2023, pp. 2227–2235.
- [48] C. Tian, M. Zheng, W. Zuo, B. Zhang, Y. Zhang, and D. Zhang, “Multi-stage image denoising with the wavelet transform,” Pattern Recognition, vol. 134, p. 109050, 2023.
- [49] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
- [50] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint arXiv:1511.07122, 2015.
- [51] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in neural information processing systems, vol. 25, 2012.
- [52] Y.-S. Xu, S.-Y. R. Tseng, Y. Tseng, H.-K. Kuo, and Y.-M. Tsai, “Unified dynamic convolutional network for super-resolution with variational degradations,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 12 496–12 505.
- [53] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
- [54] A. Hore and D. Ziou, “Image quality metrics: Psnr vs. ssim,” in 2010 20th international conference on pattern recognition. IEEE, 2010, pp. 2366–2369.
- [55] K. Jiang, Z. Wang, P. Yi, and J. Jiang, “Hierarchical dense recursive network for image super-resolution,” Pattern Recognition, vol. 107, p. 107475, 2020.
- [56] M. Hu, K. Jiang, Z. Wang, X. Bai, and R. Hu, “Cycmunet+: Cycle-projected mutual learning for spatial-temporal video super-resolution,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
- [57] K. Jiang, Z. Wang, P. Yi, T. Lu, J. Jiang, and Z. Xiong, “Dual-path deep fusion network for face image hallucination,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 1, pp. 378–391, 2020.
- [58] Z. Zha, X. Yuan, B. Wen, J. Zhou, J. Zhang, and C. Zhu, “From rank estimation to rank approximation: Rank residual constraint for image restoration,” IEEE Transactions on Image Processing, vol. 29, pp. 3254–3269, 2020.
- [59] Z. Zha, X. Yuan, B. Wen, J. Zhou, and C. Zhu, “Group sparsity residual constraint with non-local priors for image restoration,” IEEE Transactions on Image Processing, vol. 29, pp. 8960–8975, 2020.
- [60] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Deep laplacian pyramid networks for fast and accurate super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 624–632.
- [61] W. Bae, J. Yoo, and J. Chul Ye, “Beyond deep residual learning for image restoration: Persistent homology-guided manifold simplification,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 145–153.
- [62] J. Xu, M. Li, J. Fan, X. Zhao, and Z. Chang, “Self-learning super-resolution using convolutional principal component analysis and random matching,” IEEE Transactions on Multimedia, vol. 21, no. 5, pp. 1108–1121, 2018.
- [63] F. Cao and B. Chen, “New architecture of deep recursive convolution networks for super-resolution,” Knowledge-Based Systems, vol. 178, pp. 98–110, 2019.
- [64] S. Wang and Y. Zhang, “Deep learning for covid-19 diagnosis via chest images.”
- [65] Y. Zhang, L. Deng, H. Zhu, W. Wang, Z. Ren, Q. Zhou, S. Lu, S. Sun, Z. Zhu, J. M. Gorriz et al., “Deep learning in food category recognition,” Information Fusion, p. 101859, 2023.
- [66] Y.-D. Zhang, Z. Dong, S.-H. Wang, X. Yu, X. Yao, Q. Zhou, H. Hu, M. Li, C. Jiménez-Mesa, J. Ramirez et al., “Advances in multimodal data fusion in neuroimaging: Overview, challenges, and novel orientation,” Information Fusion, vol. 64, pp. 149–187, 2020.
- [67] X. Zhou, Q. Yang, X. Zheng, W. Liang, I. Kevin, K. Wang, J. Ma, Y. Pan, and Q. Jin, “Personalized federation learning with model-contrastive learning for multi-modal user modeling in human-centric metaverse,” IEEE Journal on Selected Areas in Communications, 2024.
- [68] X. Zhou, Q. Yang, Q. Liu, W. Liang, K. Wang, Z. Liu, J. Ma, and Q. Jin, “Spatial–temporal federated transfer learning with multi-sensor data fusion for cooperative positioning,” Information Fusion, vol. 105, p. 102182, 2024.
- [69] X. Zhou, X. Zheng, X. Cui, J. Shi, W. Liang, Z. Yan, L. T. Yang, S. Shimizu, I. Kevin, and K. Wang, “Digital twin enhanced federated reinforcement learning with lightweight knowledge distillation in mobile networks,” IEEE Journal on Selected Areas in Communications, 2023.
- [70] X. Zhou, X. Zheng, T. Shu, W. Liang, I. Kevin, K. Wang, L. Qi, S. Shimizu, and Q. Jin, “Information theoretic learning-enhanced dual-generative adversarial networks with causal representation for robust ood generalization,” IEEE Transactions on Neural Networks and Learning Systems, 2023.