A Heterogeneous Dynamic Convolutional Neural Network for Image Super-resolution

Chunwei Tian, Member, IEEE, Xuanyu Zhang, Jia Ren, Wangmeng Zuo, Senior Member, IEEE, Yanning Zhang, Senior Member, IEEE, Chia-Wen Lin, Fellow, IEEE
This work was supported in part by the National Natural Science Foundation of China under Grant 62201468, in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2021A1515110079, and in part by the Youth Science and Technology Talent Promotion Project of Jiangsu Association for Science and Technology under Grant JSTJ-2023-017. (Corresponding authors: Chunwei Tian (Email: chunweitian@nwpu.edu.cn) and Yanning Zhang (Email: ynzhang@nwpu.edu.cn).) Chunwei Tian is with the School of Software, Northwestern Polytechnical University, Xi’an, 710129, China. He is with the National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Xi’an, 710129, China. He is with the Research & Development Institute, Northwestern Polytechnical University, Shenzhen, 518057, China. (Email: chunweitian@nwpu.edu.cn) Xuanyu Zhang is with the School of Software, Northwestern Polytechnical University, Xi’an, 710129, China. (Email: xuanyuzhang@mail.nwpu.edu.cn) Jia Ren is with the School of Information and Communication Engineering, Hainan University, Haikou, 570228, China. (Email: renjia@hainanu.edu) Wangmeng Zuo is with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China. (Email: cswmzuo@gmail.com) Yanning Zhang is with the School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129, China. She is with the National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Xi’an, 710129, China. (Email: ynzhang@nwpu.edu.cn) Chia-Wen Lin is with the Department of Electrical Engineering and the Institute of Communications Engineering, National Tsing Hua University. (Email: cwlin@ee.nthu.edu.tw)

Abstract

Convolutional neural networks can automatically learn features via deep network architectures and given input samples. However, the robustness of the obtained models may be challenged in varying scenes. Greater differences within a network architecture help extract more complementary structural information, which enhances the robustness of the obtained super-resolution model. In this paper, we present a heterogeneous dynamic convolutional network for image super-resolution (HDSRNet). To capture more information, HDSRNet is implemented as a heterogeneous parallel network. The upper network gathers more contextual information via stacked heterogeneous blocks to improve the effects of image super-resolution. Each heterogeneous block is composed of a dilated convolutional layer, a dynamic convolutional layer and a common convolutional layer, each followed by a ReLU, together with a residual learning operation. It can not only adaptively adjust parameters according to different inputs, but also prevent the long-term dependency problem. The lower network uses a symmetric architecture to enhance the relations among different layers and mine more structural information, which is complementary to the upper network for image super-resolution. Experimental results show that the proposed HDSRNet is effective for image super-resolution. The code of HDSRNet can be obtained at https://github.com/hellloxiaotian/HDSRNet.

Index Terms:

Dynamic convolutions, dilated convolutions, heterogeneous networks, image super-resolution.

I Introduction

Single image super-resolution (SISR) techniques recover high-quality images from given low-resolution (LR) images by solving an ill-posed inverse problem [58, 59]. In recent years, machine learning techniques have achieved great success in many applications, e.g., disease diagnosis [64], classification [70], object recognition [65], multimodal data fusion [66, 68], Metaverse [67, 69] and SISR [2]. Specifically, deep learning techniques with end-to-end architectures have obtained higher performance in image super-resolution. Deep convolutional neural networks (CNNs) use end-to-end architectures rather than manually set parameters to obtain strong learning abilities, which improves the visual effects of image super-resolution (SR) [55]. For instance, Dong et al. designed a 3-layer CNN that maps the pixels of an LR image to a high-resolution (HR) image [3]. Although it can improve the resolution of predicted images, it cannot make a tradeoff between network depth and performance in image super-resolution. To address this issue, scholars have increased network depth to pursue improved performance in image super-resolution [4]. Stacking small filters is used to achieve a very deep SR network [4]. Also, a residual learning technique is used on a deep layer of a deep network to balance SR performance and computational cost. A deeply-recursive convolutional network (DRCN) [5] and a deep recursive residual network (DRRN) [6] exploit recursive networks to decrease the number of parameters needed to train an SR model. To reduce computational complexity, some up-sampling operations, e.g., a transposed convolution [7], deconvolution and a sub-pixel convolution [8], are set on a deep layer to amplify low-frequency features for constructing HR images. 
For instance, a fast SR convolutional neural network (FSRCNN) [9] directly fed low-resolution images into a deep CNN to obtain low-frequency features and applied a sub-pixel convolutional layer to transform low-frequency features into high-frequency features and obtain HR images. To further improve SR performance, an enhanced deep SR network (EDSR) used enhanced residual blocks to extract more structural information for image super-resolution [10]. Also, removing unnecessary batch normalization layers can improve the training efficiency of an SR model [10]. Although these algorithms are effective for SISR, they may still struggle in complex scenes.

In this work, we propose a heterogeneous dynamic convolutional network for SISR (HDSRNet). HDSRNet uses a heterogeneous parallel network to capture more information and improve the performance of image SR. The upper network uses stacked heterogeneous blocks to extract more contextual information in image SR. Each heterogeneous block includes a dilated convolutional layer, a dynamic convolutional layer and a common convolutional layer, each followed by a ReLU, together with a residual learning operation, so that it can adaptively adjust parameters according to different inputs. It can also prevent the long-term dependency problem. The lower network uses a symmetric architecture to enhance the relationships among different layers and obtain more complementary structural information. Quantitative and qualitative analyses show that the proposed method is effective for image super-resolution.

Our main contributions can be summarized as follows.

(1) A heterogeneous parallel network is used to facilitate complementary structural information to improve performance for image super-resolution in terms of contextual and hierarchical information.

(2) Dynamic convolutions are embedded into a convolutional neural network to enhance robustness of obtained super-resolution model for complex scenes.

(3) An enhanced residual architecture is designed to address long-term dependency for image super-resolution.

The remainder of this paper is organized as follows. Section II describes related work. Section III illustrates the proposed method. Section IV presents extensive comparative experimental results. Section V summarizes our work.

II Related Work

II-A Deep CNNs for Image Super-resolution

Due to shooting distance, images captured by cameras may be unclear. Also, traditional image super-resolution methods need manually set parameters and complex parameter optimization. To overcome these issues, deep learning techniques have been extended to SISR [39, 40]. For instance, a deep recursive residual network used local residual connections and units to improve the effects of image super-resolution [6]. Alternatively, Tai et al. used recursive and gate units to transfer information from shallow layers to deep layers to address the long-term dependency problem [29]. To improve the efficiency of image SR, given LR images are directly used as the input of a convolutional neural network and high-quality images are constructed via an up-sampling operation in a deep layer [9]. For instance, Ahn et al. used group convolutions to implement a cascading residual CNN that extracts more robust low-frequency information and decreases the number of parameters without causing a significant performance loss in SISR [11]. Tian et al. exploited a symmetric group convolutional block to enhance the relations among different channels and mine more accurate low-frequency information for image super-resolution [33]. Zhang et al. extended the depth of the network and fully used hierarchical information to gather more low-frequency information for image SR [41]. From the mentioned methods, we can see that deep CNNs are good tools for obtaining clearer images. Motivated by that, we propose a deep CNN for image super-resolution.

II-B Dynamic convolution

Existing CNNs can share parameters to extract useful features that better represent images. However, they may be challenged by varying input images in complex scenes [13]. To address this problem, dynamic convolutions were presented [13]. That is, they can dynamically adjust parameters according to different inputs to adaptively learn models for image applications. For instance, Wan et al. achieved a graph convolution by arbitrarily structuring non-Euclidean data for hyperspectral image classification [42]. Alternatively, Ding et al. automatically found a valuable receptive field for each target node to adaptively capture neighbor information in hyperspectral image classification [43]. To deal with the negative effects of artifacts in textureless and edge regions, Duan et al. dynamically generated kernels from the region context of input images to better achieve image fusion [44]. To adaptively adjust the brightness of enhanced images, Dai et al. fused a Taylor expansion and dynamic convolution into a Retinex model to flexibly improve the clarity and brightness of low-light images [45]. To extract more context information, Hou et al. proposed a dynamic hybrid gradient convolution and coordinate sensitive attention to enhance boundary information extraction for remote sensing image segmentation [46]. To find more high-frequency information, Shen et al. used a spatially enhanced kernel generation to dynamically learn high-frequency and multi-scale features for a better denoising effect [47]. Additionally, Tian et al. combined dynamic convolution and a wavelet transform to train an adaptive denoiser according to different given noisy images [48]. From these examples, we find that dynamic convolutions are effective for image applications. Motivated by that, we use dynamic convolutions in a CNN for image super-resolution in this work.
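As a point of reference, one widely used formulation of dynamic convolution aggregates $K$ candidate kernels with input-dependent attention weights. The symbols $K$, $W_{k}$, $b_{k}$ and $\pi_{k}$ below are illustrative and do not appear elsewhere in this paper; whether the dynamic layer used in HDSRNet follows exactly this form is an assumption.

```latex
\tilde{W}(x)=\sum_{k=1}^{K}\pi_{k}(x)W_{k},\qquad
\tilde{b}(x)=\sum_{k=1}^{K}\pi_{k}(x)b_{k},\qquad
y=\tilde{W}(x)*x+\tilde{b}(x),\qquad
\pi_{k}(x)\geq 0,\quad \sum_{k=1}^{K}\pi_{k}(x)=1
```

Because the attention weights $\pi_{k}(x)$ are computed from the input $x$, the effective kernel $\tilde{W}(x)$ changes from image to image, while the candidate kernels $W_{k}$ themselves are shared.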

III Proposed Method

III-A Network Architecture

The proposed 18-layer HDSRNet contains two parallel 16-layer sub-networks and a 2-layer construction block. The two sub-networks are a 16-layer heterogeneous upper network and a 16-layer symmetrical lower network. The heterogeneous upper network is composed of a Conv+ReLU and five stacked heterogeneous blocks, which can extract more contextual information for image super-resolution. A Conv+ReLU is a composite of a convolutional layer and a ReLU operation, which is used to extract non-linear information from given low-resolution images. Its input and output channel numbers are 3 and 64, respectively, and its kernel size is $3\times 3$. The stacked heterogeneous blocks use different convolutional layers (i.e., dilated, dynamic and common convolutional layers) and ReLU to dynamically adjust parameters according to different input low-resolution images, which yields robust low-frequency information. To obtain complementary low-frequency information, a 16-layer symmetrical lower network is designed. It relies on a symmetrical architecture to enhance inter-layer connections and obtain more complementary structural information. The two sub-networks exchange information via a multiplication operation. A 2-layer construction block is used to transform low-frequency information into high-frequency information and construct the predicted high-quality images. The process can be illustrated via the following equation.

\begin{array}{l}
I_{S}=HDSRNet(I_{L})\\
\quad=CB(HUNet(I_{L})\times SLNet(I_{L}))\\
\quad=CB(5HB(CR(I_{L}))\times SLNet(I_{L}))\\
\quad=CB(O_{HUNet}\times O_{SLNet})
\end{array}

(1)

where $HDSRNet$, $HUNet$ and $SLNet$ denote functions of HDSRNet, the heterogeneous upper network and the symmetrical lower network, respectively. $I_{L}$ and $I_{S}$ are the input LR image and the predicted SR image, respectively. $CR$ is a Conv+ReLU. $5HB$ is five stacked heterogeneous blocks. $CB$ denotes a construction block. $\times$ represents the multiplication operation. Parameters of HDSRNet are obtained via the loss function in Section III-B. ${O_{HUNet}}$ and ${O_{SLNet}}$ are outputs of the HUNet and SLNet, respectively.
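The data flow of Eq. (1) can be sketched in a few lines of plain Python. This is only an illustration of the composition order: every component is passed in as a callable, features are flat lists of floats, and treating $\times$ as an element-wise product is an assumption. The toy stand-ins at the bottom are hypothetical and are not trained layers.

```python
def hdsrnet_forward(i_l, cr, heterogeneous_blocks, slnet, cb):
    """Data flow of Eq. (1); every argument except i_l is a caller-supplied callable."""
    o_hunet = cr(i_l)                    # Conv+ReLU applied to the LR input
    for hb in heterogeneous_blocks:      # five stacked HBs in the paper
        o_hunet = hb(o_hunet)
    o_slnet = slnet(i_l)                 # symmetrical lower network
    # Assumption: the fusion is an element-wise product of equally sized features.
    fused = [u * s for u, s in zip(o_hunet, o_slnet)]
    return cb(fused)                     # construction block


# Toy stand-ins (hypothetical) that only exercise the data flow.
def double(feats):
    return [2.0 * v for v in feats]


def add_one(feats):
    return [v + 1.0 for v in feats]


toy_output = hdsrnet_forward([1.0, 2.0], double, [add_one] * 5, double, sum)
```

With these stand-ins, the upper branch yields `[7.0, 9.0]`, the lower branch yields `[2.0, 4.0]`, and the toy construction block sums their product.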

Figure 1: Network architecture of the proposed HDSRNet.

III-B Loss Function

HDSRNet uses the mean absolute error (MAE) [12] as the loss function to obtain its parameters. The use of MAE in HDSRNet can be represented as follows.

l(p)=\frac{1}{T}\sum\limits_{j=1}^{T}\left|HDSRNet(I_{L}^{j})-I_{H}^{j}\right|

(2)

where $I_{L}^{j}$ and $I_{H}^{j}$ denote the $j$-th low- and high-resolution images, respectively. $T$ represents the number of low-resolution images. $l$ is the loss function of HDSRNet and $p$ stands for the parameters of the trained HDSRNet. The loss is optimized by the Adam optimizer [49].
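A minimal sketch of Eq. (2) follows, with each image represented as a flat list of pixel values. Averaging the absolute error over the pixels of each image before averaging over the $T$ pairs is an assumption, since Eq. (2) does not state how $|\cdot|$ is normalized for an image; the function name `mae_loss` is illustrative.

```python
def mae_loss(predicted_images, target_images):
    """Mean absolute error over T image pairs, following Eq. (2).

    Each image is a flat list of pixel values. Assumption: the absolute
    error of one image is averaged over its pixels.
    """
    if not predicted_images or len(predicted_images) != len(target_images):
        raise ValueError("need the same non-zero number of predictions and targets")
    total = 0.0
    for predicted, target in zip(predicted_images, target_images):
        if len(predicted) != len(target):
            raise ValueError("image sizes must match")
        total += sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)
    return total / len(predicted_images)
```

For example, `mae_loss([[1.0, 2.0], [0.0, 0.0]], [[0.0, 2.0], [1.0, 1.0]])` averages the per-image errors 0.5 and 1.0.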

III-C Heterogeneous Block

To train a robust SR model, heterogeneous blocks are designed to dynamically adjust parameters according to different input low-resolution images, which yields robust low-frequency information. Each heterogeneous block is composed of a dilated Conv+ReLU, a dynamic Conv+ReLU and a Conv+ReLU. The dilated Conv+ReLU is a combination of a dilated convolution [50] and a ReLU [51], which is used to capture more context information. The dynamic Conv+ReLU is a combination of a dynamic convolution [13] and a ReLU, which can adaptively learn parameters according to different input information. To prevent long-term dependency, a residual operation acts between the input of the dilated Conv+ReLU and the output of the Conv+ReLU. All convolutional kernels are $3\times 3$. The input and output channel numbers of the dilated, dynamic and common convolutional layers are all 64. Also, the dilation factor is 2 in the dilated convolutional layers. The mentioned process can be expressed as the following equation.

HB({O_{t}})=CR(DyCR(DiCR({O_{t}})))+{O_{t}}

(3)

where ${O_{t}}$ denotes the input of a heterogeneous block. $DiCR$ stands for a dilated Conv+ReLU. $DyCR$ represents a dynamic Conv+ReLU. $+$ expresses a residual learning operation, which is shown as $\oplus$ in Fig. 1.
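The structure of Eq. (3) can be illustrated with a deliberately simplified sketch: a single-channel, 1-D signal stands in for the 64-channel, 2-D feature maps of the paper, and the dynamic layer is modeled as attention over candidate kernels with a one-parameter gate per kernel. All function names, the gate design and the 1-D setting are assumptions made for illustration, not the authors' implementation.

```python
import math


def relu(values):
    return [max(0.0, v) for v in values]


def conv1d(signal, kernel, dilation=1):
    """Zero-padded, same-length 1-D convolution (cross-correlation form)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - half) * dilation   # dilation spreads the taps apart
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        out.append(acc)
    return out


def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_conv1d(signal, kernels, gate_weights):
    """Mix candidate kernels with input-dependent attention, then convolve."""
    pooled = sum(signal) / len(signal)                    # global average pooling
    attention = softmax([g * pooled for g in gate_weights])
    mixed = [sum(a * kern[t] for a, kern in zip(attention, kernels))
             for t in range(len(kernels[0]))]
    return conv1d(signal, mixed)


def heterogeneous_block(o_t, dilated_kernel, dynamic_kernels, gate_weights, common_kernel):
    x = relu(conv1d(o_t, dilated_kernel, dilation=2))            # DiCR
    x = relu(dynamic_conv1d(x, dynamic_kernels, gate_weights))   # DyCR
    x = relu(conv1d(x, common_kernel))                           # CR
    return [a + b for a, b in zip(x, o_t)]                       # residual: + O_t
```

With a dilation factor of 2, a kernel of size 3 covers a span of 5 samples, which is how the dilated layer sees more context without extra parameters.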

III-D Symmetrical Lower Sub-network

To obtain complementary low-frequency information, a 16-layer symmetrical lower network is designed. Each layer is a Conv+ReLU with a kernel size of $3\times 3$. The input and output channel numbers of the first layer are 3 and 64, respectively, and those of all other layers are 64. To enhance the relations among different layers, residual learning operations act between the 1st and 16th, 2nd and 15th, 3rd and 14th, 4th and 13th, 5th and 12th, 6th and 11th, 7th and 10th, and 8th and 9th layers. They transfer information obtained by shallow layers to deep layers, which prevents long-term dependency and yields robust information for image super-resolution. The procedure can be summarized as Eq. (4).

O_{SLNet}=CR(CR(\ldots(CR(CR(O_{8}+CR(O_{8}))+O_{7})+O_{6})+\ldots)+O_{2})+O_{1}

(4)

where ${O_{i}}$ denotes the output of the $i$-th layer and $i=1,2,\ldots,8$. Also, ${O_{i}}=iCR({I_{L}})$, where $iCR$ stands for $i$ stacked Conv+ReLU.
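The symmetric skip pattern of Eq. (4) maps naturally onto a stack: the outputs of the first eight layers are pushed, and each of the last eight layers pops the matching one. The sketch below assumes each layer is a caller-supplied callable and uses plain numbers as features so that `+` is an ordinary addition; the function names are illustrative.

```python
def skip_pairs(num_layers=16):
    """1-based (shallow, deep) layer pairs joined by a residual operation."""
    return [(i, num_layers + 1 - i) for i in range(1, num_layers // 2 + 1)]


def symmetric_lower_forward(i_l, layers):
    """Forward pass of Eq. (4) for an even number of layers.

    layers: list of callables standing in for Conv+ReLU. Features must
    support '+', so plain numbers are used in this sketch.
    """
    if len(layers) % 2 != 0:
        raise ValueError("the symmetric design needs an even number of layers")
    half = len(layers) // 2
    skips = []
    x = i_l
    for layer in layers[:half]:
        x = layer(x)
        skips.append(x)              # O_1 ... O_8
    for layer in layers[half:]:
        x = layer(x) + skips.pop()   # layer 9 adds O_8, ..., layer 16 adds O_1
    return x
```

Using `lambda v: v + 1` for all 16 layers and an input of 0 makes the pattern easy to trace by hand: the first half produces $O_{i}=i$, and the second half adds them back in reverse order.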

III-E Construction Block

A 2-layer construction block is used to construct predicted HR images. It contains two phases. The first phase uses a sub-pixel convolutional layer to convert low-frequency information into high-frequency information; its input and output channel numbers are 128 and 64, respectively. The second phase uses a single convolutional layer (Conv) to construct the predicted HR images; its input and output channel numbers are 64 and 3, respectively. Both kernel sizes are $3\times 3$. This process can be expressed as Eq. (5).

\begin{array}{l}
O_{S}=CB(O_{HUNet}\times O_{SLNet})\\
\quad\;\;=C(Sub(O_{HUNet}\times O_{SLNet}))
\end{array}

(5)

where $Sub$ and $C$ are functions of a sub-pixel convolutional layer and a convolutional layer, respectively.
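A sub-pixel convolutional layer is commonly realized as a convolution followed by a periodic rearrangement of channels into space. The sketch below shows only that rearrangement on nested lists, using the usual channel ordering for this operation. How the 128 and 64 channel numbers stated above map onto the rearrangement for each scale factor is not specified here, so the function and its layout are an illustration under that assumption.

```python
def pixel_shuffle(features, scale):
    """Rearrange [C*r*r][H][W] nested lists into [C][H*r][W*r]."""
    r = scale
    if len(features) % (r * r) != 0:
        raise ValueError("channel count must be divisible by scale squared")
    channels_out = len(features) // (r * r)
    height, width = len(features[0]), len(features[0][0])
    out = [[[0.0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]
    for c in range(channels_out):
        for i in range(r):          # row offset inside each r x r output cell
            for j in range(r):      # column offset inside each r x r output cell
                source = features[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = source[h][w]
    return out
```

Four 1×1 channels with values 1 to 4 and a scale of 2 become a single 2×2 channel, which shows how channel depth is traded for spatial resolution.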

IV Experiments

IV-A Training Dataset

We use the public DIV2K dataset [14] as a training set to train HDSRNet for image super-resolution. Specifically, DIV2K contains three parts, i.e., a training dataset of 800 images, a validation dataset of 100 images and a test dataset of 100 images. To enlarge the diversity of the training data, we merge the original training dataset and the validation dataset to form a new training dataset, which includes 900 high-quality images and the corresponding low-resolution images for x2, x3 and x4. All training images are saved in ‘.png’ format.

IV-B Test Datasets

To fairly test the SR performance of our HDSRNet, four public SR datasets, i.e., Set5 containing 5 natural images [15], Set14 containing 14 natural images [16], BSD100 (B100) containing 100 natural images [17] and Urban100 (U100) containing 100 images [18], are used to conduct experiments. These datasets include three different scales, i.e., x2, x3 and x4. They are saved in ‘.png’ format.

IV-C Implementation Details

The initial training parameters of HDSRNet are $\beta_{1}$ of 0.9, $\beta_{2}$ of 0.999, epsilon of 1e-8, a batch size of 64 and an initial learning rate of 1e-4, which is halved every 300,000 steps. Other parameters are the same as in Ref. [10].
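The step-wise schedule described above can be written as a one-line function. The names `ADAM_SETTINGS` and `learning_rate` are illustrative, and the sketch only restates the stated hyperparameters; it is not tied to a particular training framework.

```python
# Hyperparameters as stated in the text; the dictionary name is illustrative.
ADAM_SETTINGS = {"beta1": 0.9, "beta2": 0.999, "epsilon": 1e-8, "batch_size": 64}


def learning_rate(step, initial_lr=1e-4, halve_every=300_000):
    """Learning rate after `step` updates: halved once per `halve_every` steps."""
    if step < 0:
        raise ValueError("step must be non-negative")
    return initial_lr * 0.5 ** (step // halve_every)
```

Under this schedule the rate stays at 1e-4 for the first 300,000 steps, then drops to 5e-5, then to 2.5e-5 after 600,000 steps, and so on.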

The proposed HDSRNet is implemented with PyTorch 1.8.0 and Python 3.8. All experiments run on a workstation with Ubuntu 20.04, which is equipped with an AMD EPYC 7502P CPU and four Nvidia GeForce RTX 3090 GPUs, with Nvidia CUDA 11.1 and cuDNN 8005. All models are trained on one RTX 3090 GPU.

IV-D Network Analysis

We analyze the rationality and validity of the architecture of the designed HDSRNet, which contains a heterogeneous upper sub-network, a symmetrical lower sub-network and a construction block for image super-resolution.

Heterogeneous upper sub-network: Most existing SR models cannot adaptively learn parameters according to different given low-resolution images [52], whereas dynamic convolutions can automatically adjust parameters according to different inputs [13]. Motivated by that, we develop a heterogeneous upper sub-network for image super-resolution. We design the main network following VGG [53]. That is, six stacked Conv+ReLU layers are used as a basic network to extract structural low-frequency information. A combination of six stacked Conv+ReLU and a CB obtains a peak signal-to-noise ratio (PSNR) [54] of 25.13dB and a structural similarity (SSIM) [54] of 0.7525 on U100 for x4, which shows the effectiveness of six stacked Conv+ReLU. The CB is used to build HR images and is discussed later in this section. To prevent the long-term dependency problem, the deep fusion idea [57] is used in this paper. That is, a residual learning operation acts between the input and output of each Conv+ReLU except the first one, which transfers low-frequency information from shallow layers to deep layers to pursue better SR performance. Its effectiveness is shown by comparing ‘Heterogeneous upper sub-network without Dilated Conv+ReLU, Dynamic Conv+ReLU and a CB’ with ‘A combination of six stacked Conv+ReLU and a CB’ in terms of PSNR and SSIM in TABLE I. To extract more context, a dilated Conv+ReLU with a dilation factor of 2 is placed before each Conv+ReLU except the first one in the heterogeneous upper sub-network. Its effectiveness is shown by comparing ‘Heterogeneous upper sub-network without Dilated Conv+ReLU, Dynamic Conv+ReLU and a CB’ with ‘Heterogeneous upper sub-network without dynamic Conv+ReLU and a CB’ in terms of PSNR and SSIM on U100 for x4. Specifically, the five stacked blocks after the first Conv+ReLU in the heterogeneous upper sub-network (HUNet) are five heterogeneous blocks (HBs), whose rationality and validity are analyzed below. 
To make HUNet robust, a Dynamic Conv+ReLU is set after the Dilated Conv+ReLU in each HB to dynamically adjust parameters according to different inputs. Its benefit can be seen by comparing ‘Heterogeneous upper sub-network and a CB’ with ‘Heterogeneous upper sub-network without dynamic Conv+ReLU and a CB’ in TABLE I, which verifies the effectiveness of dynamic convolutions. It is known that the greater the difference within a network architecture, the better its performance [41]. Motivated by that, we replace the Dynamic Conv+ReLU and the Dilated Conv+ReLU with a Conv+ReLU, respectively, to conduct experiments. As shown in TABLE I, ‘Heterogeneous upper sub-network and a CB’ obtains higher PSNR and SSIM values than ‘Heterogeneous upper sub-network with Conv+ReLU rather than dilated Conv+ReLU’ and ‘Heterogeneous upper sub-network with Conv+ReLU rather than dynamic Conv+ReLU’. This comparison demonstrates the effectiveness of the dynamic Conv+ReLU and dilated Conv+ReLU in an HB, and highlights the advantage of combining different layer types within an HB for image super-resolution.

Symmetrical lower sub-network: To avoid the limited representation ability of a single network for image super-resolution, a symmetrical lower sub-network is designed. It assists the upper network in extracting more complementary structural information to improve the recovered high-resolution images. It is implemented in the following two phases.

TABLE I: PSNR and SSIM of different methods on U100 for x4.

Scale	Methods	PSNR(dB)/SSIM
×4	A combination of six stacked Conv+ReLU and a CB	25.13/0.7525
	Heterogeneous upper sub-network without Dilated Conv+ReLU, Dynamic Conv+ReLU and a CB	25.22/0.7559
	Heterogeneous upper sub-network without dynamic Conv+ReLU and a CB	25.46/0.7649
	Heterogeneous upper sub-network without dilated Conv+ReLU and a CB	25.61/0.7697
	Heterogeneous upper sub-network with Conv+ReLU rather than dilated Conv+ReLU	25.74/0.7739
	Heterogeneous upper sub-network with Conv+ReLU rather than dynamic Conv+ReLU	25.68/0.7729
	Heterogeneous upper sub-network and a CB	25.75/0.7740
	HDSRNet without residual learning operations in symmetrical lower sub-network	25.99/0.7825
	HDSRNet	26.01/0.7827

TABLE II: PSNR and SSIM of different methods with different upscale factors on Set5.

Dataset	Methods	×2 PSNR(dB)/SSIM	×3 PSNR(dB)/SSIM	×4 PSNR(dB)/SSIM
Set5	Bicubic	33.66/0.9299	30.39/0.8682	28.42/0.8104
	SRCNN [3]	36.66/0.9542	32.75/0.9090	30.48/0.8628
	VDSR [4]	37.53/0.9587	33.66/0.9213	31.35/0.8838
	DRRN [6]	37.74/0.9591	34.03/0.9244	31.68/0.8888
	FSRCNN [9]	37.00/0.9558	33.16/0.9140	30.71/0.8657
	CARN-M [11]	37.53/0.9583	33.99/0.9236	31.92/0.8903
	IDN [12]	37.83/0.9600	34.11/0.9253	31.82/0.8903
	A+ [19]	36.54/0.9544	32.58/0.9088	30.28/0.8603
	JOR [20]	36.58/0.9543	32.55/0.9067	30.19/0.8563
	RFL [21]	36.54/0.9537	32.43/0.9057	30.14/0.8548
	SelfEx [18]	36.49/0.9537	32.58/0.9093	30.31/0.8619
	CSCN [22]	36.93/0.9552	33.10/0.9144	30.86/0.8732
	RED [23]	37.56/0.9595	33.70/0.9222	31.33/0.8847
	DnCNN [24]	37.58/0.9590	33.75/0.9222	31.40/0.8845
	TNRD [25]	36.86/0.9556	33.18/0.9152	30.85/0.8732
	FDSR [26]	37.40/0.9513	33.68/0.9096	31.28/0.8658
	RCN [27]	37.17/0.9583	33.45/0.9175	31.11/0.8736
	DRCN [5]	37.63/0.9588	33.82/0.9226	31.53/0.8854
	LapSRN [60]	37.52/0.9590	-	31.54/0.8850
	NDRCN [63]	37.73/0.9596	33.90/0.9235	31.50/0.8859
	MemNet [29]	37.78/0.9597	34.09/0.9248	31.74/0.8893
	LESRCNN [30]	37.65/0.9586	33.93/0.9231	31.88/0.8903
	LESRCNN-S [30]	37.57/0.9582	34.05/0.9238	31.88/0.8907
	ScSR [31]	35.78/0.9485	31.34/0.8869	29.07/0.8263
	DSRCNN [32]	37.73/0.9588	34.17/0.9247	31.89/0.8909
	DAN [34]	37.34/0.9526	34.04/0.9199	31.89/0.8864
	PGAN [37]	-	-	31.03/0.8798
	HDSRNet (Ours)	37.95/0.9604	34.31/0.9263	32.15/0.8927

TABLE III: PSNR and SSIM of different methods with different upscale factors on Set14.

Dataset	Methods	×2 PSNR(dB)/SSIM	×3 PSNR(dB)/SSIM	×4 PSNR(dB)/SSIM
Set14	Bicubic	30.24/0.8688	27.55/0.7742	26.00/0.7027
	SRCNN [3]	32.42/0.9063	29.28/0.8209	27.49/0.7503
	VDSR [4]	33.03/0.9124	29.77/0.8314	28.01/0.7674
	DRRN [6]	33.23/0.9136	29.96/0.8349	28.21/0.7720
	FSRCNN [9]	32.63/0.9088	29.43/0.8242	27.59/0.7535
	CARN-M [11]	33.26/0.9141	30.08/0.8367	28.42/0.7762
	IDN [12]	33.30/0.9148	29.99/0.8354	28.25/0.7730
	A+ [19]	32.28/0.9056	29.13/0.8188	27.32/0.7491
	JOR [20]	32.38/0.9063	29.19/0.8204	27.27/0.7479
	RFL [21]	32.26/0.9040	29.05/0.8164	27.24/0.7451
	SelfEx [18]	32.22/0.9034	29.16/0.8196	27.40/0.7518
	CSCN [22]	32.56/0.9074	29.41/0.8238	27.64/0.7578
	RED [23]	32.81/0.9135	29.50/0.8334	27.72/0.7698
	DnCNN [24]	33.03/0.9128	29.81/0.8321	28.04/0.7672
	TNRD [25]	32.51/0.9069	29.43/0.8232	27.66/0.7563
	FDSR [26]	33.00/0.9042	29.61/0.8179	27.86/0.7500
	RCN [27]	32.77/0.9109	29.63/0.8269	27.79/0.7594
	DRCN [5]	33.04/0.9118	29.76/0.8311	28.02/0.7670
	LapSRN [60]	33.08/0.9130	29.63/0.8269	28.19/0.7720
	NDRCN [63]	33.20/0.9141	29.88/0.8333	28.10/0.7697
	MemNet [29]	33.28/0.9142	30.00/0.8350	28.26/0.7723
	LESRCNN [30]	33.32/0.9148	30.12/0.8380	28.44/0.7772
	LESRCNN-S [30]	33.30/0.9145	30.16/0.8384	28.43/0.7776
	ScSR [31]	31.64/0.8940	28.19/0.7977	26.40/0.7218
	DSRCNN [32]	33.43/0.9157	30.24/0.8402	28.46/0.7796
	DAN [34]	33.08/0.9041	30.09/0.8287	28.42/0.7687
	PGAN [37]	-	-	27.75/0.8164
	HDSRNet (Ours)	33.52/0.9170	30.27/0.8412	28.56/0.7799

TABLE IV: PSNR and SSIM of different methods with different upscale factors on B100.

Dataset	Methods	×2 PSNR(dB)/SSIM	×3 PSNR(dB)/SSIM	×4 PSNR(dB)/SSIM
B100	Bicubic	29.56/0.8431	27.21/0.7385	25.96/0.6675
	SRCNN [3]	31.36/0.8879	28.41/0.7863	26.90/0.7101
	VDSR [4]	31.90/0.8960	28.82/0.7976	27.29/0.7251
	DRRN [6]	32.05/0.8973	28.95/0.8004	27.38/0.7284
	FSRCNN [9]	31.53/0.8920	28.53/0.7910	26.98/0.7150
	CARN-M [11]	31.92/0.8960	28.91/0.8000	27.44/0.7304
	IDN [12]	32.08/0.8985	28.95/0.8013	27.41/0.7297
	A+ [19]	31.21/0.8863	28.29/0.7835	26.82/0.7087
	JOR [20]	31.22/0.8867	28.27/0.7837	26.79/0.7083
	RFL [21]	31.16/0.8840	28.22/0.7806	26.75/0.7054
	SelfEx [18]	31.18/0.8855	28.29/0.7840	26.84/0.7106
	CSCN [22]	31.40/0.8884	28.50/0.7885	27.03/0.7161
	RED [23]	31.96/0.8972	28.88/0.7993	27.35/0.7276
	DnCNN [24]	31.90/0.8961	28.85/0.7981	27.29/0.7253
	TNRD [25]	31.40/0.8878	28.50/0.7881	27.00/0.7140
	FDSR [26]	31.87/0.8847	28.82/0.7797	27.31/0.7031
	DRCN [5]	31.85/0.8942	28.80/0.7963	27.23/0.7233
	LapSRN [60]	31.80/0.8950	-	27.32/0.7280
	NDRCN [63]	32.00/0.8975	28.86/0.7991	27.30/0.7263
	MemNet [29]	32.08/0.8978	28.96/0.8001	27.40/0.7281
	LESRCNN [30]	31.95/0.8964	28.91/0.8005	27.45/0.7313
	LESRCNN-S [30]	31.95/0.8965	28.94/0.8012	27.47/0.7321
	ScSR [31]	30.77/0.8744	27.72/0.7647	26.61/0.6983
	DSRCNN [32]	32.05/0.8978	29.01/0.8029	27.50/0.7341
	DAN [34]	31.76/0.8858	28.94/0.7919	27.51/0.7248
	PGAN [37]	-	-	26.35/0.6926
	HDSRNet (Ours)	32.14/0.8994	29.06/0.8048	27.55/0.7357

TABLE V: PSNR and SSIM of different methods with different upscale factors on U100.

Dataset	Methods	×2 PSNR(dB)/SSIM	×3 PSNR(dB)/SSIM	×4 PSNR(dB)/SSIM
U100	Bicubic	26.88/0.8403	24.46/0.7349	23.14/0.6577
	SRCNN [3]	29.50/0.8946	26.24/0.7989	24.52/0.7221
	VDSR [4]	30.76/0.9140	27.14/0.8279	25.18/0.7524
	DRRN [6]	31.23/0.9188	27.53/0.8378	25.44/0.7638
	FSRCNN [9]	29.88/0.9020	26.43/0.8080	24.62/0.7280
	CARN-M [11]	31.23/0.9193	27.55/0.8385	25.62/0.7694
	IDN [12]	31.27/0.9196	27.42/0.8359	25.41/0.7632
	A+ [19]	29.20/0.8938	26.03/0.7973	24.32/0.7183
	JOR [20]	29.25/0.8951	25.97/0.7972	24.29/0.7181
	RFL [21]	29.11/0.8904	25.86/0.7900	24.19/0.7096
	SelfEx [18]	29.54/0.8967	26.44/0.8088	24.79/0.7374
	RED [23]	30.91/0.9159	27.31/0.8303	25.35/0.7587
	DnCNN [24]	30.74/0.9139	27.15/0.8276	25.20/0.7521
	TNRD [25]	29.70/0.8994	26.42/0.8076	24.61/0.7291
	FDSR [26]	30.91/0.9088	27.23/0.8190	25.27/0.7417
	DRCN [5]	30.75/0.9133	27.15/0.8276	25.14/0.7510
	LapSRN [60]	30.41/0.9100	-	25.21/0.7560
	WaveResNet [61]	30.96/0.9169	27.28/0.8334	25.36/0.7614
	CPCA [62]	28.17/0.8990	25.61/0.8123	23.62/0.7257
	NDRCN [63]	31.06/0.9175	27.23/0.8312	25.16/0.7546
	MemNet [29]	31.31/0.9195	27.56/0.8376	25.50/0.7630
	LESRCNN [30]	31.45/0.9206	27.70/0.8415	25.77/0.7732
	LESRCNN-S [30]	31.45/0.9207	27.76/0.8424	25.78/0.7739
	ScSR [31]	28.26/0.8828	-	24.02/0.7024
	DSRCNN [32]	31.83/0.9252	27.99/0.8483	25.94/0.7815
	DAN [34]	30.60/0.9060	27.65/0.8352	25.86/0.7721
	LDRAN [36]	-	-	25.91/0.7786
	PGAN [37]	-	-	25.47/0.9574
	HDSRNet (Ours)	32.00/0.9267	28.02/0.8493	26.01/0.7827

TABLE VI: Running time (milliseconds) of different methods for image super-resolution with different image sizes for $\times 4$.

Image sizes	VDSR [4]	CARN-M [11]	ACNet [38]	HDSRNet (Ours)
256 × 256	17.2	15.9	18.38	14.17
512 × 512	57.5	19.9	75.79	26.11

TABLE VII: Parameters and FLOPs of different methods with a scale factor of 4 for restoring high-quality images of $1024\times 1024$.

Methods	Parameters	Flops
DCLS [35]	13,626K	498.18G
ACNet [38]	1,357K	132.82G
HDSRNet (Ours)	1,819K	110.99G

The first phase designs a 16-layer stack of Conv+ReLU to extract low-frequency structural information, following VGG [53]. Its effectiveness is shown by comparing ‘Heterogeneous upper sub-network and a CB’ with ‘HDSRNet without residual learning operations in symmetrical lower sub-network’ in TABLE I. To enhance the relationship among hierarchical features, the second phase is conducted. It uses residual learning operations to merge the information obtained by shallow and deep layers, which forms a symmetrical lower sub-network that extracts more accurate information, as shown in Section III-D. As illustrated in TABLE I, ‘HDSRNet’ obtains higher PSNR and SSIM than ‘HDSRNet without residual learning operations in symmetrical lower sub-network’, which shows the effectiveness of residual learning operations in the symmetrical lower sub-network for image SR. Besides, ‘HDSRNet’ obtains a higher PSNR value than ‘Heterogeneous upper sub-network and a CB’ in TABLE I, which shows the effectiveness of the heterogeneous parallel network for image super-resolution.

Construction block: To construct HR images, we design a 2-layer construction block. It consists of a sub-pixel convolutional layer and a convolutional layer. The sub-pixel convolutional layer converts low-frequency information into high-frequency information, and the convolutional layer constructs the predicted HR images.

IV-E Comparisons with Popular Methods for SR

In this section, we use both quantitative and qualitative analysis to evaluate results of our HDSRNet on SISR. Quantitative analysis includes PSNR, SSIM, running time and complexity of our HDSRNet. Specifically, PSNR and SSIM are used to measure quality of predicted HR images. Also, running time and parameters are used to test feasibility of our HDSRNet for real applications, i.e., phones and cameras. We use A+ [19], jointly optimized regressors (JOR) [20], image upscaling with super-resolution forests (RFL) [21], self-exemplars SR method (SelfEx) [18], cascade of sparse coding based network (CSCN) [22], image restoration using encoder-decoder networks (RED) [23], denoising convolutional neural network (DnCNN) [24], trainable non-linear reaction diffusion (TNRD) [25], fast dilated residual SR convolutional net-work (FDSR) [26], SRCNN [3], FSRCNN [9], residue context sub-network (RCN) [27], VDSR [4], deeply-recursive convolutional network (DRCN) [5], IDN [12], DRRN [6], Laplacian SR network (LapSRN) [60], new architecture of deep recursive convolution networks for SR (NDRCN) [63], a persistent memory network (MemNet) [29], CARN-M [11], light-weight image super-resolution with enhanced CNN (LESRCNN) [30], deep alternating network (DAN) [34], Pixel-Level Generative Adversarial Network (PGAN) [37] and our HDSRNet on four public datasets, i.e., Set5, Set14, B100 and U100 for x2, x3 and x4 to conduct experiments. As shown in TABLEs II and III, we can see that our HDSRNet has obtained the best performance in terms of PSNR and SSIM for x2, x3 and x4. For instance, our HDSRNet has improvements of 0.12dB on PSNR and 0.004 on SSIM on Set 5 for x2 than that of IDN in TABLE II. Our HDSRNet has obtained improvements of 0.1dB on PSNR and 0.003 on SSIM on Set14 for x4 than that of DSRCNN in TABLE III. That also shows that our HDSRNet is effective on small datasets for image super-resolution. 
For big datasets, HDSRNet still has an advantage for image super-resolution. As shown in TABLEs IV and V, HDSRNet obtains the best SR results for all the scale factors, i.e., x2, x3 and x4. For instance, HDSRNet exceeds DSRCNN by 0.09 dB in PSNR and 0.0016 in SSIM for x2 on B100 in TABLE IV, and by 0.07 dB in PSNR and 0.0012 in SSIM for x4 on U100 in TABLE V. This shows that HDSRNet is also suitable for big datasets. Red and blue lines denote the best and second-best SR results, respectively, in TABLEs II to V.
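To make the PSNR figures above easier to interpret, the following is a minimal sketch of the standard PSNR definition for 8-bit images. It is not the evaluation script used for the tables: details such as evaluating only the luminance channel or cropping image borders are common in SR benchmarks but are not assumed here, and the function names and toy pixel values are hypothetical.

```python
import math


def psnr(reference, predicted, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized 2-D images."""
    ref_flat = [p for row in reference for p in row]
    pred_flat = [p for row in predicted for p in row]
    if not ref_flat or len(ref_flat) != len(pred_flat):
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref_flat, pred_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Ratio new_MSE / old_MSE implied by a PSNR gain of gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)


# Toy 2x2 images (made-up values): two pixels differ by 2, so the MSE is 2.
reference = [[52, 55], [61, 59]]
predicted = [[54, 55], [61, 57]]
value = psnr(reference, predicted)
ratio = mse_ratio_for_gain(0.12)
```

Because PSNR is logarithmic in the mean squared error, a gain of 0.12 dB corresponds to an MSE that is roughly 2.7% lower, which is why differences of about 0.1 dB are reported as meaningful in SR comparisons.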

To test the practicality of HDSRNet, we evaluate its running time and complexity for image SR. As shown in TABLE VI, HDSRNet obtains competitive running time when restoring low-resolution images of sizes $256\times 256$ and $512\times 512$. As illustrated in TABLE VII, although HDSRNet has more parameters than ACNet, it requires fewer FLOPs than ACNet. Thus, it is competitive in complexity and suitable for consumer electronic products. According to the experimental results above, HDSRNet is effective in terms of quantitative analysis.
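A model can have more parameters yet fewer FLOPs because parameters depend only on layer shapes, while FLOPs also scale with the spatial size at which each layer runs. The sketch below counts both for a single convolutional layer. The layer sizes are hypothetical and do not describe HDSRNet or ACNet, and the count is in multiply-accumulate operations (MACs); whether one MAC is reported as one or two FLOPs varies between papers and is not specified here.

```python
def conv2d_cost(c_in, c_out, kernel, out_h, out_w, bias=True):
    """Return (parameters, multiply-accumulates) of one standard conv layer."""
    weights_per_filter = c_in * kernel * kernel
    params = c_out * (weights_per_filter + (1 if bias else 0))
    # Every output position of every output channel needs one MAC per weight.
    macs = weights_per_filter * c_out * out_h * out_w
    return params, macs


# Hypothetical 3x3 layer with 64 input and 64 output channels.
params_full, macs_full = conv2d_cost(64, 64, 3, 256, 256)
params_half, macs_half = conv2d_cost(64, 64, 3, 128, 128)
```

Both calls give the same parameter count, but running the layer on a feature map with half the height and width needs one quarter of the MACs. Keeping most layers in the low-resolution space and upsampling late is therefore one way an SR network can keep FLOPs low.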

For qualitative analysis, we use Bicubic, SRCNN, LESRCNN, DCLS, VDSR, DRCN and HDSRNet to recover high-quality images from low-resolution images in B100 and U100 for x3 and x4, and compare the results with the given HR images. Specifically, we enlarge one area of each predicted image as an observation area; the clearer the observation area, the more effective the corresponding method is for image super-resolution. As shown in Figs. 2-7, HDSRNet obtains clearer observation areas, which shows that it is effective in terms of qualitative analysis. In summary, HDSRNet is an effective tool for image super-resolution according to both quantitative and qualitative analysis.

V Conclusion

In this paper, we propose a heterogeneous dynamic convolutional network for SISR (HDSRNet), which uses a heterogeneous parallel architecture to capture more structural information. The upper network relies on stacked heterogeneous blocks to extract more context information and thereby improve image super-resolution. Each heterogeneous block is composed of dilated, dynamic and common convolutional layers, ReLU and a residual learning operation, which adaptively adjust parameters for different inputs and prevent the long-term dependency problem. The lower network uses a symmetric architecture to enhance relations between different layers and extract more structural information for SISR. In the future, we will deal with SISR with non-reference images.

References

[1] K. Zhang, W. Zuo, and L. Zhang, “Deep plug-and-play super-resolution for arbitrary blur kernels,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 1671–1681.
[2] X. Zhou, H. Huang, R. He, Z. Wang, J. Hu, and T. Tan, “Msra-sr: Image super-resolution transformer with multi-scale shared representation acquisition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 12 665–12 676.
[3] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE transactions on pattern analysis and machine intelligence, vol. 38, no. 2, pp. 295–307, 2015.
[4] J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1646–1654.
[5] J. Kim, J. K. Lee, and K. M. Lee, “Deeply-recursive convolutional network for image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1637–1645.
[6] Y. Tai, J. Yang, and X. Liu, “Image super-resolution via deep recursive residual network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 3147–3155.
[7] V. Dumoulin and F. Visin, “A guide to convolution arithmetic for deep learning,” arXiv preprint arXiv:1603.07285, 2016.
[8] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1874–1883.
[9] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14. Springer, 2016, pp. 391–407.
[10] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 136–144.
[11] N. Ahn, B. Kang, and K.-A. Sohn, “Fast, accurate, and lightweight super-resolution with cascading residual network,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 252–268.
[12] Z. Hui, X. Wang, and X. Gao, “Fast and accurate single image super-resolution via information distillation network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 723–731.
[13] Y. Chen, X. Dai, M. Liu, D. Chen, L. Yuan, and Z. Liu, “Dynamic convolution: Attention over convolution kernels,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11 030–11 039.
[14] E. Agustsson and R. Timofte, “Ntire 2017 challenge on single image super-resolution: Dataset and study,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 126–135.
[15] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, “Low-complexity single-image super-resolution based on nonnegative neighbor embedding,” 2012.
[16] R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” in Curves and Surfaces: 7th International Conference, Avignon, France, June 24-30, 2010, Revised Selected Papers 7. Springer, 2012, pp. 711–730.
[17] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, vol. 2. IEEE, 2001, pp. 416–423.
[18] J.-B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-exemplars,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 5197–5206.
[19] R. Timofte, V. De Smet, and L. Van Gool, “A+: Adjusted anchored neighborhood regression for fast super-resolution,” in Computer Vision–ACCV 2014: 12th Asian Conference on Computer Vision, Singapore, Singapore, November 1-5, 2014, Revised Selected Papers, Part IV 12. Springer, 2015, pp. 111–126.
[20] D. Dai, R. Timofte, and L. Van Gool, “Jointly optimized regressors for image super-resolution,” in Computer Graphics Forum, vol. 34, no. 2. Wiley Online Library, 2015, pp. 95–104.
[21] S. Schulter, C. Leistner, and H. Bischof, “Fast and accurate image upscaling with super-resolution forests,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3791–3799.
[22] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, “Deep networks for image super-resolution with sparse prior,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 370–378.
[23] X. Mao, C. Shen, and Y.-B. Yang, “Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections,” Advances in neural information processing systems, vol. 29, 2016.
[24] K. Zhang, X. Gao, D. Tao, and X. Li, “Single image super-resolution with non-local means and steering kernel regression,” IEEE Transactions on Image Processing, vol. 21, no. 11, pp. 4544–4556, 2012.
[25] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 6, pp. 1256–1272, 2016.
[26] Z. Lu, Z. Yu, P. Yali, L. Shigang, W. Xiaojun, L. Gang, and R. Yuan, “Fast single image super-resolution via dilated residual networks,” IEEE Access, vol. 7, pp. 109 729–109 738, 2018.
[27] Y. Shi, K. Wang, C. Chen, L. Xu, and L. Lin, “Structure-preserving image super-resolution via contextualized multitask learning,” IEEE transactions on multimedia, vol. 19, no. 12, pp. 2804–2815, 2017.
[28] H. Ren, M. El-Khamy, and J. Lee, “Image super resolution based on fusing multiple convolution neural networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 54–61.
[29] Y. Tai, J. Yang, X. Liu, and C. Xu, “Memnet: A persistent memory network for image restoration,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 4539–4547.
[30] C. Tian, R. Zhuge, Z. Wu, Y. Xu, W. Zuo, C. Chen, and C.-W. Lin, “Lightweight image super-resolution with enhanced cnn,” Knowledge-Based Systems, vol. 205, p. 106235, 2020.
[31] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE transactions on image processing, vol. 19, no. 11, pp. 2861–2873, 2010.
[32] J. Song, J. Xiao, C. Tian, Y. Hu, L. You, and S. Zhang, “A dual cnn for image super-resolution,” Electronics, vol. 11, no. 5, p. 757, 2022.
[33] C. Tian, Y. Zhang, W. Zuo, C.-W. Lin, D. Zhang, and Y. Yuan, “A heterogeneous group cnn for image super-resolution,” IEEE transactions on neural networks and learning systems, 2022.
[34] Y. Huang, S. Li, L. Wang, T. Tan et al., “Unfolding the alternating optimization for blind super resolution,” Advances in Neural Information Processing Systems, vol. 33, pp. 5632–5643, 2020.
[35] Z. Luo, H. Huang, L. Yu, Y. Li, H. Fan, and S. Liu, “Deep constrained least squares for blind image super-resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 17 642–17 652.
[36] J. Sahambi et al., “A lightweight deep residual attention network for single image super resolution,” in 2023 National Conference on Communications (NCC). IEEE, 2023, pp. 1–6.
[37] W. Shi, F. Tao, and Y. Wen, “Structure-aware deep networks and pixel-level generative adversarial training for single image super-resolution,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–14, 2023.
[38] C. Tian, Y. Xu, W. Zuo, C.-W. Lin, and D. Zhang, “Asymmetric cnn for image superresolution,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 6, pp. 3718–3730, 2021.
[39] C. Tian, X. Zhang, J. C.-W. Lin, W. Zuo, Y. Zhang, and C.-W. Lin, “Generative adversarial networks for image super-resolution: A survey,” arXiv preprint arXiv:2204.13620, 2022.
[40] F. Yu, X. Wang, M. Cao, G. Li, Y. Shan, and C. Dong, “Osrt: Omnidirectional image super-resolution with distortion-aware transformer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 13 283–13 292.
[41] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, “Residual dense network for image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 2472–2481.
[42] S. Wan, C. Gong, P. Zhong, B. Du, L. Zhang, and J. Yang, “Multiscale dynamic graph convolutional network for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 58, no. 5, pp. 3162–3177, 2019.
[43] Y. Ding, J. Feng, Y. Chong, S. Pan, and X. Sun, “Adaptive sampling toward a dynamic graph convolutional network for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–17, 2021.
[44] Z. Duan, T. Zhang, X. Luo, and J. Tan, “Dckn: multi-focus image fusion via dynamic convolutional kernel network,” Signal Processing, vol. 189, p. 108282, 2021.
[45] C. Dai, Z. Guan, and M. Lin, “Single low-light image enhancer using taylor expansion and fully dynamic convolution,” Signal Processing, vol. 189, p. 108280, 2021.
[46] J. Hou, Z. Guo, Y. Wu, W. Diao, and T. Xu, “Bsnet: Dynamic hybrid gradient convolution based boundary-sensitive network for remote sensing image segmentation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–22, 2022.
[47] H. Shen, Z.-Q. Zhao, and W. Zhang, “Adaptive dynamic filtering network for image denoising,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 2, 2023, pp. 2227–2235.
[48] C. Tian, M. Zheng, W. Zuo, B. Zhang, Y. Zhang, and D. Zhang, “Multi-stage image denoising with the wavelet transform,” Pattern Recognition, vol. 134, p. 109050, 2023.
[49] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[50] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint arXiv:1511.07122, 2015.
[51] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in neural information processing systems, vol. 25, 2012.
[52] Y.-S. Xu, S.-Y. R. Tseng, Y. Tseng, H.-K. Kuo, and Y.-M. Tsai, “Unified dynamic convolutional network for super-resolution with variational degradations,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 12 496–12 505.
[53] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[54] A. Hore and D. Ziou, “Image quality metrics: Psnr vs. ssim,” in 2010 20th international conference on pattern recognition. IEEE, 2010, pp. 2366–2369.
[55] K. Jiang, Z. Wang, P. Yi, and J. Jiang, “Hierarchical dense recursive network for image super-resolution,” Pattern Recognition, vol. 107, p. 107475, 2020.
[56] M. Hu, K. Jiang, Z. Wang, X. Bai, and R. Hu, “Cycmunet+: Cycle-projected mutual learning for spatial-temporal video super-resolution,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
[57] K. Jiang, Z. Wang, P. Yi, T. Lu, J. Jiang, and Z. Xiong, “Dual-path deep fusion network for face image hallucination,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 1, pp. 378–391, 2020.
[58] Z. Zha, X. Yuan, B. Wen, J. Zhou, J. Zhang, and C. Zhu, “From rank estimation to rank approximation: Rank residual constraint for image restoration,” IEEE Transactions on Image Processing, vol. 29, pp. 3254–3269, 2020.
[59] Z. Zha, X. Yuan, B. Wen, J. Zhou, and C. Zhu, “Group sparsity residual constraint with non-local priors for image restoration,” IEEE Transactions on Image Processing, vol. 29, pp. 8960–8975, 2020.
[60] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Deep laplacian pyramid networks for fast and accurate super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 624–632.
[61] W. Bae, J. Yoo, and J. Chul Ye, “Beyond deep residual learning for image restoration: Persistent homology-guided manifold simplification,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 145–153.
[62] J. Xu, M. Li, J. Fan, X. Zhao, and Z. Chang, “Self-learning super-resolution using convolutional principal component analysis and random matching,” IEEE Transactions on Multimedia, vol. 21, no. 5, pp. 1108–1121, 2018.
[63] F. Cao and B. Chen, “New architecture of deep recursive convolution networks for super-resolution,” Knowledge-Based Systems, vol. 178, pp. 98–110, 2019.
[64] S. Wang and Y. Zhang, “Deep learning for covid-19 diagnosis via chest images.”
[65] Y. Zhang, L. Deng, H. Zhu, W. Wang, Z. Ren, Q. Zhou, S. Lu, S. Sun, Z. Zhu, J. M. Gorriz et al., “Deep learning in food category recognition,” Information Fusion, p. 101859, 2023.
[66] Y.-D. Zhang, Z. Dong, S.-H. Wang, X. Yu, X. Yao, Q. Zhou, H. Hu, M. Li, C. Jiménez-Mesa, J. Ramirez et al., “Advances in multimodal data fusion in neuroimaging: Overview, challenges, and novel orientation,” Information Fusion, vol. 64, pp. 149–187, 2020.
[67] X. Zhou, Q. Yang, X. Zheng, W. Liang, K. I.-K. Wang, J. Ma, Y. Pan, and Q. Jin, “Personalized federation learning with model-contrastive learning for multi-modal user modeling in human-centric metaverse,” IEEE Journal on Selected Areas in Communications, 2024.
[68] X. Zhou, Q. Yang, Q. Liu, W. Liang, K. Wang, Z. Liu, J. Ma, and Q. Jin, “Spatial–temporal federated transfer learning with multi-sensor data fusion for cooperative positioning,” Information Fusion, vol. 105, p. 102182, 2024.
[69] X. Zhou, X. Zheng, X. Cui, J. Shi, W. Liang, Z. Yan, L. T. Yang, S. Shimizu, and K. I.-K. Wang, “Digital twin enhanced federated reinforcement learning with lightweight knowledge distillation in mobile networks,” IEEE Journal on Selected Areas in Communications, 2023.
[70] X. Zhou, X. Zheng, T. Shu, W. Liang, K. I.-K. Wang, L. Qi, S. Shimizu, and Q. Jin, “Information theoretic learning-enhanced dual-generative adversarial networks with causal representation for robust ood generalization,” IEEE Transactions on Neural Networks and Learning Systems, 2023.