arXiv:2402.15704v1 [eess.IV] 24 Feb 2024

A Heterogeneous Dynamic Convolutional Neural Network for Image Super-resolution

Chunwei Tian, Member, IEEE, Xuanyu Zhang, Jia Ren, Wangmeng Zuo, Senior Member, IEEE, Yanning Zhang, Senior Member, IEEE, Chia-Wen Lin, Fellow, IEEE
This work was supported in part by the National Natural Science Foundation of China under Grant 62201468, in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2021A1515110079, and in part by the Youth Science and Technology Talent Promotion Project of the Jiangsu Association for Science and Technology under Grant JSTJ-2023-017. (Corresponding authors: Chunwei Tian (Email: chunweitian@nwpu.edu.cn) and Yanning Zhang (Email: ynzhang@nwpu.edu.cn).) Chunwei Tian is with the School of Software, Northwestern Polytechnical University, Xi'an, 710129, China; the National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Xi'an, 710129, China; and the Research & Development Institute, Northwestern Polytechnical University, Shenzhen, 518057, China. (Email: chunweitian@nwpu.edu.cn) Xuanyu Zhang is with the School of Software, Northwestern Polytechnical University, Xi'an, 710129, China. (Email: xuanyuzhang@mail.nwpu.edu.cn) Jia Ren is with the School of Information and Communication Engineering, Hainan University, Haikou, 570228, China. (Email: renjia@hainanu.edu) Wangmeng Zuo is with the School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China. (Email: cswmzuo@gmail.com) Yanning Zhang is with the School of Computer Science, Northwestern Polytechnical University, Xi'an, 710129, China, and with the National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology, Xi'an, 710129, China. (Email: ynzhang@nwpu.edu.cn) Chia-Wen Lin is with the Department of Electrical Engineering and the Institute of Communications Engineering, National Tsing Hua University. (Email: cwlin@ee.nthu.edu.tw)
Abstract

Convolutional neural networks can automatically learn features via deep network architectures and given input samples. However, the robustness of the obtained models may be challenged in varying scenes. Greater architectural differences within a network help extract more complementary structural information, which enhances the robustness of the obtained super-resolution model. In this paper, we present a heterogeneous dynamic convolutional network for image super-resolution (HDSRNet). To capture more information, HDSRNet is implemented as a heterogeneous parallel network. The upper network extracts more contextual information via stacked heterogeneous blocks to improve the effects of image super-resolution. Each heterogeneous block is composed of a combination of dilated, dynamic, and common convolutional layers, ReLU, and a residual learning operation. It can not only adaptively adjust parameters according to different inputs, but also prevent the long-term dependency problem. The lower network utilizes a symmetric architecture to enhance the relations of different layers and mine more structural information, which is complementary to the upper network for image super-resolution. The experimental results show that the proposed HDSRNet is effective for image super-resolution. The code of HDSRNet can be obtained at https://github.com/hellloxiaotian/HDSRNet.

Index Terms:
Dynamic convolutions, dilated convolutions, heterogeneous networks, image super-resolution.

I Introduction

Single image super-resolution (SISR) techniques obtain high-quality images from given low-resolution (LR) images by solving an ill-posed inverse problem [58, 59]. In recent years, machine learning techniques have achieved great success in many applications, e.g., disease diagnosis [64], classification [70], object recognition [65], multimodal data fusion [66, 68], the Metaverse [67, 69], and SISR [2]. Specifically, deep learning techniques with end-to-end architectures have obtained higher performance in image super-resolution. Deep convolutional neural networks (CNNs) use end-to-end architectures, rather than manually set parameters, to obtain strong learning abilities that improve the visual effects of image super-resolution (SR) [55]. For instance, Dong et al. designed a 3-layer CNN that learns a pixel-wise mapping from a low-resolution (LR) image to a high-resolution (HR) image [3]. Although it can improve the resolution of predicted images, it cannot make a tradeoff between network depth and performance in image super-resolution. To address this issue, scholars have tried to enlarge network depth to pursue improved performance in image super-resolution [4]. Stacking small filters can achieve a very deep SR network [4]. Also, a residual learning technique applied in deep layers of a deep network can balance SR performance and computational cost. A deeply-recursive convolutional network (DRCN) [5] and a deep recursive residual network (DRRN) [6] exploit recursive networks to decrease the number of parameters for training an SR model. To reduce computational complexity, some up-sampling operations, i.e., a transposed convolution (deconvolution) [7] and a sub-pixel convolution [8], are set in deep layers to amplify low-frequency features for constructing HR images. For instance, a fast SR convolutional neural network (FSRCNN) [9] directly inputs low-resolution images to a deep CNN to obtain low-frequency features and applies a sub-pixel convolutional layer to transform the low-frequency features into high-frequency features for obtaining HR images. To further improve SR performance, an enhanced deep SR network (EDSR) used enhanced residual blocks to extract more structural information for image super-resolution [10]. Also, removing unnecessary batch normalization layers can improve the training efficiency of an SR model [10]. Although these algorithms are effective for SISR, they may still struggle in complex scenes.

In this work, we propose a heterogeneous dynamic convolutional network for SISR, termed HDSRNet. HDSRNet uses a heterogeneous parallel network to capture more information and improve SR performance. The upper network uses stacked heterogeneous blocks to extract more contextual information for image SR. Each heterogeneous block includes dilated, dynamic, and common convolutional layers, ReLU, and a residual learning operation to adaptively adjust parameters according to different inputs. Also, it can prevent the long-term dependency problem. The lower network uses a symmetric architecture to enhance the relationships of different layers and obtain more complementary structural information. Quantitative and qualitative analyses show that the proposed method is a good tool for image super-resolution.

Our main contributions can be summarized as follows.

(1) A heterogeneous parallel network is used to extract complementary structural information, in terms of contextual and hierarchical information, to improve performance in image super-resolution.

(2) Dynamic convolutions are embedded into a convolutional neural network to enhance the robustness of the obtained super-resolution model in complex scenes.

(3) An enhanced residual architecture is designed to address the long-term dependency problem in image super-resolution.

The remainder of this paper is organized as follows. Section II describes related work. Section III illustrates the proposed method. Section IV reports extensive comparative experimental results, and Section V summarizes our work.

II Related Work

II-A Deep CNNs for Image Super-resolution

Due to long shooting distances, images captured by cameras can be unclear. Also, traditional image super-resolution methods require manually set parameters and complex optimization. To overcome this issue, deep learning techniques have been extended to SISR [39, 40]. For instance, a deep recursive residual network used local residual connections and recursive units to improve the effects of image super-resolution [6]. Alternatively, Tai et al. used recursive and gate units to transfer information from shallow layers to deep layers to address the long-term dependency problem [29]. To improve the efficiency of image SR, given LR images are directly used as the input of a convolutional neural network, and high-quality images are constructed via an up-sampling operation in a deep layer [9]. For instance, Ahn et al. used group convolutions to implement a cascading residual CNN that extracts more robust low-frequency information and decreases the number of parameters without causing significant performance loss in SISR [11]. Tian et al. exploited a symmetric group convolutional block to enhance the relations of different channels and mine more accurate low-frequency information for image super-resolution [33]. Zhang et al. extended the network depth and fully used hierarchical information to obtain more low-frequency information for image SR [41]. According to the mentioned methods, we can see that deep CNNs are good tools for obtaining clearer images. Motivated by this, we propose a deep CNN for image super-resolution.

II-B Dynamic convolution

Existing CNNs share parameters to extract useful features that better represent images. However, they may suffer from challenges with varying input images in complex scenes [13]. To address this problem, dynamic convolutions have been presented [13]. That is, they can dynamically adjust parameters to adaptively learn models for image applications according to different inputs. For instance, Wan et al. achieved a graph convolution by arbitrarily structuring non-Euclidean data for hyperspectral image classification [42]. Alternatively, Ding et al. automatically found the valuable receptive field of each target node to adaptively capture neighbor information in hyperspectral image classification [43]. To deal with negative effects from artifacts in textureless and edge regions, Duan et al. dynamically generated kernels from the region context of input images to better achieve image fusion [44]. To adaptively adjust the brightness of enhanced images, Dai et al. fused a Taylor expansion and dynamic convolution into a Retinex model to intelligently improve the clarity of low-light images and flexibly adjust the degree of brightness [45]. To extract more context information, Hou et al. proposed a dynamic hybrid gradient convolution and coordinate-sensitive attention to enhance boundary information extraction for remote sensing image segmentation [46]. To find more high-frequency information, Shen et al. used spatially enhanced kernel generation to dynamically learn high-frequency and multi-scale features for a better denoising effect [47]. Additionally, Tian et al. combined dynamic convolution and the wavelet transform to train an adaptive denoiser according to different given noisy images [48]. According to these illustrations, we find that dynamic convolutions are effective for image applications. Motivated by this, we use dynamic convolutions in a CNN for image super-resolution in this work.
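To make the mechanism concrete, the following is a minimal PyTorch sketch of a dynamic convolution in the style of attention over convolution kernels [13]: K candidate kernels are mixed per sample by softmax weights computed from a global pooling of the input. The class name DynamicConv2d, the choice of K = 4, and the single-linear-layer attention branch are our illustrative assumptions, not the exact design of [13] or of HDSRNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Sketch of a dynamic convolution: K candidate kernels mixed by
    input-dependent softmax attention (after the idea in [13])."""
    def __init__(self, in_ch, out_ch, kernel_size=3, K=4, padding=1):
        super().__init__()
        self.K, self.padding = K, padding
        # K candidate kernels and biases
        self.weight = nn.Parameter(
            torch.randn(K, out_ch, in_ch, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.zeros(K, out_ch))
        # attention branch: global average pooling -> linear -> softmax
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, K))

    def forward(self, x):
        b, c, h, w = x.shape
        alpha = F.softmax(self.attn(x), dim=1)            # (b, K)
        # mix the K kernels per sample
        weight = torch.einsum('bk,koihw->boihw', alpha, self.weight)
        bias = alpha @ self.bias                          # (b, out_ch)
        # grouped-conv trick: apply a different kernel to each sample
        x = x.reshape(1, b * c, h, w)
        weight = weight.reshape(-1, c, *weight.shape[-2:])
        out = F.conv2d(x, weight, padding=self.padding, groups=b)
        return out.reshape(b, -1, h, w) + bias[..., None, None]
```

Because the mixed kernel depends on the input, the same module behaves differently for different low-resolution images, which is exactly the property HDSRNet exploits.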

III Proposed Method

III-A Network Architecture

The proposed 18-layer HDSRNet contains two parallel 16-layer sub-networks and a 2-layer construction block. The 16-layer parallel heterogeneous network is composed of a 16-layer heterogeneous upper network and a 16-layer symmetrical lower network. The heterogeneous upper network is composed of a Conv+ReLU and five stacked heterogeneous blocks, which can extract more contextual information for image super-resolution. A Conv+ReLU is a composite of a convolutional layer and a ReLU operation, which is used to extract non-linear information from given low-resolution images. Its input and output channel numbers are 3 and 64, respectively, and its kernel size is 3×3. The stacked heterogeneous blocks utilize different convolutional layers (i.e., dilated, dynamic, and common convolutional layers) and ReLU to dynamically adjust parameters and obtain robust low-frequency information, according to different input low-resolution images. To obtain complementary low-frequency information, a 16-layer symmetrical lower network is designed. It relies on a symmetrical architecture that enhances connections between hierarchical layers to obtain more complementary structural information. Also, the two sub-networks interact via a multiplication operation. A 2-layer construction block is used to transform low-frequency information into high-frequency information and construct the predicted high-quality images. The process can be illustrated via the following formulas.

$$\begin{aligned}
I_S &= HDSRNet(I_L) \\
&= CB(HUNet(I_L) \times SLNet(I_L)) \\
&= CB((5HB(CR(I_L))) \times SLNet(I_L)) \\
&= CB(O_{HUNet} \times O_{SLNet})
\end{aligned} \qquad (1)$$

where $HDSRNet$, $HUNet$ and $SLNet$ denote the functions of HDSRNet, the heterogeneous upper network and the symmetrical lower network, respectively. $CR$ is a Conv+ReLU, $5HB$ denotes five stacked heterogeneous blocks, and $CB$ denotes a construction block. $\times$ represents a multiplication operation. The parameters of HDSRNet are obtained via the loss function in Section III-B. $O_{HUNet}$ and $O_{SLNet}$ are the outputs of the HUNet and SLNet, respectively.
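The data flow of Eq. (1) can be summarized in a few lines of PyTorch. This is only a structural sketch: the sub-modules hunet, slnet and cb are placeholders for the components defined in Sections III-C to III-E, not the released implementation.

```python
import torch.nn as nn

class HDSRNetSketch(nn.Module):
    """Structural sketch of Eq. (1): two parallel sub-networks whose
    outputs are fused by element-wise multiplication and passed to a
    construction block that builds the HR image."""
    def __init__(self, hunet, slnet, cb):
        super().__init__()
        self.hunet = hunet   # heterogeneous upper network (HUNet)
        self.slnet = slnet   # symmetrical lower network (SLNet)
        self.cb = cb         # construction block (CB)

    def forward(self, i_l):
        o_hunet = self.hunet(i_l)
        o_slnet = self.slnet(i_l)
        return self.cb(o_hunet * o_slnet)  # I_S = CB(O_HUNet x O_SLNet)
```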

Figure 1: Network architecture of the proposed HDSRNet.

III-B Loss Function

HDSRNet chooses the mean absolute error (MAE) [12] as the loss function to obtain its parameters. The working process of MAE in HDSRNet can be represented as below.

$$l(p) = \frac{1}{T}\sum_{j=1}^{T}\left|HDSRNet(I_L^j) - I_H^j\right| \qquad (2)$$

where $I_L^j$ and $I_H^j$ denote the $j$-th low- and high-resolution images, respectively, and $T$ represents the number of low-resolution training images. $l$ is the loss function of HDSRNet and $p$ stands for the parameters of the trained HDSRNet, which are optimized by the Adam optimizer [49].
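In PyTorch, Eq. (2) corresponds to an L1 loss averaged over the batch. A minimal sketch of one update is shown below; model, lr_batch and hr_batch are assumed to be defined elsewhere.

```python
import torch

criterion = torch.nn.L1Loss()                # MAE of Eq. (2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

optimizer.zero_grad()
loss = criterion(model(lr_batch), hr_batch)  # |HDSRNet(I_L) - I_H|
loss.backward()                              # gradients w.r.t. p
optimizer.step()                             # Adam update [49]
```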

III-C Heterogeneous Block

To train a robust super-resolution model, heterogeneous blocks are designed to dynamically adjust parameters and obtain robust low-frequency information, according to different input low-resolution images. Each heterogeneous block is composed of a dilated Conv+ReLU, a dynamic Conv+ReLU and a Conv+ReLU, where the dilated Conv+ReLU is a combination of a dilated convolution [50] and a ReLU [51], used to capture more context information. A dynamic Conv+ReLU is a combination of a dynamic convolution [13] and a ReLU, which can adaptively learn parameters according to different input information. To prevent the long-term dependency problem, a residual operation acts between the input of the dilated Conv+ReLU and the output of the Conv+ReLU. All convolutional kernels are 3×3. The input and output channel numbers of the dilated, dynamic and common convolutional layers are all 64. Also, the dilation factor is 2 in the dilated convolutional layers. The mentioned process can be formulated as the following equation.

$$HB(O_t) = CR(DyCR(DiCR(O_t))) + O_t \qquad (3)$$

where $O_t$ denotes the input of a heterogeneous block, $DiCR$ stands for a dilated Conv+ReLU, and $DyCR$ represents a dynamic Conv+ReLU. $+$ expresses the residual learning operation, shown as $\oplus$ in Fig. 1.
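A heterogeneous block can be sketched directly from Eq. (3). The sketch below assumes the DynamicConv2d class from Section II-B is in scope; the exact padding and initialization are our assumptions, not details from the released code.

```python
import torch.nn as nn

class HeterogeneousBlock(nn.Module):
    """Sketch of Eq. (3): dilated Conv+ReLU -> dynamic Conv+ReLU ->
    Conv+ReLU, plus an identity residual connection. All layers keep
    64 channels with 3x3 kernels; the dilation factor is 2."""
    def __init__(self, ch=64):
        super().__init__()
        self.di_cr = nn.Sequential(   # DiCR: dilated conv, factor 2
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2),
            nn.ReLU(inplace=True))
        self.dy_cr = nn.Sequential(   # DyCR: dynamic conv (Sec. II-B)
            DynamicConv2d(ch, ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True))
        self.cr = nn.Sequential(      # CR: common conv
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True))

    def forward(self, o_t):
        return self.cr(self.dy_cr(self.di_cr(o_t))) + o_t  # Eq. (3)
```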

III-D Symmetrical Lower Sub-network

To obtain complementary low-frequency information, a 16-layer symmetrical lower network is designed. Each layer contains a Conv+ReLU with a 3×3 kernel; the input and output channel numbers are 64, except for the first layer, whose input and output channel numbers are 3 and 64, respectively. To enhance the relations of different layers, residual learning operations act between the 1st and 16th, 2nd and 15th, 3rd and 14th, 4th and 13th, 5th and 12th, 6th and 11th, 7th and 10th, and 8th and 9th layers, transferring the information of shallow layers to deep layers to prevent the long-term dependency problem and obtain robust information for image super-resolution. The procedure can be summarized as Eq. (4).

$$O_{SLNet} = CR(CR(\cdots(CR(CR(O_8 + CR(O_8)) + O_7) + O_6) + \cdots) + O_2) + O_1 \qquad (4)$$

where $O_i$ denotes the output of the $i$-th layer, $i = 1, 2, \ldots, 8$. Also, $O_i = iCR(I_L)$, where $iCR$ stands for $i$ stacked Conv+ReLU layers.
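Unrolling Eq. (4) gives the sketch below: the first eight Conv+ReLU outputs are cached and then added back symmetrically while the last eight layers are applied. This is a hypothetical reading of Eq. (4) and the layer pairing described above, not the released implementation.

```python
import torch.nn as nn

class SymmetricalLowerNet(nn.Module):
    """Sketch of the 16-layer symmetrical lower network of Eq. (4):
    plain Conv+ReLU layers with skips pairing layers 1-16, 2-15, ...,
    8-9. Channel sizes follow Section III-D."""
    def __init__(self, in_ch=3, ch=64, depth=16):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch if i == 0 else ch, ch, 3, padding=1),
                nn.ReLU(inplace=True))
            for i in range(depth)])

    def forward(self, x):
        outs = []
        for i in range(8):               # layers 1..8
            x = self.layers[i](x)
            outs.append(x)               # cache O_1 .. O_8
        x = self.layers[8](x)            # layer 9: CR(O_8)
        x = self.layers[9](x + outs[7])  # layer 10: CR(CR(O_8) + O_8)
        for i in range(10, 16):          # layers 11..16 add O_7 .. O_2
            x = self.layers[i](x + outs[16 - i])
        return x + outs[0]               # outermost skip with O_1
```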

III-E Construction Block

A 2-layer construction block is used to construct the predicted HR images. It contains two phases. The first phase uses a sub-pixel convolutional layer to convert low-frequency information into high-frequency information; its input and output channel numbers are 128 and 64, respectively. The second phase only utilizes a convolutional layer (Conv) to construct the predicted high-resolution images, where its input and output channel numbers are 64 and 3, respectively. Their kernel sizes are 3×3. These illustrations can be symbolized as Eq. (5).

$$\begin{aligned}
O_S &= CB(O_{HUNet} \times O_{SLNet}) \\
&= C(Sub(O_{HUNet} \times O_{SLNet}))
\end{aligned} \qquad (5)$$

where $Sub$ and $C$ are the functions of a sub-pixel convolutional layer and a convolutional layer, respectively.
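A sub-pixel layer is typically implemented as a convolution that expands channels by a factor of scale² followed by a pixel shuffle [8]. The sketch below follows that pattern; the channel defaults echo the text above, while the handling of the scale factor is our assumption.

```python
import torch.nn as nn

class ConstructionBlock(nn.Module):
    """Sketch of Eq. (5): sub-pixel convolution (conv + PixelShuffle)
    followed by a plain conv mapping 64 channels to a 3-channel image."""
    def __init__(self, in_ch=128, mid_ch=64, scale=2):
        super().__init__()
        self.sub = nn.Sequential(        # Sub: expand, then rearrange
            nn.Conv2d(in_ch, mid_ch * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale))      # (B, mid_ch, H*scale, W*scale)
        self.conv = nn.Conv2d(mid_ch, 3, 3, padding=1)  # C: build image

    def forward(self, x):
        return self.conv(self.sub(x))    # O_S = C(Sub(x))
```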

IV Experiments

IV-A Training Dataset

We utilize the public DIV2K dataset [14] as the training set to develop HDSRNet. Specifically, DIV2K contains three parts, i.e., a training dataset of 800 images, a validation dataset of 100 images and a test dataset of 100 images. To enlarge the diversity of the training data, we merge the original training dataset and the validation dataset to form a new training dataset, which includes 900 high-quality images and the corresponding low-resolution images at scales ×2, ×3 and ×4. All training images are saved in '.png' format.

IV-B Test Datasets

To fairly test the SR performance of our HDSRNet, four public SR datasets, i.e., Set5 containing 5 natural images [15], Set14 containing 14 natural images [16], BSD100 (B100) containing 100 natural images [17] and Urban100 (U100) containing 100 urban images [18], are used to conduct experiments. These datasets cover three different scales, i.e., ×2, ×3 and ×4, and are saved in '.png' format.

IV-C Implementation Details

The initial hyper-parameters for training HDSRNet are $\beta_1$ of 0.9, $\beta_2$ of 0.999, $\epsilon$ of 1e-8, a batch size of 64 and an initial learning rate of 1e-4, which is halved every 300,000 steps. The other parameters are the same as in Ref. [10].
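These settings map directly onto PyTorch's Adam optimizer and a step learning-rate schedule. A minimal sketch, assuming model is the HDSRNet instance and that the scheduler is stepped once per training iteration:

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), eps=1e-8)
# halve the learning rate every 300,000 training steps
scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                            step_size=300_000, gamma=0.5)
```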

The proposed HDSRNet is implemented with PyTorch 1.8.0 and Python 3.8. All experiments run on a workstation with Ubuntu 20.04, which is equipped with an AMD EPYC 7502P CPU and four Nvidia GeForce RTX 3090 GPUs with Nvidia CUDA 11.1 and cuDNN 8.0.5. All models are trained on a single RTX 3090 GPU.

IV-D Network Analysis

We analyze the architecture of the designed HDSRNet, which contains a heterogeneous upper sub-network, a symmetrical lower sub-network and a construction block for image super-resolution, in terms of its rationality and validity.

Heterogeneous upper sub-network: Most existing SR models cannot adaptively learn parameters according to different given low-resolution images [52]. Dynamic convolutions can automatically adjust parameters according to different inputs [13]. Motivated by this, we develop a heterogeneous upper sub-network specifically for image super-resolution. We design the main network following VGG [53]. That is, six stacked Conv+ReLU layers are used as a basic network to extract structural low-frequency information. A combination of six stacked Conv+ReLU and a CB obtains a peak signal-to-noise ratio (PSNR) [54] of 25.13dB and a structural similarity (SSIM) [54] of 0.7525 on U100 for ×4, which shows the effectiveness of the six stacked Conv+ReLU. A CB is used to build HR images, as shown in a later section. To prevent the long-term dependency problem, a deep fusion idea [57] is used in this paper. That is, a residual learning operation acts between the input and output of each Conv+ReLU, except the first Conv+ReLU, to transfer low-frequency information from shallow layers to deep layers and pursue better super-resolution performance. Its effectiveness can be shown by comparing 'Heterogeneous upper sub-network without dilated Conv+ReLU, dynamic Conv+ReLU and a CB' with 'A combination of six stacked Conv+ReLU and a CB' in terms of PSNR and SSIM in TABLE I. To extract more context, a dilated Conv+ReLU with a dilation factor of 2 is placed before each Conv+ReLU, except the first Conv+ReLU, in the heterogeneous upper sub-network; its effectiveness is proved by comparing 'Heterogeneous upper sub-network without dilated Conv+ReLU, dynamic Conv+ReLU and a CB' with 'Heterogeneous upper sub-network without dynamic Conv+ReLU and a CB' in terms of PSNR and SSIM on U100 for ×4. Specifically, the five stacked blocks after the first Conv+ReLU in the heterogeneous upper sub-network (HUNet) are five heterogeneous blocks (HBs), whose rationality and validity are shown in a later section. To make HUNet robust, a dynamic Conv+ReLU is set behind each dilated Conv+ReLU in each HB to dynamically adjust parameters according to different inputs. Its better SR results can be found by comparing 'Heterogeneous upper sub-network and a CB' with 'Heterogeneous upper sub-network without dynamic Conv+ReLU and a CB' in TABLE I, which verifies the effectiveness of dynamic convolutions. It is known that the bigger the architectural difference within a network, the better its performance [41]. Motivated by this, we respectively replace the dynamic Conv+ReLU and the dilated Conv+ReLU with a plain Conv+ReLU to conduct experiments. As shown in TABLE I, 'Heterogeneous upper sub-network and a CB' obtains higher PSNR and SSIM values than 'Heterogeneous upper sub-network with Conv+ReLU rather than dilated Conv+ReLU' and 'Heterogeneous upper sub-network with Conv+ReLU rather than dynamic Conv+ReLU'. This demonstrates the effectiveness of the dynamic Conv+ReLU and the dilated Conv+ReLU in an HB, and highlights the advantage of architectural diversity within an HB for image super-resolution.

Symmetrical lower sub-network: To prevent the poor representation of a single network for image super-resolution, a symmetrical lower sub-network is designed to assist the upper network in extracting more complementary structural information for recovering high-resolution images. It is implemented in the following phases.

TABLE I: PSNR and SSIM of different methods on U100 for ×4.
Methods PSNR(dB)/SSIM
A combination of six stacked Conv+ReLU and a CB 25.13/0.7525
Heterogeneous upper sub-network without dilated Conv+ReLU, dynamic Conv+ReLU and a CB 25.22/0.7559
Heterogeneous upper sub-network without dynamic Conv+ReLU and a CB 25.46/0.7649
Heterogeneous upper sub-network without dilated Conv+ReLU and a CB 25.61/0.7697
Heterogeneous upper sub-network with Conv+ReLU rather than dilated Conv+ReLU 25.74/0.7739
Heterogeneous upper sub-network with Conv+ReLU rather than dynamic Conv+ReLU 25.68/0.7729
Heterogeneous upper sub-network and a CB 25.75/0.7740
HDSRNet without residual learning operations in symmetrical lower sub-network 25.99/0.7825
HDSRNet 26.01/0.7827
TABLE II: PSNR and SSIM of different methods with different upscale factors on Set5.
Dataset Methods ×2 ×3 ×4
PSNR(dB)/SSIM PSNR(dB)/SSIM PSNR(dB)/SSIM
Set5 Bicubic 33.66/0.9299 30.39/0.8682 28.42/0.8104
SRCNN [3] 36.66/0.9542 32.75/0.9090 30.48/0.8628
VDSR [4] 37.53/0.9587 33.66/0.9213 31.35/0.8838
DRRN [6] 37.74/0.9591 34.03/0.9244 31.68/0.8888
FSRCNN [9] 37.00/0.9558 33.16/0.9140 30.71/0.8657
CARN-M [11] 37.53/0.9583 33.99/0.9236 31.92/0.8903
IDN [12] 37.83/0.9600 34.11/0.9253 31.82/0.8903
A+ [19] 36.54/0.9544 32.58/0.9088 30.28/0.8603
JOR [20] 36.58/0.9543 32.55/0.9067 30.19/0.8563
RFL [21] 36.54/0.9537 32.43/0.9057 30.14/0.8548
SelfEx [18] 36.49/0.9537 32.58/0.9093 30.31/0.8619
CSCN [22] 36.93/0.9552 33.10/0.9144 30.86/0.8732
RED [23] 37.56/0.9595 33.70/0.9222 31.33/0.8847
DnCNN [24] 37.58/0.9590 33.75/0.9222 31.40/0.8845
TNRD [25] 36.86/0.9556 33.18/0.9152 30.85/0.8732
FDSR [26] 37.40/0.9513 33.68/0.9096 31.28/0.8658
RCN [27] 37.17/0.9583 33.45/0.9175 31.11/0.8736
DRCN [5] 37.63/0.9588 33.82/0.9226 31.53/0.8854
LapSRN [60] 37.52/0.9590 - 31.54/0.8850
NDRCN [63] 37.73/0.9596 33.90/0.9235 31.50/0.8859
MemNet [29] 37.78/0.9597 34.09/0.9248 31.74/0.8893
LESRCNN [30] 37.65/0.9586 33.93/0.9231 31.88/0.8903
LESRCNN-S [30] 37.57/0.9582 34.05/0.9238 31.88/0.8907
ScSR [31] 35.78/0.9485 31.34/0.8869 29.07/0.8263
DSRCNN [32] 37.73/0.9588 34.17/0.9247 31.89/0.8909
DAN [34] 37.34/0.9526 34.04/0.9199 31.89/0.8864
PGAN [37] - - 31.03/0.8798
HDSRNet (Ours) 37.95/0.9604 34.31/0.9263 32.15/0.8927
TABLE III: PSNR and SSIM of different methods with different upscale factors on Set14.
Dataset Methods ×2 ×3 ×4
PSNR(dB)/SSIM PSNR(dB)/SSIM PSNR(dB)/SSIM
Set14 Bicubic 30.24/0.8688 27.55/0.7742 26.00/0.7027
SRCNN [3] 32.42/0.9063 29.28/0.8209 27.49/0.7503
VDSR [4] 33.03/0.9124 29.77/0.8314 28.01/0.7674
DRRN [6] 33.23/0.9136 29.96/0.8349 28.21/0.7720
FSRCNN [9] 32.63/0.9088 29.43/0.8242 27.59/0.7535
CARN-M [11] 33.26/0.9141 30.08/0.8367 28.42/0.7762
IDN [12] 33.30/0.9148 29.99/0.8354 28.25/0.7730
A+ [19] 32.28/0.9056 29.13/0.8188 27.32/0.7491
JOR [20] 32.38/0.9063 29.19/0.8204 27.27/0.7479
RFL [21] 32.26/0.9040 29.05/0.8164 27.24/0.7451
SelfEx [18] 32.22/0.9034 29.16/0.8196 27.40/0.7518
CSCN [22] 32.56/0.9074 29.41/0.8238 27.64/0.7578
RED [23] 32.81/0.9135 29.50/0.8334 27.72/0.7698
DnCNN [24] 33.03/0.9128 29.81/0.8321 28.04/0.7672
TNRD [25] 32.51/0.9069 29.43/0.8232 27.66/0.7563
FDSR [26] 33.00/0.9042 29.61/0.8179 27.86/0.7500
RCN [27] 32.77/0.9109 29.63/0.8269 27.79/0.7594
DRCN [5] 33.04/0.9118 29.76/0.8311 28.02/0.7670
LapSRN [60] 33.08/0.9130 29.63/0.8269 28.19/0.7720
NDRCN [63] 33.20/0.9141 29.88/0.8333 28.10/0.7697
MemNet [29] 33.28/0.9142 30.00/0.8350 28.26/0.7723
LESRCNN [30] 33.32/0.9148 30.12/0.8380 28.44/0.7772
LESRCNN-S [30] 33.30/0.9145 30.16/0.8384 28.43/0.7776
ScSR [31] 31.64/0.8940 28.19/0.7977 26.40/0.7218
DSRCNN [32] 33.43/0.9157 30.24/0.8402 28.46/0.7796
DAN [34] 33.08/0.9041 30.09/0.8287 28.42/0.7687
PGAN [37] - - 27.75/0.8164
HDSRNet (Ours) 33.52/0.9170 30.27/0.8412 28.56/0.7799
TABLE IV: PSNR and SSIM of different methods with different upscale factors on B100.
Dataset Methods ×2 ×3 ×4
PSNR(dB)/SSIM PSNR(dB)/SSIM PSNR(dB)/SSIM
B100 Bicubic 29.56/0.8431 27.21/0.7385 25.96/0.6675
SRCNN [3] 31.36/0.8879 28.41/0.7863 26.90/0.7101
VDSR [4] 31.90/0.8960 28.82/0.7976 27.29/0.7251
DRRN [6] 32.05/0.8973 28.95/0.8004 27.38/0.7284
FSRCNN [9] 31.53/0.8920 28.53/0.7910 26.98/0.7150
CARN-M [11] 31.92/0.8960 28.91/0.8000 27.44/0.7304
IDN [12] 32.08/0.8985 28.95/0.8013 27.41/0.7297
A+ [19] 31.21/0.8863 28.29/0.7835 26.82/0.7087
JOR [20] 31.22/0.8867 28.27/0.7837 26.79/0.7083
RFL [21] 31.16/0.8840 28.22/0.7806 26.75/0.7054
SelfEx [18] 31.18/0.8855 28.29/0.7840 26.84/0.7106
CSCN [22] 31.40/0.8884 28.50/0.7885 27.03/0.7161
RED [23] 31.96/0.8972 28.88/0.7993 27.35/0.7276
DnCNN [24] 31.90/0.8961 28.85/0.7981 27.29/0.7253
TNRD [25] 31.40/0.8878 28.50/0.7881 27.00/0.7140
FDSR [26] 31.87/0.8847 28.82/0.7797 27.31/0.7031
DRCN [5] 31.85/0.8942 28.80/0.7963 27.23/0.7233
LapSRN [60] 31.80/0.8950 - 27.32/0.7280
NDRCN [63] 32.00/0.8975 28.86/0.7991 27.30/0.7263
MemNet [29] 32.08/0.8978 28.96/0.8001 27.40/0.7281
LESRCNN [30] 31.95/0.8964 28.91/0.8005 27.45/0.7313
LESRCNN-S [30] 31.95/0.8965 28.94/ 0.8012 27.47/0.7321
ScSR [31] 30.77/0.8744 27.72/0.7647 26.61/0.6983
DSRCNN [32] 32.05/0.8978 29.01/0.8029 27.50/0.7341
DAN [34] 31.76/0.8858 28.94/0.7919 27.51/0.7248
PGAN [37] - - 26.35/0.6926
HDSRNet (Ours) 32.14/0.8994 29.06/0.8048 27.55/0.7357
TABLE V: PSNR and SSIM of different methods with different upscale factors on U100.
Dataset Methods ×2 ×3 ×4
PSNR(dB)/SSIM PSNR(dB)/SSIM PSNR(dB)/SSIM
U100 Bicubic 26.88/0.8403 24.46/0.7349 23.14/0.6577
SRCNN [3] 29.50/0.8946 26.24/0.7989 24.52/0.7221
VDSR [4] 30.76/0.9140 27.14/0.8279 25.18/0.7524
DRRN [6] 31.23/0.9188 27.53/0.8378 25.44/0.7638
FSRCNN [9] 29.88/0.9020 26.43/0.8080 24.62/0.7280
CARN-M [11] 31.23/0.9193 27.55/0.8385 25.62/0.7694
IDN [12] 31.27/0.9196 27.42/0.8359 25.41/0.7632
A+ [19] 29.20/0.8938 26.03/0.7973 24.32/0.7183
JOR [20] 29.25/0.8951 25.97/0.7972 24.29/0.7181
RFL [21] 29.11/0.8904 25.86/0.7900 24.19/0.7096
SelfEx [18] 29.54/0.8967 26.44/0.8088 24.79/0.7374
RED [23] 30.91/0.9159 27.31/0.8303 25.35/0.7587
DnCNN [24] 30.74/0.9139 27.15/0.8276 25.20/0.7521
TNRD [25] 29.70/0.8994 26.42/0.8076 24.61/0.7291
FDSR [26] 30.91/0.9088 27.23/0.8190 25.27/0.7417
DRCN [5] 30.75/0.9133 27.15/0.8276 25.14/0.7510
LapSRN [60] 30.41/0.9100 - 25.21/0.7560
WaveResNet [61] 30.96/0.9169 27.28/0.8334 25.36/0.7614
CPCA [62] 28.17/0.8990 25.61/0.8123 23.62/0.7257
NDRCN [63] 31.06/0.9175 27.23/0.8312 25.16/0.7546
MemNet [29] 31.31/0.9195 27.56/0.8376 25.50/0.7630
LESRCNN [30] 31.45/0.9206 27.70/0.8415 25.77/0.7732
LESRCNN-S [30] 31.45/0.9207 27.76/0.8424 25.78/0.7739
ScSR [31] 28.26/0.8828 - 24.02/0.7024
DSRCNN [32] 31.83/0.9252 27.99/0.8483 25.94/0.7815
DAN [34] 30.60/0.9060 27.65/0.8352 25.86/0.7721
LDRAN[36] - - 25.91/0.7786
PGAN [37] - - 25.47/0.9574
HDSRNet (Ours) 32.00/0.9267 28.02/0.8493 26.01/0.7827
TABLE VI: Running time (milliseconds) of different methods for image super-resolution on different image sizes for ×4.
Image sizes VDSR [4] CARN-M [11] ACNet [38] HDSRNet (Ours)
256 × 256 17.2 15.9 18.38 14.17
512 × 512 57.5 19.9 75.79 26.11
TABLE VII: Parameters and FLOPs of different methods with a scale factor of 4 for restoring high-quality images of 1024×1024.
Methods Parameters FLOPs
DCLS [35] 13,626K 498.18G
ACNet [38] 1,357K 132.82G
HDSRNet (Ours) 1,819K 110.99G
Figure 2: Predicted high-quality images from different methods on the same low-resolution image from Set14 for ×2: (a) HR image, (b) Bicubic, (c) SRCNN, (d) LESRCNN, (e) DCLS, (f) VDSR, (g) DRCN and (h) HDSRNet (Ours).
Figure 3: Predicted high-quality images from different methods on the same low-resolution image from B100 for ×3: (a) HR image, (b) Bicubic, (c) SRCNN, (d) LESRCNN, (e) DCLS, (f) VDSR, (g) DRCN and (h) HDSRNet (Ours).
Figure 4: Predicted high-quality images from different methods on the same low-resolution image from B100 for ×3: (a) HR image, (b) Bicubic, (c) SRCNN, (d) LESRCNN, (e) DCLS, (f) VDSR, (g) DRCN and (h) HDSRNet (Ours).
Figure 5: Predicted high-quality images from different methods on the same low-resolution image from U100 for ×4: (a) HR image, (b) Bicubic, (c) SRCNN, (d) LESRCNN, (e) DCLS, (f) VDSR, (g) DRCN and (h) HDSRNet (Ours).
Figure 6: Predicted high-quality images from different methods on the same low-resolution image from U100 for ×4: (a) HR image, (b) Bicubic, (c) SRCNN, (d) LESRCNN, (e) DCLS, (f) VDSR, (g) DRCN and (h) HDSRNet (Ours).
Figure 7: Predicted high-quality images from different methods on the same low-resolution image from U100 for ×4: (a) HR image, (b) Bicubic, (c) SRCNN, (d) LESRCNN, (e) DCLS, (f) VDSR, (g) DRCN and (h) HDSRNet (Ours).

The first phase designs 16 stacked Conv+ReLU layers to extract low-frequency structural information, following VGG [53]. Its effectiveness can be proved by comparing 'Heterogeneous upper sub-network and a CB' with 'HDSRNet without residual learning operations in symmetrical lower sub-network' in TABLE I. To enhance the relationship of hierarchical features, the second phase is conducted. It uses residual learning operations to merge the obtained information of shallow and deep layers into a symmetrical lower sub-network that extracts more accurate information, as shown in Section III-D. As illustrated in TABLE I, 'HDSRNet' obtains higher PSNR and SSIM than 'HDSRNet without residual learning operations in symmetrical lower sub-network', which shows the effectiveness of the residual learning operations in the symmetrical lower sub-network for image SR. Besides, 'HDSRNet' obtains a higher PSNR value than 'Heterogeneous upper sub-network and a CB' in TABLE I, which shows the effectiveness of the heterogeneous parallel network for image super-resolution.

Construction block: To construct HR images, we design a 2-layer construction block. It consists of a sub-pixel convolutional layer and a convolutional layer. The sub-pixel convolutional layer is used to amplify low-frequency information into high-frequency information, and the convolutional layer is utilized to construct the predicted HR images.

IV-E Comparisons with Popular Methods for SR

In this section, we use both quantitative and qualitative analysis to evaluate the results of our HDSRNet on SISR. The quantitative analysis includes PSNR, SSIM, running time and the complexity of our HDSRNet. Specifically, PSNR and SSIM are used to measure the quality of predicted HR images, while running time and parameters are used to test the feasibility of our HDSRNet for real applications, e.g., phones and cameras. We compare A+ [19], jointly optimized regressors (JOR) [20], image upscaling with super-resolution forests (RFL) [21], a self-exemplars SR method (SelfEx) [18], a cascade of sparse coding based networks (CSCN) [22], image restoration using encoder-decoder networks (RED) [23], a denoising convolutional neural network (DnCNN) [24], trainable non-linear reaction diffusion (TNRD) [25], a fast dilated residual SR convolutional network (FDSR) [26], SRCNN [3], FSRCNN [9], a residue context sub-network (RCN) [27], VDSR [4], a deeply-recursive convolutional network (DRCN) [5], IDN [12], DRRN [6], a Laplacian SR network (LapSRN) [60], a new architecture of deep recursive convolution networks for SR (NDRCN) [63], a persistent memory network (MemNet) [29], CARN-M [11], lightweight image super-resolution with an enhanced CNN (LESRCNN) [30], a deep alternating network (DAN) [34], a pixel-level generative adversarial network (PGAN) [37] and our HDSRNet on four public datasets, i.e., Set5, Set14, B100 and U100, for ×2, ×3 and ×4. As shown in TABLEs II and III, our HDSRNet obtains the best performance in terms of PSNR and SSIM for ×2, ×3 and ×4. For instance, our HDSRNet improves PSNR by 0.12dB and SSIM by 0.0004 over IDN on Set5 for ×2 in TABLE II, and improves PSNR by 0.1dB and SSIM by 0.0003 over DSRCNN on Set14 for ×4 in TABLE III. This shows that our HDSRNet is effective on small datasets for image super-resolution. For big datasets, our HDSRNet also has an advantage: as shown in TABLEs IV and V, it obtains the best SR results for all scale factors, i.e., ×2, ×3 and ×4. For instance, our HDSRNet exceeds DSRCNN by 0.09dB in PSNR and 0.0016 in SSIM for ×2 on B100 in TABLE IV, and exceeds DSRCNN by 0.07dB in PSNR and 0.0012 in SSIM for ×4 on U100 in TABLE V. This shows that our HDSRNet is suitable for big datasets in image super-resolution. In TABLEs II to V, red and blue denote the best and second-best SR results, respectively.

To test the practicality of our HDSRNet, we use running time and complexity to measure its performance for image SR. As shown in TABLE VI, our HDSRNet obtains competitive running time for restoring low-resolution images of sizes 256×256 and 512×512. As illustrated in TABLE VII, although our HDSRNet has more parameters than ACNet, it requires fewer FLOPs. Thus, it is competitive in complexity and suitable for application in consumer electronic products. According to the mentioned experimental results, our HDSRNet is effective in terms of quantitative analysis.

For qualitative analysis, we apply Bicubic, SRCNN, LESRCNN, DCLS, VDSR, DRCN and HDSRNet to low-resolution images from Set14, B100 and U100 for ×2, ×3 and ×4 to recover high-quality images, which are compared with the given HR images. That is, we amplify one area of the predicted high-quality images from different methods as an observation area; the clearer the observation area is, the more effective the corresponding method is for image super-resolution. As shown in Figs. 2-7, our HDSRNet obtains clearer areas, which shows that it is effective in terms of qualitative analysis. In summary, our HDSRNet is a good tool for image super-resolution, according to both quantitative and qualitative analysis.

V Conclusion

In this paper, we propose a heterogeneous dynamic convolutional network for SISR (HDSRNet), which captures more structural information. The upper network depends on stacked heterogeneous blocks to extract more contextual information for improving the effects of image super-resolution. Each heterogeneous block is composed of dilated, dynamic and common convolutional layers, ReLU and a residual learning operation, which adjust parameters for different inputs and prevent the long-term dependency problem. The lower network uses a symmetric architecture to enhance the relations of different layers and extract more structural information for SISR. In the future, we will address SISR with non-reference images.

References

  • [1] K. Zhang, W. Zuo, and L. Zhang, “Deep plug-and-play super-resolution for arbitrary blur kernels,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 1671–1681.
  • [2] X. Zhou, H. Huang, R. He, Z. Wang, J. Hu, and T. Tan, “Msra-sr: Image super-resolution transformer with multi-scale shared representation acquisition,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 12665–12676.
  • [3] C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE transactions on pattern analysis and machine intelligence, vol. 38, no. 2, pp. 295–307, 2015.
  • [4] J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1646–1654.
  • [5] J. Kim, J. K. Lee, and K. M. Lee, “Deeply-recursive convolutional network for image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1637–1645.
  • [6] Y. Tai, J. Yang, and X. Liu, “Image super-resolution via deep recursive residual network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 3147–3155.
  • [7] V. Dumoulin and F. Visin, “A guide to convolution arithmetic for deep learning,” arXiv preprint arXiv:1603.07285, 2016.
  • [8] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 1874–1883.
  • [9] C. Dong, C. C. Loy, and X. Tang, “Accelerating the super-resolution convolutional neural network,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14.   Springer, 2016, pp. 391–407.
  • [10] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, “Enhanced deep residual networks for single image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 136–144.
  • [11] N. Ahn, B. Kang, and K.-A. Sohn, “Fast, accurate, and lightweight super-resolution with cascading residual network,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 252–268.
  • [12] Z. Hui, X. Wang, and X. Gao, “Fast and accurate single image super-resolution via information distillation network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 723–731.
  • [13] Y. Chen, X. Dai, M. Liu, D. Chen, L. Yuan, and Z. Liu, “Dynamic convolution: Attention over convolution kernels,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 11030–11039.
  • [14] E. Agustsson and R. Timofte, “Ntire 2017 challenge on single image super-resolution: Dataset and study,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 126–135.
  • [15] M. Bevilacqua, A. Roumy, C. Guillemot, and M. L. Alberi-Morel, “Low-complexity single-image super-resolution based on nonnegative neighbor embedding,” 2012.
  • [16] R. Zeyde, M. Elad, and M. Protter, “On single image scale-up using sparse-representations,” in Curves and Surfaces: 7th International Conference, Avignon, France, June 24-30, 2010, Revised Selected Papers 7.   Springer, 2012, pp. 711–730.
  • [17] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics,” in Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, vol. 2.   IEEE, 2001, pp. 416–423.
  • [18] J.-B. Huang, A. Singh, and N. Ahuja, “Single image super-resolution from transformed self-exemplars,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 5197–5206.
  • [19] R. Timofte, V. De Smet, and L. Van Gool, “A+: Adjusted anchored neighborhood regression for fast super-resolution,” in Computer Vision–ACCV 2014: 12th Asian Conference on Computer Vision, Singapore, Singapore, November 1-5, 2014, Revised Selected Papers, Part IV 12.   Springer, 2015, pp. 111–126.
  • [20] D. Dai, R. Timofte, and L. Van Gool, “Jointly optimized regressors for image super-resolution,” in Computer Graphics Forum, vol. 34, no. 2.   Wiley Online Library, 2015, pp. 95–104.
  • [21] S. Schulter, C. Leistner, and H. Bischof, “Fast and accurate image upscaling with super-resolution forests,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3791–3799.
  • [22] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, “Deep networks for image super-resolution with sparse prior,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 370–378.
  • [23] X. Mao, C. Shen, and Y.-B. Yang, “Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections,” Advances in neural information processing systems, vol. 29, 2016.
  • [24] K. Zhang, X. Gao, D. Tao, and X. Li, “Single image super-resolution with non-local means and steering kernel regression,” IEEE Transactions on Image Processing, vol. 21, no. 11, pp. 4544–4556, 2012.
  • [25] Y. Chen and T. Pock, “Trainable nonlinear reaction diffusion: A flexible framework for fast and effective image restoration,” IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 6, pp. 1256–1272, 2016.
  • [26] Z. Lu, Z. Yu, P. Yali, L. Shigang, W. Xiaojun, L. Gang, and R. Yuan, “Fast single image super-resolution via dilated residual networks,” IEEE Access, vol. 7, pp. 109729–109738, 2018.
  • [27] Y. Shi, K. Wang, C. Chen, L. Xu, and L. Lin, “Structure-preserving image super-resolution via contextualized multitask learning,” IEEE transactions on multimedia, vol. 19, no. 12, pp. 2804–2815, 2017.
  • [28] H. Ren, M. El-Khamy, and J. Lee, “Image super resolution based on fusing multiple convolution neural networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 54–61.
  • [29] Y. Tai, J. Yang, X. Liu, and C. Xu, “Memnet: A persistent memory network for image restoration,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 4539–4547.
  • [30] C. Tian, R. Zhuge, Z. Wu, Y. Xu, W. Zuo, C. Chen, and C.-W. Lin, “Lightweight image super-resolution with enhanced cnn,” Knowledge-Based Systems, vol. 205, p. 106235, 2020.
  • [31] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via sparse representation,” IEEE transactions on image processing, vol. 19, no. 11, pp. 2861–2873, 2010.
  • [32] J. Song, J. Xiao, C. Tian, Y. Hu, L. You, and S. Zhang, “A dual cnn for image super-resolution,” Electronics, vol. 11, no. 5, p. 757, 2022.
  • [33] C. Tian, Y. Zhang, W. Zuo, C.-W. Lin, D. Zhang, and Y. Yuan, “A heterogeneous group cnn for image super-resolution,” IEEE transactions on neural networks and learning systems, 2022.
  • [34] Y. Huang, S. Li, L. Wang, T. Tan et al., “Unfolding the alternating optimization for blind super resolution,” Advances in Neural Information Processing Systems, vol. 33, pp. 5632–5643, 2020.
  • [35] Z. Luo, H. Huang, L. Yu, Y. Li, H. Fan, and S. Liu, “Deep constrained least squares for blind image super-resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 17642–17652.
  • [36] J. Sahambi et al., “A lightweight deep residual attention network for single image super resolution,” in 2023 National Conference on Communications (NCC).   IEEE, 2023, pp. 1–6.
  • [37] W. Shi, F. Tao, and Y. Wen, “Structure-aware deep networks and pixel-level generative adversarial training for single image super-resolution,” IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–14, 2023.
  • [38] C. Tian, Y. Xu, W. Zuo, C.-W. Lin, and D. Zhang, “Asymmetric cnn for image superresolution,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 6, pp. 3718–3730, 2021.
  • [39] C. Tian, X. Zhang, J. C.-W. Lin, W. Zuo, Y. Zhang, and C.-W. Lin, “Generative adversarial networks for image super-resolution: A survey,” arXiv preprint arXiv:2204.13620, 2022.
  • [40] F. Yu, X. Wang, M. Cao, G. Li, Y. Shan, and C. Dong, “Osrt: Omnidirectional image super-resolution with distortion-aware transformer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 13283–13292.
  • [41] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu, “Residual dense network for image super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 2472–2481.
  • [42] S. Wan, C. Gong, P. Zhong, B. Du, L. Zhang, and J. Yang, “Multiscale dynamic graph convolutional network for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 58, no. 5, pp. 3162–3177, 2019.
  • [43] Y. Ding, J. Feng, Y. Chong, S. Pan, and X. Sun, “Adaptive sampling toward a dynamic graph convolutional network for hyperspectral image classification,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–17, 2021.
  • [44] Z. Duan, T. Zhang, X. Luo, and J. Tan, “Dckn: multi-focus image fusion via dynamic convolutional kernel network,” Signal Processing, vol. 189, p. 108282, 2021.
  • [45] C. Dai, Z. Guan, and M. Lin, “Single low-light image enhancer using taylor expansion and fully dynamic convolution,” Signal Processing, vol. 189, p. 108280, 2021.
  • [46] J. Hou, Z. Guo, Y. Wu, W. Diao, and T. Xu, “Bsnet: Dynamic hybrid gradient convolution based boundary-sensitive network for remote sensing image segmentation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–22, 2022.
  • [47] H. Shen, Z.-Q. Zhao, and W. Zhang, “Adaptive dynamic filtering network for image denoising,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 2, 2023, pp. 2227–2235.
  • [48] C. Tian, M. Zheng, W. Zuo, B. Zhang, Y. Zhang, and D. Zhang, “Multi-stage image denoising with the wavelet transform,” Pattern Recognition, vol. 134, p. 109050, 2023.
  • [49] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
  • [50] F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” arXiv preprint arXiv:1511.07122, 2015.
  • [51] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in neural information processing systems, vol. 25, 2012.
  • [52] Y.-S. Xu, S.-Y. R. Tseng, Y. Tseng, H.-K. Kuo, and Y.-M. Tsai, “Unified dynamic convolutional network for super-resolution with variational degradations,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 12 496–12 505.
  • [53] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • [54] A. Hore and D. Ziou, “Image quality metrics: Psnr vs. ssim,” in 2010 20th international conference on pattern recognition.   IEEE, 2010, pp. 2366–2369.
  • [55] K. Jiang, Z. Wang, P. Yi, and J. Jiang, “Hierarchical dense recursive network for image super-resolution,” Pattern Recognition, vol. 107, p. 107475, 2020.
  • [56] M. Hu, K. Jiang, Z. Wang, X. Bai, and R. Hu, “Cycmunet+: Cycle-projected mutual learning for spatial-temporal video super-resolution,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
  • [57] K. Jiang, Z. Wang, P. Yi, T. Lu, J. Jiang, and Z. Xiong, “Dual-path deep fusion network for face image hallucination,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 1, pp. 378–391, 2020.
  • [58] Z. Zha, X. Yuan, B. Wen, J. Zhou, J. Zhang, and C. Zhu, “From rank estimation to rank approximation: Rank residual constraint for image restoration,” IEEE Transactions on Image Processing, vol. 29, pp. 3254–3269, 2020.
  • [59] Z. Zha, X. Yuan, B. Wen, J. Zhou, and C. Zhu, “Group sparsity residual constraint with non-local priors for image restoration,” IEEE Transactions on Image Processing, vol. 29, pp. 8960–8975, 2020.
  • [60] W.-S. Lai, J.-B. Huang, N. Ahuja, and M.-H. Yang, “Deep laplacian pyramid networks for fast and accurate super-resolution,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 624–632.
  • [61] W. Bae, J. Yoo, and J. Chul Ye, “Beyond deep residual learning for image restoration: Persistent homology-guided manifold simplification,” in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 145–153.
  • [62] J. Xu, M. Li, J. Fan, X. Zhao, and Z. Chang, “Self-learning super-resolution using convolutional principal component analysis and random matching,” IEEE Transactions on Multimedia, vol. 21, no. 5, pp. 1108–1121, 2018.
  • [63] F. Cao and B. Chen, “New architecture of deep recursive convolution networks for super-resolution,” Knowledge-Based Systems, vol. 178, pp. 98–110, 2019.
  • [64] S. Wang and Y. Zhang, “Deep learning for covid-19 diagnosis via chest images.”
  • [65] Y. Zhang, L. Deng, H. Zhu, W. Wang, Z. Ren, Q. Zhou, S. Lu, S. Sun, Z. Zhu, J. M. Gorriz et al., “Deep learning in food category recognition,” Information Fusion, p. 101859, 2023.
  • [66] Y.-D. Zhang, Z. Dong, S.-H. Wang, X. Yu, X. Yao, Q. Zhou, H. Hu, M. Li, C. Jiménez-Mesa, J. Ramirez et al., “Advances in multimodal data fusion in neuroimaging: Overview, challenges, and novel orientation,” Information Fusion, vol. 64, pp. 149–187, 2020.
  • [67] X. Zhou, Q. Yang, X. Zheng, W. Liang, I. Kevin, K. Wang, J. Ma, Y. Pan, and Q. Jin, “Personalized federation learning with model-contrastive learning for multi-modal user modeling in human-centric metaverse,” IEEE Journal on Selected Areas in Communications, 2024.
  • [68] X. Zhou, Q. Yang, Q. Liu, W. Liang, K. Wang, Z. Liu, J. Ma, and Q. Jin, “Spatial–temporal federated transfer learning with multi-sensor data fusion for cooperative positioning,” Information Fusion, vol. 105, p. 102182, 2024.
  • [69] X. Zhou, X. Zheng, X. Cui, J. Shi, W. Liang, Z. Yan, L. T. Yang, S. Shimizu, I. Kevin, and K. Wang, “Digital twin enhanced federated reinforcement learning with lightweight knowledge distillation in mobile networks,” IEEE Journal on Selected Areas in Communications, 2023.
  • [70] X. Zhou, X. Zheng, T. Shu, W. Liang, I. Kevin, K. Wang, L. Qi, S. Shimizu, and Q. Jin, “Information theoretic learning-enhanced dual-generative adversarial networks with causal representation for robust ood generalization,” IEEE Transactions on Neural Networks and Learning Systems, 2023.