Learning Deep Gradient Descent Optimization For Image Deconvolution
Abstract
INTRODUCTION
Image deconvolution, also known as image deblurring, aims to recover a sharp image
from an observed blurry image. The blurry image $y \in \mathbb{R}^m$ is usually modeled as a convolution
of a latent image $x \in \mathbb{R}^n$ and a blur kernel $k \in \mathbb{R}^l$:

$$y = k \ast x + n, \tag{1}$$

where ∗ denotes the convolution operator, and $n \in \mathbb{R}^m$ denotes an i.i.d. white Gaussian noise
term with unknown standard deviation (i.e. noise level). Given a blurry image y and the
corresponding blur kernel k, the task of recovering the sharp image x is referred to as (non-blind)
image deconvolution, which is often used as a subcomponent of blind image deblurring [1], [2],
[3]. Single image deconvolution is challenging and mathematically ill-posed due to the unknown
noise and the loss of the high-frequency information.
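The following NumPy sketch makes the blur model (1) concrete by synthesizing a blurry observation from a sharp image. It assumes circular (periodic) boundary conditions, so y has the same size as x (m = n here), and computes the convolution with FFTs. The helper names `psf2otf` and `blur`, the 1×9 box kernel and the noise level are illustrative choices, not taken from the original text.

```python
import numpy as np


def psf2otf(k, shape):
    """Zero-pad kernel k to `shape`, shift its centre to (0, 0) and return its 2-D DFT."""
    pad = np.zeros(shape)
    pad[: k.shape[0], : k.shape[1]] = k
    pad = np.roll(pad, (-(k.shape[0] // 2), -(k.shape[1] // 2)), axis=(0, 1))
    return np.fft.fft2(pad)


def blur(x, k, sigma, rng):
    """Simulate y = k * x + n with circular convolution and i.i.d. Gaussian noise n."""
    kx = np.real(np.fft.ifft2(np.fft.fft2(x) * psf2otf(k, x.shape)))
    return kx + sigma * rng.standard_normal(x.shape)


rng = np.random.default_rng(0)
x = rng.random((64, 64))            # stand-in for a sharp latent image
k = np.ones((1, 9)) / 9.0           # hypothetical horizontal box (motion) blur, sums to 1
y = blur(x, k, sigma=0.01, rng=rng)
print(y.shape)                      # (64, 64)
```

Because the kernel sums to one, the noise-free blur preserves the mean intensity of x. The unknown noise level `sigma` is exactly the quantity that non-blind methods often need as an input.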
Many conventional methods resort to different natural image priors based on manually
designed empirical statistics (e.g. sparse gradient prior [4], [5], [6]) or learned generative models
(e.g. Gaussian mixture models (GMMs) [7]), which usually lead to non-convex and time-consuming
optimization. The optimization algorithms are used to iteratively update the images
based on the priors and the imaging model in (1). For efficiency, discriminative learning methods
[8], [9], [10] have been investigated to learn mapping functions from blurred observations to sharp
images; however, these are usually restricted to specific blur kernels and noise levels.
Due to the successes in many computer vision applications, deep neural networks
(DNNs) have been used more frequently for learning image restoration models [10], [11], [12],
[13], [14]. Since it is impractical to directly apply end-to-end DNNs to deconvolution with
different blur kernels [10], many approaches resort to unrolling an optimization algorithm as a
static cascade scheme with a fixed number of steps in which specific neural networks are
integrated into different steps [11], [13], [9], [15]. The DNN components usually model only the
operators corresponding to the priors/regularizer (e.g. proximal projectors [13], [12]).
In these static model structures, the DNN based operators in each step are learned
specifically for the intermediate output from the previous step. As a result, these models usually
require customized training for specific noise levels [11], [9] or manual parameter tuning (that
reflects the unknown noise level) for a specific blurred image (in testing) [15], [12], [13],
limiting their applications in practice. Although these learning-based methods adopt
optimization schemes as an interface to the deconvolution problem, they are restricted to learning
a static mapping function and overlook the dynamic characteristics of the optimization process.
We address the above issues by learning a universal optimizer for image deconvolution.
Specifically, we propose Recurrent Gradient Descent Network (RGDN), a recurrent DNN
architecture derived from gradient descent optimization methods. The RGDN iteratively updates
the unknown variable x using a universal image updating unit, which mimics the gradient
descent optimization process. To achieve this, we parameterize and learn a universal gradient
descent optimizer, which can be repeatedly used to update x based on its previous updates.
Unlike previous methods [13], [11], [12], [15] only focusing on image prior learning, we
parameterize and learn all main operations of a general gradient descent algorithm, including the
gradient of a free-form image prior, based on CNNs (see Fig. 1). In previous methods [13], [11],
the CNNs are mainly used as denoisers on the image gradients in some splitting-technique-based
optimization methods. In the implementation of the proposed learnable optimizer, we
observe that incorporating the standard optimization algorithm into the deep neural network
design is beneficial since it can utilize the structure of the problem more effectively.
The proposed model learns not only the optimization process but also the terms
associated with the regularizer, which reflect an image prior. Moreover, the optimizer shared across
steps is trained to dynamically handle different update states, which makes it more flexible and
general in handling observations with different blur and noise levels. Given input images with
different levels of degradation, the learned optimizer can adaptively obtain high-quality results
via different numbers of iterations (see Figs. 4 and 6). To summarize, the main contributions of
this paper are:
• We learn an optimizer for image deconvolution by fully parameterizing the general gradient
descent optimizer, instead of learning only image priors [7], [16] or the prior-related operators
[13], [15], [12]. The integration of trainable DNNs and the fully parameterized optimization
algorithm yields a parameter-free, effective and robust deconvolution method, making a
substantial step towards the practical deconvolution for real-world images.
• We propose a new discriminative learning model, i.e. the RGDN, to learn an optimizer for
image deconvolution. The RGDN systematically incorporates a series of CNNs into the general
gradient descent scheme. Benefiting from the parameter sharing and recursive supervision,
RGDN tends to learn a universal and dynamic updating unit (i.e. optimizer), which can be
applied iteratively an arbitrary number of times to boost the performance on different observations, making it a
very flexible and practical method.
• A single trained RGDN model can handle various types of blur and noise. Extensive
experiments on both synthetic data and real images show that the parameter-free RGDN learned
from a synthetic dataset can produce competitive or even better results compared with other
state-of-the-art methods that require a known noise level.
Non-blind image deconvolution has been extensively studied in computer vision, signal
processing and other related fields. We will only discuss the most relevant works. Existing non-
blind deconvolution methods can be mainly categorized into two groups: manually-designed
conventional methods and the learning based methods.
Similar to the manually-designed priors, the learned priors also require well tuned
parameters for specific noise levels. To improve efficiency, some approaches address
deconvolution by directly learning a discriminative function [20], [8], [9], [15], [13]. Schuler et
al. [9] impose a regularized inversion of the blur in the Fourier domain and then remove the
noise using a learned multi-layer perceptron (MLP). Schmidt and Roth [8] propose a cascade of shrinkage
fields (CSF), an efficient discriminative learning procedure based on a random field structure.
Schmidt et al. [20] propose an approach based on Gaussian conditional random field, in which
the parameters are calculated through regression trees.
Chen et al. [21] proposed a diffusion network that integrates the learnable diffusion
process into an iterative estimation scheme and can achieve high-quality results. However, this
model merely focuses on modeling the image prior with an RBF-based diffusion process
and has to be specially trained for different noise levels. Deep neural networks have been studied
as a more flexible and efficient approach for deconvolution. Xu et al. [10] train a CNN to restore
the images with outliers in an end-to-end fashion, which requires fine-tuning for every blur
kernel.
As shown by the plug-and-play framework [22], [23], the variable splitting techniques
[24], [18] can be used to decouple the restoration problem into a data fidelity term and a
regularization term corresponding to a projector in optimization. To handle the instance-specific
blur kernel more easily, a series of methods [11], [13], [12] learn a denoiser and integrate it into
the optimization as the projector reflecting the regularization. In [11], a fully convolutional
network (FCN) is trained to remove noise in image gradients to guide the image deconvolution,
which has to be custom-trained for a specific noise level.
Zhang et al. [13] learn a set of CNN denoisers (for different noise levels) and plug them
into a half-quadratic splitting (HQS) scheme for image restoration. Chang et al. [12] learn a
proximal operator with adversarial training as an image prior. Relying on HQS, Kruse et al. [15]
learn a CNN-based prior term together with an FFT-based deconvolution scheme.
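The HQS-based plug-and-play scheme used in [13] and [15] alternates between a closed-form data step and a denoising step. Below is a minimal sketch of that alternation for the objective ½‖y − Ax‖² + γΩ(x), split with an auxiliary variable z and a penalty (μ/2)‖x − z‖². It rests on several assumptions: circular boundaries, a kernel that sums to one, an increasing schedule `mus`, and a hypothetical `toy_denoiser` (an exact proximal step for a quadratic smoothness prior) standing in for a learned CNN denoiser. It is not the implementation of any cited method.

```python
import numpy as np


def psf2otf(k, shape):
    """Zero-pad kernel k to `shape`, centre it at (0, 0) and return its 2-D DFT."""
    pad = np.zeros(shape)
    pad[: k.shape[0], : k.shape[1]] = k
    pad = np.roll(pad, (-(k.shape[0] // 2), -(k.shape[1] // 2)), axis=(0, 1))
    return np.fft.fft2(pad)


def laplacian_otf(shape):
    """DFT eigenvalues of the circular discrete (negative) Laplacian D_h^T D_h + D_v^T D_v."""
    delta = np.zeros(shape)
    delta[0, 0] = 1.0
    lap = (4 * delta - np.roll(delta, 1, axis=0) - np.roll(delta, -1, axis=0)
           - np.roll(delta, 1, axis=1) - np.roll(delta, -1, axis=1))
    return np.real(np.fft.fft2(lap))


def toy_denoiser(v, strength, lap):
    """Hypothetical stand-in for a learned denoiser: (I + strength * Laplacian)^-1 v."""
    return np.real(np.fft.ifft2(np.fft.fft2(v) / (1.0 + strength * lap)))


def hqs_deconv(y, k, gamma=0.01, mus=(0.1, 0.3, 1.0, 3.0, 10.0)):
    K = psf2otf(k, y.shape)
    lap = laplacian_otf(y.shape)
    Fy = np.fft.fft2(y)
    z = y.copy()
    for mu in mus:
        # x-step: argmin_x 0.5*||y - Ax||^2 + (mu/2)*||x - z||^2, closed form in the Fourier domain
        x = np.real(np.fft.ifft2((np.conj(K) * Fy + mu * np.fft.fft2(z)) / (np.abs(K) ** 2 + mu)))
        # z-step: proximal step of (gamma/mu) * Omega, i.e. "denoise" x with strength gamma/mu
        z = toy_denoiser(x, gamma / mu, lap)
    return x


rng = np.random.default_rng(0)
i, j = np.meshgrid(np.arange(64), np.arange(64), indexing="ij")
x_true = 0.5 + 0.25 * np.sin(2 * np.pi * 3 * i / 64) * np.cos(2 * np.pi * 2 * j / 64)
k = np.ones((1, 9)) / 9.0
y = np.real(np.fft.ifft2(np.fft.fft2(x_true) * psf2otf(k, x_true.shape)))
y += 0.01 * rng.standard_normal(y.shape)
x_hat = hqs_deconv(y, k)
```

As μ grows, x and z are pulled together, and the denoiser strength γ/μ is set by the schedule. In a denoiser-based scheme, that schedule has to be chosen with the noise level in mind.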
These methods only focus on learning the prior/regularization term, and the noise level is
required to be known in the testing phase. In recent work, Jin et al. [25] propose a Bayesian
framework for noise-adaptive deconvolution by generalizing the model in [21]. Unlike the
proposed method, it models the images with restricted features and activation functions and
applies a fixed number of iterations with specific parameters.
Other related works. Early works [26], [27], [28], [29] have explored the general idea of
learning to optimize for different tasks, such as few-shot learning [28] and semantic
segmentation [29]. Different optimization algorithms and iterative inference techniques are
unrolled as deep learning models. For example, mean-field inference for conditional random
fields is implemented as a recurrent neural network for image semantic segmentation [29].
Gu et al. [32] integrate local and non-local denoisers into the HQS framework as image
priors for image restoration. In [33], dynamically updated guidance is used to enhance depth
images in a bi-level optimization framework. Deep neural networks with recurrent structures
have also been studied in many other low-level image processing tasks, such as blind image
deblurring [34], [35], image super-resolution [36], and image filtering [37].
CHAPTER 2
LITERATURE SURVEY
[1] J. Pan, Z. Hu, Z. Su, and M.-H. Yang, “Deblurring text images via l0-regularized
intensity and gradient prior,” in The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2014, pp. 2901–2908.
We propose a simple yet effective L0-regularized prior based on intensity and gradient
for text image deblurring. The proposed image prior is motivated by observing distinct properties
of text images. Based on this prior, we develop an efficient optimization method to generate
reliable intermediate results for kernel estimation. The proposed method does not require any
complex filtering strategies to select salient edges which are critical to the state-of-the-art
deblurring algorithms.
We discuss the relationship with other deblurring algorithms based on edge selection and
provide insight on how to select salient edges in a more principled way. In the final latent image
restoration step, we develop a simple method to remove artifacts and render better deblurred
images. Experimental results demonstrate that the proposed algorithm performs favorably
against the state-of-the-art text image deblurring methods. In addition, we show that the
proposed method can be effectively applied to deblur low-illumination images.
In this paper, we propose a simple yet effective prior for text image deblurring. While the
proposed prior is based on the properties of two-tone text images, it can also be effectively
applied to non-document text images and low-illumination scenes with saturated regions. With
this prior, we present an effective optimization method based on a half-quadratic splitting
strategy, which ensures that each sub-problem has a closed-form solution.
The proposed method does not require any complex processing techniques, e.g., filtering,
adaptive segmentation or SWT. In addition, we develop a simple latent image restoration method
which helps reduce artifacts effectively. Our future work will focus on a better non-blind
deconvolution method and on extending the proposed algorithm to non-uniform text image deblurring.
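Half-quadratic splitting gives closed-form sub-problems for L0-type priors because the auxiliary-variable step decouples per pixel. The illustration below is our own. It assumes a generic sub-problem of the form min_g β(g − v)² + λ‖g‖₀ rather than the exact formulation in [1]. For each element only g = 0 (cost βv²) or g = v (cost λ) can be optimal, so comparing the two costs gives a hard-thresholding rule. The function names are hypothetical.

```python
import numpy as np


def l0_prox(v, lam, beta):
    """Element-wise argmin_g beta*(g - v)**2 + lam*[g != 0]: keep v only where beta*v**2 >= lam."""
    return np.where(beta * v ** 2 >= lam, v, 0.0)


def cost(g, v, lam, beta):
    """Element-wise value of the sub-problem objective."""
    return beta * (g - v) ** 2 + lam * (g != 0)


v = np.array([-0.5, -0.05, 0.0, 0.08, 0.3])
g = l0_prox(v, lam=0.02, beta=2.0)   # threshold on |v| is sqrt(lam / beta) = 0.1
# Entries with |v| < 0.1 are set to zero; -0.5 and 0.3 are kept unchanged.
```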
[2] L. Xu and J. Jia, “Two-phase kernel estimation for robust motion deblurring,” in
European Conference on Computer Vision (ECCV), 2010, pp. 157–170.
We discuss a few new motion deblurring problems that are significant to kernel
estimation and non-blind deconvolution. We found that strong edges do not always benefit kernel
estimation, but instead under certain circumstances degrade it. This finding leads to a new metric
to measure the usefulness of image edges in motion deblurring and a gradient selection process
to mitigate their possible adverse effect. We also propose an efficient and high-quality kernel
estimation method based on using the spatial prior and the iterative support detection (ISD)
kernel refinement, which avoids hard thresholding of the kernel elements to enforce sparsity. We
employ the TV-ℓ1 deconvolution model, solved with a new variable substitution scheme to
robustly suppress noise.
We have presented a novel motion deblurring method and have made a number of
contributions. We observed that motion deblurring could fail when considerable strong and yet
narrow structures exist in the latent image and proposed an effective mask computation
algorithm to adaptively select useful edges for kernel estimation. The ISD-based kernel
refinement further improves the result quality with adaptive regularization. The final
deconvolution step uses an ℓ1 data term that is robust to noise. It is solved with a new iterative
optimization scheme. We have extensively tested our algorithm, and found that it is able to
deblur images with very large blur kernels, thanks to the use of the selective edge map.
[3] D. Gong, J. Yang, L. Liu, Y. Zhang, I. Reid, C. Shen, A. v. d. Hengel, and Q. Shi, “From
motion blur to motion flow: a deep learning solution for removing heterogeneous motion
blur,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
Our FCN is the first universal end-to-end mapping from the blurred image to the dense
motion flow. To train the FCN, we simulate motion flows to generate synthetic blurred-image-
motion-flow pairs thus avoiding the need for human labeling. Extensive experiments on
challenging realistic blurred images demonstrate that the proposed method outperforms the state-
of-the-art.
In this paper, we proposed a flexible and efficient deep learning based method for
estimating and removing the heterogeneous motion blur. By representing the heterogeneous
motion blur as pixel-wise linear motion blur, the proposed method uses a FCN to estimate the a
dense motion flow map for blur removal. Moreover, we automatically generate training data with
simulated motion flow maps for training the FCN. Experimental results on both synthetic and
real world data show the excellence of the proposed method.
[4] A. Levin, R. Fergus, F. Durand, and W. T. Freeman, “Image and depth from a
conventional camera with a coded aperture,” ACM Transactions on Graphics (TOG), vol.
26, no. 3, p. 70, 2007.
A conventional camera captures blurred versions of scene information away from the
plane of focus. Camera systems have been proposed that allow for recording all-focus images, or
for extracting depth, but to record both simultaneously has required more extensive hardware and
reduced spatial resolution. We propose a simple modification to a conventional camera that
allows for the simultaneous recovery of both (a) high resolution image information and (b) depth
information adequate for semi-automatic extraction of a layered depth representation of the
image.
Our modification is to insert a patterned occluder within the aperture of the camera lens,
creating a coded aperture. We introduce a criterion for depth discriminability which we use to
design the preferred aperture pattern. Using a statistical model of images, we can recover both
depth information and an all-focus image from single photographs taken with the modified
camera. A layered depth map is then extracted, requiring user-drawn strokes to clarify layer
assignments in some cases. The resulting sharp image and layered depth map can be combined
for various photographic applications, including automatic scene segmentation, post-exposure
refocusing, or re-rendering of the scene from an alternate viewpoint.
The principle of our approach is to control the effect of defocus so that we can both
estimate the amount of defocus easily – and hence infer distance information – while at the same
time making it possible to compensate for at least part of the defocus to create artifact-free
images.
[5] D. Krishnan and R. Fergus, “Fast image deconvolution using hyper-Laplacian priors,” in
Advances in Neural Information Processing Systems (NIPS), 2009, pp. 1033–1041.
It was recently shown that certain nonparametric regressors can escape the curse of
dimensionality when the intrinsic dimension of data is low ([1, 2]). We prove some stronger
results in more general settings. In particular, we consider a regressor which, by combining
aspects of both tree-based regression and kernel regression, adapts to intrinsic dimension,
operates on general metrics, yields a smooth function, and evaluates in time $O(\log n)$. We derive
a tight convergence rate of the form $n^{-2/(2+d)}$, where d is the Assouad dimension of the input
space. Relative to parametric methods, nonparametric regressors require few structural
assumptions on the function being learned. However, their performance tends to deteriorate as
the number of features increases. This so-called curse of dimensionality is quantified by various
lower bounds on the convergence rates of the form $n^{-2/(2+D)}$ for data in $\mathbb{R}^D$ (see e.g. [3, 4]).
In other words, one might require a data size exponential in D in order to attain a low risk.
Fortunately, it is often the case that data in $\mathbb{R}^D$ has low intrinsic complexity, e.g. the data is near
a manifold or is sparse, and we hope to exploit such situations. One simple approach, termed
manifold learning (e.g. [5, 6, 7]), is to embed the data into a lower dimensional space where the
regressor might work well.
A recent approach with theoretical guarantees for nonparametric regression is the study
of adaptive procedures, i.e. ones that operate in $\mathbb{R}^D$ but attain convergence rates that depend just
on the intrinsic dimension of data. An initial result [1] shows that for data on a $d$-dimensional
manifold, the asymptotic risk at a point $x \in \mathbb{R}^D$ depends just on d and on the behavior of the
distribution in a neighborhood of x.
Later, [2] showed that a regressor based on the RP tree of [8] (a hierarchical partitioning
procedure) is not only fast to evaluate, but is adaptive to Assouad dimension, a measure which
captures notions such as manifold dimension and data sparsity. The related notion of box
dimension (see e.g. [9]) was shown in an earlier work [10] to control the risk of nearest neighbor
regression, although adaptivity was not a subject of that result.
[6] Y. Wang, J. Yang, W. Yin, and Y. Zhang, “A new alternating minimization algorithm
for total variation image reconstruction,” SIAM Journal on Imaging Sciences, vol. 1, no. 3,
pp. 248–272, 2008.
We propose, analyze and test an alternating minimization algorithm for recovering
images from blurry and noisy observations with total variation (TV) regularization. This
algorithm arises from a new half-quadratic model applicable to not only the anisotropic but also
isotropic forms of total variation discretizations. The per-iteration computational complexity of
the algorithm is three Fast Fourier Transforms (FFTs). We establish strong convergence
properties for the algorithm including finite convergence for some variables and relatively fast
exponential (or q-linear in optimization terminology) convergence for the others. Furthermore,
we propose a continuation scheme to accelerate the practical convergence of the algorithm.
Extensive numerical results show that our algorithm performs favorably in comparison to several
state-of-the-art algorithms. In particular, it runs orders of magnitude faster than the Lagged
Diffusivity algorithm for total-variation-based deblurring. Some extensions of our algorithm are
also discussed.
In this paper, we propose a fast algorithm for reconstructing images from blurry and
noisy observations. For simplicity, we assume that the underlying images have square domains,
but all discussions apply equally to rectangular domains. Let $u_0 \in \mathbb{R}^{n^2}$ be an original
$n \times n$ gray-scale image, $K \in \mathbb{R}^{n^2 \times n^2}$ represent a blurring (or convolution) operator,
$\omega \in \mathbb{R}^{n^2}$ be additive noise, and $f \in \mathbb{R}^{n^2}$ an observation which satisfies the relationship:

$$f = K u_0 + \omega.$$
Following the framework proposed in [17, 18], a number of half-quadratic models have
been derived and analyzed. However, model (1.3) cannot be derived from (1.2) by the
constructions given in [17, 18]; instead, a new approximate TV function must be first introduced
before applying the construction of [18] (see Section 2.3 for more details). The approximate TV
model (1.3) and the resulting alternating minimization algorithm were first proposed in [43]
without a convergence analysis. A similar split method has recently been proposed that uses
Bregman iterations [20].
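The alternating minimization described above can be sketched in a few lines. The sketch assumes periodic boundary conditions and the isotropic half-quadratic model min over (u, w) of Σᵢ(‖wᵢ‖₂ + (β/2)‖wᵢ − Dᵢu‖²) + (μ/2)‖Ku − f‖², with a simple continuation that increases β. The w-step is a 2-D shrinkage, and the u-step solves (DᵀD + (μ/β)KᵀK)u = Dᵀw + (μ/β)Kᵀf with FFTs. Parameter values, helper names and the FFT count per iteration are illustrative and do not reproduce the authors' implementation.

```python
import numpy as np


def psf2otf(k, shape):
    """Zero-pad kernel k to `shape`, centre it at (0, 0) and return its 2-D DFT."""
    pad = np.zeros(shape)
    pad[: k.shape[0], : k.shape[1]] = k
    pad = np.roll(pad, (-(k.shape[0] // 2), -(k.shape[1] // 2)), axis=(0, 1))
    return np.fft.fft2(pad)


def grad(u):
    """Circular forward differences (D_h u, D_v u)."""
    return np.roll(u, -1, axis=1) - u, np.roll(u, -1, axis=0) - u


def grad_T(wh, wv):
    """Adjoint of `grad`: D_h^T wh + D_v^T wv."""
    return (np.roll(wh, 1, axis=1) - wh) + (np.roll(wv, 1, axis=0) - wv)


def tv_deblur(f, k, mu=500.0, betas=(1.0, 4.0, 16.0, 64.0, 256.0), inner=5):
    K = psf2otf(k, f.shape)
    delta = np.zeros(f.shape)
    delta[0, 0] = 1.0
    DtD = np.real(np.fft.fft2(grad_T(*grad(delta))))   # DFT eigenvalues of D^T D
    KtF = np.conj(K) * np.fft.fft2(f)
    u = f.copy()
    for beta in betas:                                  # continuation: increase beta
        for _ in range(inner):
            # w-step: isotropic 2-D shrinkage of D u with threshold 1/beta
            gh, gv = grad(u)
            mag = np.sqrt(gh ** 2 + gv ** 2)
            scale = np.maximum(mag - 1.0 / beta, 0.0) / np.maximum(mag, 1e-12)
            wh, wv = scale * gh, scale * gv
            # u-step: (D^T D + (mu/beta) K^T K) u = D^T w + (mu/beta) K^T f, diagonal in Fourier
            num = np.fft.fft2(grad_T(wh, wv)) + (mu / beta) * KtF
            den = DtD + (mu / beta) * np.abs(K) ** 2
            u = np.real(np.fft.ifft2(num / den))
    return u


rng = np.random.default_rng(0)
u_true = np.zeros((64, 64))
u_true[16:48, 16:48] = 1.0                     # piecewise-constant test image
k = np.ones((1, 7)) / 7.0
f = np.real(np.fft.ifft2(np.fft.fft2(u_true) * psf2otf(k, u_true.shape)))
f += 0.01 * rng.standard_normal(f.shape)
u = tv_deblur(f, k)
```

Both sub-problems are solved exactly at every iteration, so the cost is dominated by a handful of FFTs regardless of the image content.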
[7] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image
restoration,” in The IEEE International Conference on Computer Vision (ICCV), 2011, pp.
479–486.
Learning good image priors is of utmost importance for the study of vision, computer
vision and image processing applications. Learning priors and optimizing over whole images can
lead to tremendous computational challenges. In contrast, when we work with small image
patches, it is possible to learn priors and perform patch restoration very efficiently. This raises
three questions - do priors that give high likelihood to the data also lead to good performance in
restoration? Can we use such patch based priors to restore a full image? Can we learn better
patch priors? In this work we answer these questions. We compare the likelihood of several
patch models and show that priors that give high likelihood to data perform better in patch
restoration.
Motivated by this result, we propose a generic framework which allows for whole image
restoration using any patch based prior for which a MAP (or approximate MAP) estimate can be
calculated. We show how to derive an appropriate cost function, how to optimize it and how to
use it to restore whole images. Finally, we present a generic, surprisingly simple Gaussian
Mixture prior, learned from a set of natural images. When used with the proposed framework,
this Gaussian Mixture Model outperforms all other generic prior methods for image denoising,
deblurring and inpainting.
Image priors have become a popular tool for image restoration tasks. Good priors have
been applied to different tasks such as image denoising [1, 2, 3, 4, 5, 6], image inpainting [6]
and more [7], yielding excellent results. However, learning good priors from natural images is a
daunting task - the high dimensionality of images makes learning, inference and optimization
with such priors prohibitively hard. As a result, in many works [4, 5, 8] priors are learned over
small image patches.
This has the advantage of making computational tasks such as learning, inference and
likelihood estimation much easier than working with whole images directly. In this paper we ask
three questions: (1) Do patch priors that give high likelihoods yield better patch restoration
performance? (2) Do patch priors that give high likelihoods yield better image restoration
performance? (3) Can we learn better patch priors?
[8] U. Schmidt and S. Roth, “Shrinkage fields for effective image restoration,” in The IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 2774–2781.
Many state-of-the-art image restoration approaches do not scale well to larger images,
such as megapixel images common in the consumer segment. Computationally expensive
optimization is often the culprit. While efficient alternatives exist, they have not reached the
same level of image quality. The goal of this paper is to develop an effective approach to image
restoration that offers both computational efficiency and high restoration quality. To that end we
propose shrinkage fields, a random field-based architecture that combines the image model and
the optimization algorithm in a single unit.
The underlying shrinkage operation bears connections to wavelet approaches, but is used
here in a random field context. Computational efficiency is achieved by construction through the
use of convolution and DFT as the core components, high restoration quality is attained through
loss-based training of all model parameters and the use of a cascade architecture. Unlike heavily
engineered solutions, our learning approach can be adapted easily to different trade-offs between
efficiency and image quality. We demonstrate state-of-the-art restoration results with high levels
of computational efficiency, and significant speedup potential through inherent parallelism.
We presented shrinkage fields, a novel random field model applicable to the restoration
of high-resolution images, which is based on an extension of the additive form of half-quadratic
optimization. By replacing potentials with shrinkage functions, we increased model flexibility
and enabled efficient learning of all model parameters. Experiments on image denoising and
deconvolution with cascaded shrinkage fields demonstrated that fast runtime and high restoration
quality can go hand-in-hand.
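Shrinkage fields replace potentials with learned one-dimensional shrinkage functions. The sketch below assumes each shrinkage function is a weighted mixture of Gaussian radial basis functions, φ(v) = Σⱼ πⱼ exp(−(τ/2)(v − cⱼ)²), and fits the weights πⱼ by least squares to a soft-thresholding curve. The target curve, the RBF grid and the width τ are hypothetical choices. In [8] the parameters are trained with a restoration loss through the cascade rather than fitted to a fixed curve.

```python
import numpy as np


def rbf_basis(v, centers, tau):
    """Matrix of Gaussian RBF features exp(-tau/2 * (v_i - c_j)^2)."""
    return np.exp(-0.5 * tau * (v[:, None] - centers[None, :]) ** 2)


def shrinkage(v, weights, centers, tau):
    """phi(v) = sum_j weights[j] * exp(-tau/2 * (v - centers[j])^2)."""
    return rbf_basis(v, centers, tau) @ weights


centers = np.linspace(-3.0, 3.0, 33)              # hypothetical RBF grid
tau = 1.0 / (centers[1] - centers[0]) ** 2        # RBF width tied to the grid spacing
v = np.linspace(-2.5, 2.5, 501)
target = np.sign(v) * np.maximum(np.abs(v) - 0.5, 0.0)   # soft-thresholding, threshold 0.5
weights, *_ = np.linalg.lstsq(rbf_basis(v, centers, tau), target, rcond=None)
mse = np.mean((shrinkage(v, weights, centers, tau) - target) ** 2)
print(f"fit MSE: {mse:.2e}")
```

Because φ is linear in the weights, it can be fitted (or trained) efficiently, which is one reason such parameterizations are attractive for loss-based learning.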
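[9] C. J. Schuler, H. C. Burger, S. Harmeling, and B. Schölkopf, “A machine learning approach for
non-blind image deconvolution,” in The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), 2013.
Image deconvolution is the ill-posed problem of recovering a sharp image, given a blurry
one generated by a convolution. In this work, we deal with space-invariant non-blind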
deconvolution. Currently, the most successful methods involve a regularized inversion of the
blur in Fourier domain as a first step. This step amplifies and colors the noise, and corrupts the
image information. In a second (and arguably more difficult) step, one then needs to remove the
colored noise, typically using a cleverly engineered algorithm.
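The first step of such two-step methods, a regularized inversion of the blur in the Fourier domain, can be sketched as below. As an illustrative assumption, it uses a Tikhonov-type inverse filter conj(K)/(|K|² + ρ) with a hypothetical constant ρ. This filter multiplies the noise spectrum by a frequency-dependent gain, so noise surviving the inversion is amplified at some frequencies and no longer white (colored). Removing that noise is the job of the learned second step in [9].

```python
import numpy as np


def psf2otf(k, shape):
    """Zero-pad kernel k to `shape`, centre it at (0, 0) and return its 2-D DFT."""
    pad = np.zeros(shape)
    pad[: k.shape[0], : k.shape[1]] = k
    pad = np.roll(pad, (-(k.shape[0] // 2), -(k.shape[1] // 2)), axis=(0, 1))
    return np.fft.fft2(pad)


def regularized_inverse(y, k, rho=0.01):
    """Tikhonov-regularized inverse filter X = conj(K) Y / (|K|^2 + rho); also return |gain|."""
    K = psf2otf(k, y.shape)
    gain = np.conj(K) / (np.abs(K) ** 2 + rho)        # per-frequency filter
    x = np.real(np.fft.ifft2(gain * np.fft.fft2(y)))
    return x, np.abs(gain)


rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal((64, 64))          # white noise in ...
k = np.ones((1, 9)) / 9.0
noise_out, gain = regularized_inverse(noise, k)       # ... filtered (colored) noise out
print(gain.max(), gain.min())                         # gain exceeds 1 at some frequencies, not at others
```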
However, the methods based on this two-step approach do not properly address the fact
that the image information has been corrupted. In this work, we also rely on a two-step
procedure, but learn the second step on a large dataset of natural images, using a neural network.
We will show that this approach outperforms the current state-of-the-art on a large dataset of
artificially blurred images. We demonstrate the practical applicability of our method in a real-world
example with photographic out-of-focus blur.
We have shown that neural networks achieve a new state-of-the-art in image
deconvolution. This is true for all scenarios we tested. Our method presents a clear benefit in that
it is based on learning: We do not need to design or select features or even decide on a useful
transform domain, the neural network automatically takes care of these tasks. An additional
benefit related to learning is that we can handle different types of noise, whereas it is not clear if
this is always possible for other methods. Finally, by directly learning the mapping from
corrupted patches to clean patches, we handle both types of artifacts introduced by the direct
deconvolution, instead of being limited to removing colored noise. We were able to gain insight
into how our MLPs operate:
They detect features in the input and generate corresponding features in the output. Our
MLPs have to be trained on GPU to achieve good results in a reasonable amount of time, but
once learned, deblurring on CPU is practically feasible. A limitation of our approach is that each
MLP has to be trained on only one blur kernel: Results achieved with MLPs trained on several
blur kernels are inferior to those achieved with MLPs trained on a single blur kernel. This makes
our approach less useful for motion blurs, which are different for every image. However, in this
case the deblurring quality is currently more limited by errors in the blur estimation than in the
non-blind deconvolution step. Possibly our method could be further improved with a meta-
procedure, such as [17].
[10] L. Xu, J. S. Ren, C. Liu, and J. Jia, “Deep convolutional neural network for image
deconvolution,” in Advances in Neural Information Processing Systems (NIPS), 2014, pp.
1790–1798.
Our method achieves decent results quantitatively and visually. The implementation, as
well as the dataset, is available at the project webpage. To conclude this paper, we have proposed
a new deep convolutional network structure for the challenging image deconvolution task. Our
main contribution is to let traditional deconvolution schemes guide neural networks and
approximate deconvolution by a series of convolution steps. Our system introduces two
modules, corresponding to deconvolution and artifact removal. Since the network is difficult to
train as a whole, we adopt two supervised pre-training steps to initialize sub-networks. High-quality
deconvolution results bear out the effectiveness of this approach.
CHAPTER 3
In this section, we will first briefly revisit the classical model-based non-blind
deconvolution problem and the general gradient descent algorithm. We then propose the RGDN
model with a fully parameterized gradient descent scheme. Finally, we discuss how to perform
training and deconvolution with RGDN. We consider the common blur model in (1), which can
also be rewritten as

$$y = Ax + n, \tag{2}$$

where $A \in \mathbb{R}^{m \times n}$ denotes the convolution matrix of k. The matrix A represents the convolution
operation x ∗ k as the matrix-vector multiplication Ax with a similar definition in [17]. Note that
we slightly abuse the notation k∗x to denote the 2D convolution operation, although x and k are
defined as the vector representations of the image and kernel, respectively.
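The correspondence between the convolution k ∗ x and the matrix-vector product Ax can be checked numerically on a small image. The sketch below assumes circular boundary conditions, so A is square here. It builds A column by column by blurring each standard basis image, then confirms that Aᵀ is applied by correlating with k, i.e. multiplying by the complex-conjugated kernel spectrum. Helper names are illustrative.

```python
import numpy as np


def psf2otf(k, shape):
    """Zero-pad kernel k to `shape`, centre it at (0, 0) and return its 2-D DFT."""
    pad = np.zeros(shape)
    pad[: k.shape[0], : k.shape[1]] = k
    pad = np.roll(pad, (-(k.shape[0] // 2), -(k.shape[1] // 2)), axis=(0, 1))
    return np.fft.fft2(pad)


def conv(x, K):
    """A x: circular convolution with the kernel whose DFT is K."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * K))


def conv_T(y, K):
    """A^T y: circular correlation, i.e. multiplication by conj(K)."""
    return np.real(np.fft.ifft2(np.fft.fft2(y) * np.conj(K)))


shape = (8, 8)
n = shape[0] * shape[1]
rng = np.random.default_rng(1)
k = rng.random((3, 3))
k /= k.sum()
K = psf2otf(k, shape)

# Column j of A is the blurred j-th standard basis image (vectorised row-major).
A = np.stack([conv(e.reshape(shape), K).ravel() for e in np.eye(n)], axis=1)

x = rng.random(shape)
y = rng.random(shape)
print(A.shape)                                             # (64, 64)
print(np.allclose(A @ x.ravel(), conv(x, K).ravel()))      # True
print(np.allclose(A.T @ y.ravel(), conv_T(y, K).ravel()))  # True
```

In practice A is never formed explicitly. Products with A and Aᵀ are applied through FFTs, which is what makes gradient-based updates on whole images cheap.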
Based on the blur model (1) and the common Gaussian noise assumption, given a blurry
image y and the blur kernel k, the desired solution of the non-blind deconvolution should
minimize a data fidelity term $f(x) = \frac{1}{2\lambda}\|y - Ax\|_2^2$, where the weighting term $\lambda > 0$ reflects the
noise level in y. Considering the ill-posed nature of the problem, given a regularizer Ω(x), the
non-blind deconvolution can be achieved by solving the minimization problem

$$\min_x \; \frac{1}{2\lambda}\|y - Ax\|_2^2 + \gamma\,\Omega(x), \tag{3}$$
where the regularizer Ω(x) corresponds to the image prior, and the weighting term γ ≥ 0 controls
the strength of the regularization. Generally, Ω(x) can take any form, such as the classical
TV regularizer [6] or an arbitrary learning-based free-form regularizer. Although
optimization algorithms with higher-level abstractions (e.g. proximal algorithms [38]) are often used
for problem (3) [5], [7], [39], to show the potential of the proposed idea, we start from the
basic gradient descent method. Let t denote the step index. The vanilla gradient
descent solves for $\hat{x}$ (i.e. an estimate of x) via a sequence of updates:

$$x^{t+1} = x^t + \alpha^t d^t, \qquad d^t = -\big(\nabla f(x^t) + \gamma \nabla \Omega(x^t)\big), \tag{4}$$
where $d^t$ denotes the descent direction, $\alpha^t$ denotes the step length, and $\nabla f(x^t)$ and $\nabla \Omega(x^t)$
denote the gradients of f(·) and Ω(·) at step t. In classic gradient methods, the step length $\alpha^t$ is
usually determined by an exact or approximate line search procedure [40]. Specifically, for the
deconvolution problem (3), $\nabla f(x^t) = \frac{1}{\lambda}(A^\top A x^t - A^\top y)$. Note that $\nabla \Omega(\cdot)$ may also be a
subgradient for some regularizers.
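Below is a minimal NumPy sketch of the vanilla update x^{t+1} = x^t + α^t d^t with d^t = −(∇f(x^t) + γ∇Ω(x^t)). For illustration it assumes circular boundaries and a quadratic smoothness regularizer Ω(x) = ½(‖D_h x‖² + ‖D_v x‖²), a hypothetical differentiable choice rather than TV. It also uses a fixed step α = 1/L, where L bounds the Lipschitz constant of the gradient, instead of a line search. A central finite difference checks that ∇f(x) = (1/λ)(AᵀAx − Aᵀy).

```python
import numpy as np


def psf2otf(k, shape):
    """Zero-pad kernel k to `shape`, centre it at (0, 0) and return its 2-D DFT."""
    pad = np.zeros(shape)
    pad[: k.shape[0], : k.shape[1]] = k
    pad = np.roll(pad, (-(k.shape[0] // 2), -(k.shape[1] // 2)), axis=(0, 1))
    return np.fft.fft2(pad)


def A(x, K):
    """A x: circular convolution with the kernel whose DFT is K."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * K))


def AT(y, K):
    """A^T y: circular correlation with the same kernel."""
    return np.real(np.fft.ifft2(np.fft.fft2(y) * np.conj(K)))


def diffs(x):
    """Circular forward differences (D_h x, D_v x)."""
    return np.roll(x, -1, axis=1) - x, np.roll(x, -1, axis=0) - x


def diffs_T(gh, gv):
    """Adjoint of `diffs`: D_h^T gh + D_v^T gv."""
    return (np.roll(gh, 1, axis=1) - gh) + (np.roll(gv, 1, axis=0) - gv)


def objective(x, y, K, lam, gamma):
    """f(x) + gamma * Omega(x), f(x) = ||y - Ax||^2 / (2 lam), Omega(x) = ||Dx||^2 / 2."""
    gh, gv = diffs(x)
    return np.sum((y - A(x, K)) ** 2) / (2 * lam) + gamma * 0.5 * np.sum(gh ** 2 + gv ** 2)


def gradient(x, y, K, lam, gamma):
    """grad f(x) + gamma * grad Omega(x) = (A^T A x - A^T y) / lam + gamma * D^T D x."""
    return (AT(A(x, K), K) - AT(y, K)) / lam + gamma * diffs_T(*diffs(x))


rng = np.random.default_rng(0)
x_true = rng.random((32, 32))
K = psf2otf(np.ones((1, 7)) / 7.0, x_true.shape)
y = A(x_true, K) + 0.01 * rng.standard_normal(x_true.shape)
lam, gamma = 1.0, 0.01

# Central finite-difference check of the analytic gradient along a random direction v.
v, eps = rng.standard_normal(y.shape), 1e-6
fd = (objective(y + eps * v, y, K, lam, gamma)
      - objective(y - eps * v, y, K, lam, gamma)) / (2 * eps)
an = np.sum(gradient(y, y, K, lam, gamma) * v)

# Vanilla gradient descent with a fixed step alpha = 1/L (no line search).
alpha = 1.0 / (np.max(np.abs(K)) ** 2 / lam + 8 * gamma)
x = y.copy()                                   # x^0: the blurry observation
history = [objective(x, y, K, lam, gamma)]
for _ in range(200):
    d = -gradient(x, y, K, lam, gamma)         # descent direction d^t
    x = x + alpha * d                          # x^{t+1} = x^t + alpha d^t
    history.append(objective(x, y, K, lam, gamma))
print(np.isclose(fd, an), history[-1] < history[0])   # True True
```

The bound L = max|K|²/λ + 8γ follows from the spectra of AᵀA and of the discrete Laplacian (whose eigenvalues lie in [0, 8]). With α = 1/L the objective decreases monotonically for this convex quadratic.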
To accelerate the optimization, we can scale the descent direction $d^t$ via a scaling matrix
$D^t$ using curvature information, which can be determined in different ways. For example, $D^t$
is the inverse Hessian matrix (or an approximation of it) when the second-order information of the
objective is used [40]. We thus arrive at a general updating equation at step t:

$$x^{t+1} = x^t - \alpha^t D^t \big(\nabla f(x^t) + \gamma \nabla \Omega(x^t)\big). \tag{5}$$
Given an initialization $x^0$, a general gradient descent solves problem (3) by repeating the
update in (5) until certain stopping conditions are met. The compact formulation in
(5) makes it convenient to learn a universal parameterized optimizer.
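To see what the scaling matrix D^t can do, consider the extreme case where the objective is quadratic, D^t is the exact inverse Hessian and α^t = 1. This is Newton's method, and it reaches the minimizer in a single step. The sketch below illustrates this under hypothetical assumptions: circular boundaries, a quadratic smoothness regularizer Ω(x) = ½‖Dx‖², and a kernel summing to one. Under these assumptions the Hessian AᵀA/λ + γDᵀD is diagonalized by the 2-D DFT, so applying D^t is cheap. It only shows the role of D^t, not how RGDN learns it.

```python
import numpy as np


def psf2otf(k, shape):
    """Zero-pad kernel k to `shape`, centre it at (0, 0) and return its 2-D DFT."""
    pad = np.zeros(shape)
    pad[: k.shape[0], : k.shape[1]] = k
    pad = np.roll(pad, (-(k.shape[0] // 2), -(k.shape[1] // 2)), axis=(0, 1))
    return np.fft.fft2(pad)


def laplacian_otf(shape):
    """DFT eigenvalues of D^T D, the circular discrete (negative) Laplacian."""
    delta = np.zeros(shape)
    delta[0, 0] = 1.0
    lap = (4 * delta - np.roll(delta, 1, axis=0) - np.roll(delta, -1, axis=0)
           - np.roll(delta, 1, axis=1) - np.roll(delta, -1, axis=1))
    return np.real(np.fft.fft2(lap))


def apply(x, otf):
    """Apply the circulant operator whose DFT eigenvalues are `otf`."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * otf))


def gradient(x, y, K, lap, lam, gamma):
    """(A^T A x - A^T y) / lam + gamma * D^T D x."""
    return apply(apply(x, K) - y, np.conj(K)) / lam + gamma * apply(x, lap)


rng = np.random.default_rng(0)
x_true = rng.random((32, 32))
K = psf2otf(np.ones((1, 7)) / 7.0, x_true.shape)
lap = laplacian_otf(x_true.shape)
y = apply(x_true, K) + 0.01 * rng.standard_normal(x_true.shape)
lam, gamma = 1.0, 0.01

hess = np.abs(K) ** 2 / lam + gamma * lap      # DFT eigenvalues of the Hessian (all > 0 here)
x0 = y.copy()
g0 = gradient(x0, y, K, lap, lam, gamma)
# Scaled step with D^0 = inverse Hessian and alpha^0 = 1: x^1 = x^0 - D^0 g^0
x1 = x0 - apply(g0, 1.0 / hess)
print(np.linalg.norm(gradient(x1, y, K, lap, lam, gamma)) / np.linalg.norm(g0))  # round-off level
```

For non-quadratic or learned regularizers no such closed-form D^t exists. This is one motivation for replacing it with a learned operator.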
Our final goal is to learn a mapping function F(·) that takes a blurry image y and the blur
kernel k as input and recovers the target clear image x as $\hat{x} = F(y; k)$. We achieve this by
learning a fully parameterized optimizer. Given $x^t$ from the previous step, the gradient descent in
(5) calculates $x^{t+1}$ relying on several main operations including gradient (or derivative)
calculation for data fidelity term f(·) and regularizer Ω(·), calculation of scaling matrix Dt and
step length determination.
To enable flexible learning, we fully parameterize the gradient descent optimizer in (5) by replacing its main computational entities with a series of parameterized mapping functions. First, we let R(·) replace ∇Ω(·), so that it supplants the gradient of the regularizer and implicitly acts as an image prior. The noise level in y is unknown and hard to estimate a priori, so a predefined λ is insufficient in practice. We therefore define an operator H(·) that handles the unknown noise and the varying estimation error (in x_t) by adjusting AᵀAx_t − Aᵀy. In this way H(·) implicitly tunes λ adaptively. Finally, we define D(·) as a functional operator that replaces D_t in each step to control the descent direction (i.e. d_t). R(·) and D(·) absorb the trade-off weight γ and the step length α_t, respectively.

Fig. 1. (a) The overall architecture of our RGDN. Given a blurry image y and the corresponding blur kernel k, the optimizer U(·) (i.e. the Gradient Descent Unit, GDU) produces a new estimate x_{t+1} from the estimate x_t of the previous step. Note that a universal optimizer with shared parameters is used for all steps. (b) The structure of the optimizer U(·). Each colored block of the optimizer (left) corresponds to an operation in the classical gradient descent method (right). (c) R(·), H(·) and D(·) share a common CNN block architecture with different parameters to be learned. Both the input and output (for the optimizer and all subnetworks) are H × W × C tensors, where C is the number of channels of the input image y.

As shown in Fig. 1 (b), replacing the calculation entities in (5) with the mapping functions introduced above gives the gradient descent optimizer at each step:

$$x_{t+1} = U(x_t; y, k) = x_t + G(x_t; y, k) = x_t + D\big(H(A^\top A x_t - A^\top y) + R(x_t)\big), \tag{6}$$
where U(·) denotes the parameterized gradient descent optimizer, and G(·) denotes the gradient generator consisting of R(·), H(·) and D(·). Given an initial x_0 (e.g. letting x_0 = y), we can formulate the whole estimation model F(·) as

$$\hat{x} = F(y, k; \Theta) = U^S(x_0) = \underbrace{U \circ U \circ \cdots \circ U}_{S\ \text{times}}(x_0), \tag{7}$$

where ∘ denotes the composition operator, U^S denotes the S-fold composition of U(·) (i.e. the optimizer U is applied S times), and Θ denotes the set of all parameters of U(·) (i.e. the parameters of R(·), H(·) and D(·)). In each iteration, the optimizer calculates the gradient of the data fitting term, AᵀAx − Aᵀy, once. We use the matrix-vector formulation to simplify the notation and implement it with convolution operations for efficiency.
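A short sketch can illustrate the recurrence and the S-fold composition. It assumes that the GDU combines its parts as x_{t+1} = x_t + D(H(AᵀAx_t − Aᵀy) + R(x_t)), with D(·) absorbing both the step length and the sign. R_fn, H_fn and D_fn are hypothetical hand-crafted stand-ins, not trained CNNs. They are chosen so that the unit reduces to plain gradient descent, which makes the result easy to check. As in RGDN, the same U is reused at every step.

```python
import numpy as np
from functools import reduce

def circulant_blur(n, kernel):
    """Matrix A such that A @ x is the circular 1-D convolution k * x."""
    A = np.zeros((n, n))
    c = len(kernel) // 2
    for i in range(n):
        for j, kj in enumerate(kernel):
            A[i, (i - (j - c)) % n] += kj
    return A

def make_gdu(A, y, R, H, D):
    """Build U(x) = x + D(H(A^T A x - A^T y) + R(x)) for a fixed observation y."""
    Aty = A.T @ y                          # A^T y does not change across steps
    def U(x):
        data_grad = A.T @ (A @ x) - Aty    # data-term gradient, computed once per step
        return x + D(H(data_grad) + R(x))
    return U

def s_fold(U, S):
    """Return the S-fold composition U o U o ... o U as a function of x_0."""
    return lambda x0: reduce(lambda x, _: U(x), range(S), x0)

# Hypothetical stand-ins for the learned CNNs R, H and D. With these choices
# the unit reduces to plain gradient descent with D_t = I.
n, lam, gamma, alpha = 16, 1.0, 0.01, 0.5
A = circulant_blur(n, [0.25, 0.5, 0.25])
Lmat = np.roll(np.eye(n), 1, axis=1) - np.eye(n)    # circular forward differences
H_fn = lambda g: g / lam                             # stands in for the 1/lam weighting
R_fn = lambda x: gamma * (Lmat.T @ (Lmat @ x))       # stands in for gamma * grad Omega(x)
D_fn = lambda v: -alpha * v                          # absorbs the step length and the sign

rng = np.random.default_rng(1)
y = A @ rng.standard_normal(n)
U = make_gdu(A, y, R_fn, H_fn, D_fn)
F = s_fold(U, 5)                                     # S = 5 steps with one shared unit
x_hat = F(y)                                         # initialization x_0 = y
```

In RGDN the three callables are CNNs with learned parameters Θ, but the surrounding recurrence is the same.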
Specifically, given any image x and a blur kernel k (or the equivalent matrix operator A), we can implement Ax and Aᵀx as k ∗ x and k̄ ∗ x, respectively, where k̄ denotes the blur kernel obtained by rotating k counterclockwise by 180 degrees in the 2D plane. Thus AᵀAx − Aᵀy can be implemented as k̄ ∗ k ∗ x − k̄ ∗ y.
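The convolution identities above can be checked numerically. This sketch assumes circular (periodic) boundary handling and an odd-sized kernel. Under those assumptions k̄ ∗ (·) is the exact adjoint of k ∗ (·). The boundary treatment used in RGDN is not specified here, and the function names are hypothetical. The adjoint test ⟨Ax, z⟩ = ⟨x, Aᵀz⟩ confirms that convolving with the 180-degree-rotated kernel implements Aᵀ.

```python
import numpy as np

def conv2_circular(x, k):
    """Circular 2-D convolution k * x for an odd-sized kernel centred at its middle."""
    kh, kw = k.shape
    ch, cw = kh // 2, kw // 2
    out = np.zeros(x.shape)
    for i in range(kh):
        for j in range(kw):
            # out[p, q] += k[i, j] * x[p - (i - ch), q - (j - cw)]  (indices wrap around)
            out += k[i, j] * np.roll(x, shift=(i - ch, j - cw), axis=(0, 1))
    return out

def apply_A(x, k):
    return conv2_circular(x, k)                 # A x = k * x

def apply_At(x, k):
    return conv2_circular(x, np.rot90(k, 2))    # A^T x = k_bar * x, k rotated by 180 degrees

def data_term_grad(x, y, k):
    """A^T A x - A^T y, implemented as k_bar * k * x - k_bar * y."""
    return apply_At(apply_A(x, k), k) - apply_At(y, k)

rng = np.random.default_rng(0)
k = rng.random((5, 5))
k /= k.sum()                                    # a random normalized 5 x 5 blur kernel
x = rng.standard_normal((32, 32))
z = rng.standard_normal((32, 32))
lhs = np.sum(apply_A(x, k) * z)                 # <A x, z>
rhs = np.sum(x * apply_At(z, k))                # <x, A^T z>
```

In practice the per-tap loop would be replaced by an FFT or a framework convolution. The rotated-kernel trick stays the same.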
We design a universal gradient descent unit (GDU) to implement U(·) and apply it in all steps in a recurrent manner (see Fig. 1 (a)). In the GDU, the gradient generator G(·) takes the current prediction x_t of size H × W × C and generates a gradient of the same size. Within U(·), the subcomponents R(·), H(·) and D(·) likewise act as mapping functions whose inputs and outputs have the same size. CNNs with an encoder-decoder architecture are commonly used to model such mapping functions. We therefore implement R(·), H(·) and D(·) using three CNNs with the same structure, shown in Fig. 1 (c). Finding the best structure for each subnetwork is not the main focus, so we use the same structure as a plain default choice. Nevertheless, the three CNNs are trained with different parameters, resulting in different functions. We then construct the GDU by assembling the three CNNs according to the model in (6) (see Fig. 1 (b)).

Fig. 3. Visualization of the generated updating gradient on a toy image with elementary contents. (a) The input blurry image. (b) Ground truth image. (c) An intermediate estimate x_t with t = 3. (d) Gradient of the data fitting term, −∇f(x_3) = −(1/λ)(AᵀAx_3 − Aᵀy). (e) The generated gradient of the learned optimizer. (f) The residual between the current estimate x_3 and the final target image, i.e. the ground truth image x_GT. This residual can be seen as an “ideal” updating gradient that reaches the ground truth image in a single update. The generated gradient in (e) is much more similar to the “ideal” gradient than the original gradient from the data fitting term. Note that (d), (e), and (f) are visualized with scaling and pseudo-color. The images are best viewed by zooming in.
As shown in Fig. 1 (c), each trainable CNN consists of 3 convolution (conv) layers and 3 transposed convolution (tconv) layers. Except for the first and the last layers, each conv or tconv layer is followed by a batch normalization (BN) layer [41] and a ReLU activation function. Following a widely used setting [13], the first conv is followed only by a ReLU activation function. Every conv and tconv layer except the last tconv uses 64 filters of size 5 × 5. The last tconv maps the 64-channel intermediate features to a C-channel output, where C denotes the number of channels of the image. We set the stride to 1 for all conv and tconv layers. Our contributions are agnostic to the specific structure chosen for the subnetworks R(·), H(·) and D(·), which may be further tuned for better performance.
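Tracing the spatial size through the block shows why stride-1 conv and tconv layers suffice for an H × W × C to H × W × C mapping. The sketch assumes no zero padding, which the text does not state. Under that assumption each 5 × 5 conv shrinks the size by 4 and each tconv grows it back by 4. The sketch also counts parameters, assuming one bias per output channel plus the BN scale and shift. All names are hypothetical.

```python
from dataclasses import dataclass

KERNEL = 5   # 5 x 5 filters, as stated in the text
WIDTH = 64   # feature channels of every layer except the last tconv

@dataclass
class Layer:
    kind: str    # "conv" or "tconv"
    c_in: int
    c_out: int
    bn: bool     # followed by batch normalization (and ReLU)?

def cnn_block(c):
    """Layer list of one R/H/D subnetwork: 3 conv layers followed by 3 tconv layers."""
    return [
        Layer("conv", c, WIDTH, bn=False),       # first conv: ReLU only
        Layer("conv", WIDTH, WIDTH, bn=True),
        Layer("conv", WIDTH, WIDTH, bn=True),
        Layer("tconv", WIDTH, WIDTH, bn=True),
        Layer("tconv", WIDTH, WIDTH, bn=True),
        Layer("tconv", WIDTH, c, bn=False),      # last tconv: maps back to C channels
    ]

def trace_size(size, layers, k=KERNEL):
    """Spatial size after each stride-1 layer, assuming no zero padding."""
    sizes = [size]
    for layer in layers:
        # an unpadded conv shrinks by k - 1; a stride-1 tconv grows by k - 1
        size = size - (k - 1) if layer.kind == "conv" else size + (k - 1)
        sizes.append(size)
    return sizes

def count_params(layers, k=KERNEL):
    """Weights + one bias per output channel + BN scale/shift (assumed layout)."""
    total = 0
    for layer in layers:
        total += k * k * layer.c_in * layer.c_out + layer.c_out
        if layer.bn:
            total += 2 * layer.c_out
    return total

block = cnn_block(3)                 # e.g. an RGB input, C = 3
print(trace_size(64, block))         # [64, 60, 56, 52, 56, 60, 64]
print(count_params(block))           # 420035
```

If zero padding of 2 is used instead, every layer keeps H × W directly and the parameter count is unchanged.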
Towards learning a universal optimizer, the proposed RGDN shares parameters among the GDUs of all steps. This lets the optimizer (i.e. the shared GDU) see different states during the iterations, so the learned optimizer can handle the dynamically varying states during the optimization. Combined with the recursive supervision introduced below, the learned universal optimizer can focus on improving the quality of the current estimate in each update.
Training the RGDN in this way gives us the flexibility to repeat the learned optimizer an arbitrary number of times to approach the desired deconvolution result for different observations. As a result, the proposed optimizer generalizes well to images with different levels of degradation, even beyond the training data (see Fig. 4 and Fig. 6). In practice, we can stop the process based on stopping conditions, as classic iterative optimization algorithms do. Previous methods [8], [15] often truncate a classic iterative optimization algorithm to a fixed number of steps. They rigidly train separate parameters for each step, each of which only processes the images from the previous step.
However, a static model with a fixed number of steps may not be suitable for all degraded images. The previous models therefore require the ground truth noise level as an input hyperparameter and/or customized training for a specific noise level, which limits their practicability. As shown in Fig. 1 (b), the three subnetworks corresponding to R(·), H(·) and D(·) are integrated as the entities of a gradient descent step and trained jointly.
Although the network architecture follows the gradient descent process, we do not restrict each subnetwork to fit the exact intermediate output of the conventional gradient descent algorithm, for flexibility. The learned subnetworks may thus work beyond the functions of the conventional optimizer. We observe that the learned optimizer works stably and converges smoothly in various cases (as shown in Fig. 6).
Although the subcomponents R(·), H(·) and D(·) are not restricted to mimic the original optimization operations, they are embedded in and trained within the optimization scheme. This lets the learned subnetworks benefit from the classic optimization scheme. Furthermore, the recursive supervision on all steps pushes the universal optimizer to improve the image quality in each step.
The subnetwork R(·) generates a descent direction based only on the current estimate. As shown in Fig. 2 (c), the generated gradient explicitly handles the blurry boundaries in the images while leaving the details unaffected. Moreover, we synthesize a toy image with elementary contents to visualize the generated descent direction more intuitively. Fig. 3 shows the generated gradient for updating x in an intermediate step.
Fig. 3 (f) shows the difference between the current estimate and the ground truth image x_GT (shown in Fig. 3 (b)), which can be seen as an ideal updating gradient. The gradient of the data fitting term f(·) (shown in Fig. 3 (d)) is blurry and contains many artifacts. In contrast, the updating gradient generated by the proposed method (see Fig. 3 (e)) is much more similar to the ideal gradient. The visualizations in Fig. 2 and 3 thus show that the gradient generated by the proposed learned optimizer is more effective than the plain gradient of the data fitting term used in ordinary gradient descent.
Fig. 4. Intermediate results of RGDN. (a) and (g) are the input blurry images y with the corresponding blur kernels k. (a) is an image with noise level 0.15%, and (g) is an image from the training data. Neither the image nor the kernel in (a) is in the training set. (b)-(e) are the intermediate results of the RGDN at steps #3, #20, #30 and #40. (h)-(k) show the results at steps #1-3 and #5, since we perform 5 steps during training. (f) and (l) are the ground truth images.
1) Training loss:
We aim to determine the model parameters Θ for which x̂ = F(y, k; Θ) accurately estimates x, through training on a given dataset {(x_i, k_i, y_i)}_{i=1}^N. We minimize the mean squared error (MSE) between the ground truth x and the estimate x̂ over the training dataset:
Inspired by [11], [42], we also minimize the discrepancy between image gradients during training:
where ∇_h and ∇_v denote the operators computing the image gradients in the horizontal and vertical directions, respectively. The loss function in (9) is expected to help produce sharp images [42]. For all experiments, the models are trained by minimizing the sum of L_MSE(·) and L_grad(·).
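The two loss terms can be sketched in NumPy as follows. The exact normalization of the MSE loss and of the gradient loss (9) is not reproduced in this section. This version therefore assumes per-pixel means of squared differences and forward differences for ∇_h and ∇_v. The function names are hypothetical. The usage at the end shows how the gradient term complements the MSE. A constant brightness offset changes L_MSE but leaves L_grad at zero, so L_grad responds only to edge structure.

```python
import numpy as np

def grad_h(x):
    """Horizontal forward differences (along axis 1, i.e. across columns)."""
    return x[:, 1:] - x[:, :-1]

def grad_v(x):
    """Vertical forward differences (along axis 0, i.e. across rows)."""
    return x[1:, :] - x[:-1, :]

def loss_mse(x_hat, x):
    """Per-pixel mean squared error (assumed normalization)."""
    return np.mean((x_hat - x) ** 2)

def loss_grad(x_hat, x):
    """Squared discrepancy between horizontal and vertical image gradients."""
    return (np.mean((grad_h(x_hat) - grad_h(x)) ** 2)
            + np.mean((grad_v(x_hat) - grad_v(x)) ** 2))

def training_loss(x_hat, x):
    """Sum of the two terms, as used for all experiments in the text."""
    return loss_mse(x_hat, x) + loss_grad(x_hat, x)

rng = np.random.default_rng(0)
x = rng.random((16, 16))       # hypothetical ground truth image
shifted = x + 0.1              # brightness offset: changes pixel values, not edges
```

The same functions work unchanged on H × W × C arrays, because the slicing only touches the first two axes.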
Instead of only minimizing the difference between the ground truth and the output of the final step, we impose recursive supervision [36]. It supervises not only the final estimate but also the outputs of the intermediate steps (i.e. the outputs of U^t(·) for t ∈ [1, S)). The recursive supervision directly forces the output of each step to approach the ground truth, which accelerates training and enhances performance (see Section IV-C).
If we apply supervision only to the last step, only the final optimization step provides information for training the optimizer, which makes gradient back-propagation through the recursive steps inefficient. Recursive supervision on the intermediate steps along the optimization trajectory lets us train the optimizer on partial intermediate trajectories. This is similar to [27] and helps tackle the issue.
Supervising all steps helps train an optimizer that reaches the desired solution as fast as possible. Let x̂_i^t = U^t(x_i^0; y_i, k_i) denote the estimate of x_i from the t-th step. By averaging over all training samples and steps, we obtain the whole training objective
where τ denotes the importance weight for the loss term on image gradients, and κ_t, t = 1, …, S, denote the weights for the losses at different steps. In our implementation, we use the default setting τ = 1 and κ_t = 1, t = 1, …, S, for simplicity, although there may exist a particular “optimal” setting for the weights.
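The recursive supervision objective can be sketched as a weighted average over steps and samples. The per-step loss is assumed to be L_MSE + τ L_grad with per-pixel means. The κ_t-weighted losses are averaged over all N samples and S steps, as described above. The precise constants of the original objective are not reproduced, and the function names are hypothetical. The defaults follow the text: τ = 1 and κ_t = 1.

```python
import numpy as np

def step_loss(x_hat, x, tau):
    """L_MSE + tau * L_grad for one estimate (assumed per-pixel-mean forms)."""
    mse = np.mean((x_hat - x) ** 2)
    gh = (x_hat[:, 1:] - x_hat[:, :-1]) - (x[:, 1:] - x[:, :-1])   # horizontal
    gv = (x_hat[1:, :] - x_hat[:-1, :]) - (x[1:, :] - x[:-1, :])   # vertical
    return mse + tau * (np.mean(gh ** 2) + np.mean(gv ** 2))

def recursive_objective(estimates, targets, kappa=None, tau=1.0):
    """Average of kappa_t-weighted step losses over all steps t and samples i.

    estimates[t][i] is the estimate of sample i after step t + 1 (t = 0, ..., S - 1).
    """
    S, N = len(estimates), len(targets)
    kappa = np.ones(S) if kappa is None else np.asarray(kappa, dtype=float)
    total = 0.0
    for t in range(S):
        for i in range(N):
            total += kappa[t] * step_loss(estimates[t][i], targets[i], tau)
    return total / (S * N)

rng = np.random.default_rng(0)
S, N = 5, 3
targets = [rng.random((8, 8)) for _ in range(N)]
# Hypothetical toy trajectory: the error of each estimate halves at every step.
estimates = [[x + 0.5 ** (t + 1) * rng.standard_normal(x.shape) for x in targets]
             for t in range(S)]
total_loss = recursive_objective(estimates, targets)
```

Setting kappa to [0, 0, 0, 0, 1] keeps only the final-step supervision, scaled by 1/S. That is the single-step setting the recursive scheme is meant to improve on.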
In Section IV-C, we conduct experiments to study the behavior of different settings for τ and the κ_t's. As shown in Fig. 4, the learned optimizer steadily pushes the results closer to the ground truth, which is consistent with the recursive supervision.
3) Implementation details:
Although the number of steps the RGDN takes is not bounded in principle, we run the optimizer for 5 steps in training (i.e. S = 5) for training efficiency. As shown in the experiments, the parameter sharing and recursive supervision let the learned optimizer keep gaining performance when it runs for more iterations than were used in training.
This observation is consistent with the learnable-optimizer-based meta-learning method [27]. For training, we randomly initialize the parameters of RGDN. Training is carried out using the mini-batch Adam [43] optimizer, with a batch size of 4 and a learning rate of 5 × 10^−5.
We observe that the proposed optimizer keeps improving image quality when run for more iterations at test time (as shown in the experiments in Section IV), although it is trained with a limited number of steps. We therefore apply the learned optimizer to non-blind deconvolution with an arbitrary number of steps and stop the processing based on stopping conditions, similar to a classic optimizer. Note that the proposed model shares parameters across different steps. Training focuses on every single step, so the universal optimizer (i.e. the GDU) learns to refine different intermediate images. The optimizer thus sees various images with different states and noise levels.

Fig. 5. Visual comparison on images with different noise levels. The first two rows show results from the dataset [44] with σ = 2%. The bottom two rows show results on an image from the generated BSD-Blur dataset with σ = 3%.
The recursive supervision on all steps encourages the optimizer to lift the quality of the diverse images in each step. As a result, even training with a limited number of steps generalizes strongly to running more steps. Moreover, the image prior learned in the first steps is general enough to be applied in later stages of the restoration. Fig. 4 shows that the intermediate image quality is progressively lifted during the iterations.
Thanks to the learned updating process and the implicit image prior, details are gradually recovered and artifacts are suppressed as the iterations increase. The optimizer is trained to improve the quality of each intermediate estimate. It can therefore run many iterations for heavy degradations and fewer iterations for mild ones.
Thus, the learned optimizer can handle the varying visual appearance of the input images and the intermediate results while consistently improving the estimates. More numerical studies are given in Section IV-D. The optimization can be stopped when

$$\frac{|\phi(x_t) - \phi(x_{t-1})|}{|\phi(x_t) - \phi(x_0)|} < \epsilon,$$

where φ(x) = ‖y − Ax‖²₂ and ε is a small tolerance parameter. In practice, a maximum iteration number T is also used as a stopping criterion.
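The stopping rule can be sketched as a generic loop around any update function. The trained GDU is not available here, so a hypothetical stand-in is used. It is one Landweber (plain gradient) step on the data term for a mild, well-conditioned circular blur. The tolerance eps and the cap max_iter (playing the role of T) are illustrative values, not the settings used in this work.

```python
import numpy as np

def circulant_blur(n, kernel):
    """Matrix A such that A @ x is the circular 1-D convolution k * x."""
    A = np.zeros((n, n))
    c = len(kernel) // 2
    for i in range(n):
        for j, kj in enumerate(kernel):
            A[i, (i - (j - c)) % n] += kj
    return A

def run_until_converged(step, x0, phi, eps=1e-4, max_iter=100):
    """Apply x <- step(x) until |phi(x_t) - phi(x_{t-1})| / |phi(x_t) - phi(x_0)| < eps,
    or until max_iter steps have been taken. Returns the estimate and the steps used."""
    phi0 = prev = phi(x0)
    x = x0
    for t in range(1, max_iter + 1):
        x = step(x)
        cur = phi(x)
        denom = abs(cur - phi0)
        if denom > 0 and abs(cur - prev) / denom < eps:
            return x, t
        prev = cur
    return x, max_iter

n = 64
A = circulant_blur(n, [0.1, 0.8, 0.1])          # mild blur (illustrative assumption)
rng = np.random.default_rng(0)
y = A @ rng.standard_normal(n)
phi = lambda x: float(np.sum((y - A @ x) ** 2))  # phi(x) = ||y - A x||_2^2
# Hypothetical stand-in for the learned GDU: one Landweber step on the data term.
step = lambda x: x - A.T @ (A @ x - y)
x_hat, t_used = run_until_converged(step, y.copy(), phi)
```

With the learned optimizer, `step` would be one application of the shared GDU U(·). The loop itself does not change.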
CHAPTER 4
RESULT
CHAPTER 5
IMPLEMENTATION
5.1.1 PYTHON
• Simple to understand and use: Python is easy to learn and use. It is a high-level programming language that is helpful to developers.
• Expressive: Python code tends to be concise, which makes it readable and easy to understand.
• Interpreted: Python is an interpreted language, which means that the interpreter executes the code line by line. This makes debugging simple and makes the language suitable for beginners.
• Free and open source: Python can be downloaded free of charge from its official website, and its source code is openly available.
• Extensible: code written in other languages, such as C/C++, can be compiled and then used from Python code.
5.2 Library Files
5.2.1 NUMPY:
The Python programming language was not originally designed for numerical computing, but it attracted the attention of the scientific and engineering communities early on. In 1995 a special interest group called matrix-sig was formed with the aim of defining an array computing package. Among its members was Python's designer and maintainer, Guido van Rossum, who extended Python's syntax (in particular the indexing syntax) to make array computing easier.
Jim Fulton completed an implementation of a matrix package, which Jim Hugunin later generalized to create Numeric, also known as the Numerical Python extensions or NumPy. In 1997 Hugunin, a PhD student at MIT, joined the Corporation for National Research Initiatives (CNRI) to work on JPython. He left the maintainer role to Paul Dubois of Lawrence Livermore National Laboratory (LLNL). Other early contributors included David Ascher, Konrad Hinsen, and Travis Oliphant.
NumPy targets the CPython reference implementation of Python, which is a non-optimizing bytecode interpreter. Algorithms written for this implementation often run much slower than their compiled equivalents. NumPy addresses this slowness partly by providing multidimensional arrays, together with functions and operators that operate efficiently on arrays. Using NumPy this way requires rewriting some code, mostly inner loops. NumPy arrays are also used to store and operate on data in the Python bindings of the widely used computer vision library OpenCV.
Images with multiple channels are simply represented as 3-dimensional arrays. Indexing, slicing, and masking with other arrays are therefore very efficient ways to access specific pixels of an image. OpenCV uses the NumPy array as its universal data structure for images, extracted feature points, filter kernels, and more, which greatly simplifies development and debugging.
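Here is how those indexing, slicing and masking operations look on a small H × W × C image array. The image is a hypothetical placeholder. The example uses RGB channel order, whereas OpenCV loads colour images in BGR order.

```python
import numpy as np

# A hypothetical 4 x 6 RGB image stored as an H x W x C array of 8-bit values.
img = np.zeros((4, 6, 3), dtype=np.uint8)
img[1, 2] = [255, 0, 0]          # indexing: set the pixel at row 1, column 2 to red
red = img[:, :, 0]               # slicing: the red channel as an H x W view
top_left = img[:2, :3]           # slicing: a 2 x 3 crop with all channels
mask = red > 128                 # boolean H x W mask of strongly red pixels
img[mask] = [0, 255, 0]          # masking: recolour the selected pixels green
```

Slices such as `red` are views into `img`, so no pixel data is copied. That is one reason these operations are cheap.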
5.2.2 PANDAS:
pandas also has the broader goal of becoming the most powerful and flexible open source data analysis and manipulation tool available in any language, and it is already well on its way toward this goal. pandas is well suited to many different kinds of data:
• Tabular data with heterogeneously typed columns, as in an SQL table or an Excel spreadsheet
• Ordered and unordered (not necessarily fixed-frequency) time series data
• Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
The data need not be labeled at all to be placed into a pandas data structure. The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. pandas is built on top of NumPy and is intended to integrate well within a scientific computing environment with many other third-party libraries.
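The Series/DataFrame ideas above can be shown with a minimal pandas sketch. It uses a small hypothetical table of image sizes. The values are placeholders, not data from this work.

```python
import pandas as pd

# Hypothetical table of image sizes; each column may have its own dtype.
images = pd.DataFrame({
    "name": ["a.png", "b.png", "c.png", "d.png"],
    "height": [256, 256, 512, 512],
    "width": [256, 384, 512, 768],
    "channels": [3, 1, 3, 3],
})
widths = images["width"]                                 # a single column is a Series
images["pixels"] = images["height"] * images["width"]    # vectorized column arithmetic
color = images[images["channels"] == 3]                  # boolean row selection
mean_width = images.groupby("height")["width"].mean()    # split-apply-combine
```

The arithmetic and filtering run column-wise on the underlying NumPy arrays rather than row by row in Python.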
5.2.3 MATPLOTLIB:
Matplotlib is a Python plotting library. Its pyplot module, especially when combined with IPython, provides a MATLAB-like interface for simple plotting. Power users get full control over line styles, font properties, axes properties, and so on, either through an object-oriented interface or through a set of functions familiar to MATLAB users.
5.2.4 SEABORN:
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics, and it is closely integrated with pandas data structures.
5.2.5 Scikit-learn:
Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN. It is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Scikit-learn started in 2007 as a Google Summer of Code project by David Cournapeau. Matthieu Brucher later joined the project and started to use it as part of his thesis work.
INRIA became involved in 2010, and the first public release (v0.1 beta) followed in early 2010. The project presently has over 30 active contributors and has received financial support from INRIA, Google, Tinyclues, and the Python Software Foundation. In general, a learning problem considers a set of n samples of data and then tries to predict properties of unknown data. If each sample is more than a single number, for instance a multi-dimensional entry (also known as multivariate data), it is said to have several attributes or features.
CHAPTER 6
CONCLUSION
[1] J. Pan, Z. Hu, Z. Su, and M.-H. Yang, “Deblurring text images via L0-regularized intensity and gradient prior,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 2901–2908.
[2] L. Xu and J. Jia, “Two-phase kernel estimation for robust motion deblurring,” in European
Conference on Computer Vision (ECCV), 2010, pp. 157–170.
[3] D. Gong, J. Yang, L. Liu, Y. Zhang, I. Reid, C. Shen, A. v. d. Hengel, and Q. Shi, “From motion blur to motion flow: a deep learning solution for removing heterogeneous motion blur,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[4] A. Levin, R. Fergus, F. Durand, and W. T. Freeman, “Image and depth from a conventional
camera with a coded aperture,” ACM transactions on graphics (TOG), vol. 26, no. 3, p. 70, 2007.
[5] D. Krishnan and R. Fergus, “Fast image deconvolution using hyper-Laplacian priors,” in Advances in Neural Information Processing Systems (NIPS), 2009, pp. 1033–1041.
[6] Y. Wang, J. Yang, W. Yin, and Y. Zhang, “A new alternating minimization algorithm for
total variation image reconstruction,” SIAM Journal on Imaging Sciences, vol. 1, no. 3, pp. 248–
272, 2008.
[7] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image
restoration,” in The IEEE International Conference on Computer Vision (ICCV), 2011, pp. 479–
486.
[8] U. Schmidt and S. Roth, “Shrinkage fields for effective image restoration,” in The IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 2774–2781.
[11] J. Zhang, J. Pan, W.-S. Lai, R. Lau, and M.-H. Yang, “Learning fully convolutional networks for iterative non-blind deconvolution,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[12] J. R. Chang, C.-L. Li, B. Poczos, B. V. Kumar, and A. C. Sankaranarayanan, “One network
to solve them all solving linear inverse problems using deep projection models,” 2017.
[13] K. Zhang, W. Zuo, S. Gu, and L. Zhang, “Learning deep cnn denoiser prior for image
restoration,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017,
pp. 3929–3938.
[14] R. Liu, X. Fan, M. Hou, Z. Jiang, Z. Luo, and L. Zhang, “Learning aggregated transmission
propagation networks for haze removal and beyond,” IEEE transactions on neural networks and
learning systems, no. 99, pp. 1–14, 2018.
[15] J. Kruse, C. Rother, and U. Schmidt, “Learning to push the limits of efficient fft-based
image deconvolution,” in IEEE International Conference on Computer Vision (ICCV), 2017, pp.
4586–4594.
[16] U. Schmidt, Q. Gao, and S. Roth, “A generative perspective on mrfs in low-level vision,” in
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013, pp. 1067–
1074.
[17] D. Gong, M. Tan, Q. Shi, A. van den Hengel, and Y. Zhang, “MPTV: Matching pursuit
based total variation minimization for image deconvolution,” IEEE Transactions on Image
Processing, pp. 1–1, 2018.
[19] L. Sun, S. Cho, J. Wang, and J. Hays, “Good image priors for nonblind deconvolution:
Generic vs specific,” in European Conference on Computer Vision (ECCV), 2014, pp. 231–246.
[20] U. Schmidt, J. Jancsary, S. Nowozin, S. Roth, and C. Rother, “Cascades of regression tree
fields for image restoration,” IEEE Transactions on Pattern Analysis and Machine Intelligence
(TPAMI), vol. 38, no. 4, pp. 677–689, 2016.
[21] Y. Chen, W. Yu, and T. Pock, “On learning optimized reaction diffusion processes for effective image restoration,” in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 5261–5269.
[22] S. V. Venkatakrishnan, C. A. Bouman, and B. Wohlberg, “Plug-and-play priors for model based reconstruction,” in Global Conference on Signal and Information Processing (GlobalSIP), 2013, pp. 945–948.
[24] D. Geman and C. Yang, “Nonlinear image recovery with half-quadratic regularization,”
IEEE Transactions on Image Processing, vol. 4, no. 7, pp. 932–946, 1995.
[25] M. Jin, S. Roth, and P. Favaro, “Noise-blind image deblurring,” in The IEEE Conference on
Computer Vision and Pattern Recognition (CVPR), 2017.