Reference Document
by
Faisal M. Al-Salem
Doctoral Committee:
Professor Andrew E. Yagle, Chair
Professor Jeffrey A. Fessler
Professor Mahta Moghaddam
Professor Douglas C. Noll
The ability to ask the right question is more than half the battle of finding the answer.
—Thomas J. Watson.
© Faisal M. Al-Salem 2010
All Rights Reserved
To my family
ACKNOWLEDGEMENTS
I would like to express my deepest gratitude to my advisor, Prof. Andrew Yagle, for
recommending this exciting research topic, for his guidance, and for his patience and
support.
I would also like to thank my thesis committee members, Prof. Fessler, Prof. Noll, and
Prof. Moghaddam for their valuable time, input and suggestions.
Finally, I am grateful to Becky Turanski, the EE: Systems Graduate Program
Coordinator, for her outstanding administrative assistance.
TABLE OF CONTENTS
DEDICATION ..............................................................................................................................................ii
GLOSSARY ...............................................................................................................................................viii
ABSTRACT ................................................................................................................................................. xi
CHAPTER
I. Introduction ........................................................................................................................................... 1
III. Solving for the Expansion Coefficients of the Polyphase Components .......................................... 22
3.3 Mean and Covariance of an Estimated PPC ................................................................................... 36
3.4 Pre-denoising the LR Images using PCA ........................................................................................ 39
3.4.1 The Sample Mean and Sample Covariance Matrix ............................................................... 41
3.4.2 Outlier LR Images and their Effect on Denoising ................................................................. 44
3.5 Color Images .................................................................................................................................. 45
3.6 Post-Processing the SR Image......................................................................................................... 47
3.7 Summary ......................................................................................................................................... 48
3.8 Future Work .................................................................................................................................... 49
BIBLIOGRAPHY ....................................................................................................................................... 95
LIST OF FIGURES
Figure
2.2 LR images obtained from a HR image via LSV transformation and downsampling. ............ 13
2.3 In the LSV case, a LR image is a linear combination of separate parts of the PPCs of the
HR image (a LR image can be viewed as a linear local mixing of subregions of the PPCs). 14
4.1 For I = 4, J = 5, m* = 13 and n = 1,...,16, the highlighted (white) blocks represent all the
pairs (n, m) that give the same pairs of sub data matrices given by (n, m*). ......................... 61
5.1 An illustration of the integration effect of the primary LR CCD array corresponding to
4 × 4 downsampling. ................................................................................................... 69
5.2 An illustration of the integration effect of the secondary LR CCD array corresponding to
5 × 5 downsampling. ................................................................................................... 70
5.6 Approximately pure translations. Details: dog’s face. (# of LRs = 35). ............................... 78
5.10 Random vibrations: estimating the ref. PPC in the pixel domain. No denoising. .................. 83
5.11 Random vibrations: using a single sec. LR image vs. estimating the ref. PPC. (# LRs = 35). 84
5.17 Atmospheric turbulence: Blind SR vs. Iterative L1. (# of LRs = 100). ................................. 91
GLOSSARY
Symbol Description
Acronym Description
HR high resolution.
LR low resolution.
LS least squares.
MD median filter.
SR super-resolution.
TV total variation.
UM unsharp masking.
ABSTRACT
This thesis addresses the problem of reconstructing a high-resolution (HR) image from
several low-resolution (LR) versions of it. We assume that the original HR image undergoes
linear shift-invariant transforms over different subregions of the HR image. Under this
assumption of linearity, the LR images can form a basis that spans the set of the polyphase
components (PPCs) of the HR image. To estimate a reference PPC (a single PPC of the HR
image at a different, relatively prime downsampling factor), LR images are acquired using
two imaging sensors with different sensor densities. This setup allows for blind
reconstruction of the polyphase components of the HR image by solving a few small linear
systems of equations where the number of unknowns is equal to the number of available LR
images. The parameters we estimate are the expansion coefficients of the PPCs in terms of
the LR basis, using the subpolyphase components. Both synthetic and real data sets are used
to test the algorithm. The major features of our approach are: (1) it is blind, so that
unknown motion and blurs can both be incorporated; (2) it is fast, in that only small linear
systems of equations need to be solved; and (3) it is robust, in that it avoids the problem
of system model errors by treating the LR images as a basis for reconstruction.
CHAPTER I
Introduction
Image resolution is determined by two main factors. Blurring, due to optical limits and
various other processes (like the effect of the atmosphere and motion blur, for example),
results in soft images, while low-sensor density of the imaging device causes aliasing.
Signal processing based super-resolution (SR) methods are typically concerned with
overcoming the resolution limitation that results in aliasing (although such techniques do
take blur into consideration). In this context, ‘resolution’ refers to the sampling interval,
or pixel size. Coarse sampling (pixels of relatively large size) results in ‘low resolution’
images, while ‘high resolution’ images correspond to fine sampling (pixels of relatively
small size)1. This is in contrast to optical super-resolution where the aim is to beat the
diffraction limit2 [40]. Optical SR methods are expensive and are usually developed to
enhance the resolution of an already expensive imaging system [41] that is capable of
producing very high resolution images (up to the diffraction limit). Henceforth, the term
‘super-resolution’ shall be used exclusively to refer to the process of overcoming the
sensor density limitation using signal processing methods3.
Multiframe super-resolution is a technique that provides a cheap alternative to
increasing the sensor density of an imaging chip, by combining multiple low-resolution
(LR) images into a high-resolution (HR) image [1]. In particular, for more pixels, one
_____________________________
1. For example, an image of an actual width of 0.5 meter, and height of 0.5 meter, can be sampled at
1000 samples per meter, in each direction, to obtain an image of 500x500 pixels. At a lower sampling rate
of 200 samples per meter, we obtain a lower resolution image of 100x100 pixels.
2. Diffraction of light results in blurring. It defines the maximum limit on resolution (acutance) of the
optical system.
3. A diffraction-limited imaging system can still benefit from signal processing-based super-resolution
techniques when imaging a larger field of view (zooming out). See §5.1 for details.
could either use a larger imaging chip, which in turn requires a larger lens, or decrease
the pixel size, which requires very high quality photo sensors that can perform well under
deprived light conditions. Both options result in a substantial increase in cost.
A third, much cheaper option, is to use super-resolution techniques.
Beyond cost reduction considerations, there are physical limits on pixel density (and
chip/lens size). For example, particularly large pixel spacing is required in some
applications (in infrared imaging, for example [73]). Therefore, super-resolution is the
only option once the physical limits of sensor manufacturing (or of the imaging system)
are reached.
The classical solution of the multiframe super-resolution problem is based on the
following premise: given relative scene motions, we get different LR frames that can be
combined into a HR image. In order for the scene motion to be useful in conventional
multiframe SR techniques it must be different from frame to frame and modeled as a
linear transformation. For example, the motion could be global (pure translations), local
(general linear warping) or due to rotation. For many motion-based SR methods, the
estimation of motion information (registration) is needed as a preliminary step. Typically,
these methods assume available motion information or implement one of the available
registration techniques [18, 19]. The extra computational load, required by the
registration process, can be significant for cases more complex than the global motion
model.
In order to reduce the effect of registration error on the super-resolved image, some
methods, e.g. [43, 44], jointly estimate the motion parameters and the HR image [1].
Also, these classical methods incorporate in their models the presence of both blur and
noise as unwanted terms. Most of these techniques assume that the blurring kernel(s) are
either known or can be identified via one of the blind blur identification methods [20].
Additive white Gaussian noise is usually assumed.
As our proposed method adopts a novel and completely different approach, we provide only
a very brief review of SR methods in this thesis.
Super-resolution reconstruction started as a frequency-domain technique. The original
idea of dealiasing in the frequency domain dates to [3] and was improved by others, for
example [4-6]. These methods are theoretically simple and computationally efficient.
However, their use is restricted to the case of pure translational motion and more
importantly they are sensitive to errors [1, 14].
A more robust approach is solving the problem in the spatial domain. In fact, all
modern techniques adopt the spatial (pixel) domain approach where the solution of a very
large scale, ill-posed system of linear equations is sought. Different spatial domain
methods use different assumptions and different approaches to the solution of the same
matrix formulation and they are, in general, computationally expensive. This is especially
true for projection type methods [16, 17]. Refer to [1, 2] for a comprehensive review of
these and other techniques.
Elad and Hel-Or [7] provide a spatial domain solution to the special case of the pure
translation problem treated in [3-6]. They take advantage of this special case to develop a
fast algorithm, and their solution is shown to be optimal in the maximum likelihood (ML)
sense.
In [12] the authors adopt a completely deterministic approach to the solution of the
large system of equations. Blurring is assumed to be known and the same for all acquired
LR images, and, as is the case with typical motion-based SR techniques, the authors
assume that the registration information is either available or estimated using one of the
available image registration methods. They implement Tikhonov regularization to
stabilize the solution, with the regularization parameter automatically determined using
the generalized cross-validation (GCV) method. They provide a proof of the GCV
formula for underdetermined systems, and the conjugate gradient (CG) algorithm is then
used to iteratively solve the large system of linear equations. To accelerate convergence
they derive and implement preconditioners. Later, in [13], the authors improve on their
previous work by developing a parametric estimation of the blur.
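The Tikhonov-plus-CG pipeline summarized above can be sketched generically; this is an illustrative toy (the random stand-in matrix A, data b, and parameter lam are all hypothetical), not the implementation of [12]:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
m, n = 80, 120                      # an underdetermined system, as in [12]
A = rng.standard_normal((m, n))     # stand-in for the large SR system matrix
b = rng.standard_normal(m)          # stand-in for the stacked LR data
lam = 0.1                           # Tikhonov regularization parameter

# Tikhonov-regularized normal equations: (A^T A + lam*I) x = A^T b,
# applied matrix-free, as is typical for large SR systems.
op = LinearOperator((n, n), matvec=lambda x: A.T @ (A @ x) + lam * x)
x, info = cg(op, A.T @ b)           # conjugate gradients on the SPD operator
print(info)                         # 0 -> converged
```

In practice the regularization parameter would be chosen automatically (e.g. by GCV, as in [12]) and the operator would be preconditioned to accelerate convergence.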
Other researchers, for example [8, 9, 15], have considered implementing stochastic
regularization where a priori knowledge of the distribution of the HR image is used to
constrain or stabilize the solution. In [8] the authors show that using a maximum a
posteriori (MAP) estimator reduces the problem to solving the same huge system of
equations with the regularization term being stochastically determined. Stochastic
regularization can have the advantage of edge-preserving reconstruction when the image
prior’s distribution model is accurate [1].
For its edge-preserving properties, the authors in [14] advocate using bilateral total
variation method rather than Tikhonov regularization. Inspired by [42], the authors in
[14] use the L1-norm for the data-fitting term, which gives solutions that are robust to
outliers and registration errors. Their algorithm is relatively fast and when specialized to
the case of pure translations it becomes even faster.
Unlike the conventional motion-based SR techniques, multiframe motionless SR does
not require relative motion to estimate the HR image. This class of multiframe SR
methods seeks HR image reconstruction using different blurs, zoom or photometric cues,
and whole publications are devoted to this special class of SR techniques, for example
[34, 9-11]. In fact, it was first shown in [8] that motionless SR is possible from
differently blurred images. In contrast to motion-based SR, which treats the blurring
process as a nuisance, in motionless SR the blurs are taken advantage of to produce a HR
image. Blur-based motionless SR techniques usually assume that the blurs are known, but
there are some attempts (for example [34]) at blindly de-mixing the polyphase
components of the HR image by treating the problem as a multiple input
multiple output (MIMO) system with the input being the polyphase components. The
authors in [34], however, reported that their blind method is very sensitive to error.
A recently active area in the field is single-frame super-resolution, where a HR image
is obtained from a single LR frame using a training set of images of similar statistical
nature [37]. The performance is dependent on the size and choice of the set of example
images. Such learning-based methods are expected to perform well when specialized to
super-resolving images with specific structure like face images [35, 36].
1.2 Contribution
The transformations relating the LR images to the HR image could be different
distortion (e.g. blurring) processes or due to motion
(global or local). Therefore, our work is different in the sense that we can make
use of either motion or blur. This is different from motion-based methods in that
they only make use of motion and incorporate blur in their model as a nuisance
term. It is also different from the blur-based motionless algorithms, as these do
not incorporate motion at all in their model.
LR images as basis: Instead of reconstructing the HR image directly, we solve
for the expansion coefficients of its polyphase components (PPCs) in terms of the
available LR images under the assumption that the LR images can form a basis to
reconstruct the polyphase components.
Blind reconstruction via sampling diversity: Since we solve for the expansion
coefficients of the PPCs in terms of the LR images, our proposed method is blind
in the sense that, unlike other multiframe SR algorithms, our method requires no
registration or blur estimation. These coefficients are estimated using only a tiny
portion (a subpolyphase component) of each PPC. These subpolyphase
components are determined via the property of sampling diversity (chapter II) by
using a single PPC, corresponding to a different downsampling factor, as a
reference. This reference PPC can be estimated using two sets of LR images,
captured with two different imaging chips with different sensor densities (chapter
IV).
Speed: Our method involves the solution of a few small linear systems of
equations where the number of unknowns is equal to the number of available LR
images. This implies that the implementation of the method is inherently fast.
1.3 Applications
We list here a few examples of practical cases in which our algorithm could be used:
Just like every motion-based SR technique, our method can handle the classical
problem of achieving SR using (approximately) pure translational sub-pixel shifts.
However, unlike previous work, our fast blind reconstruction method does not
require registration as a preliminary step.
Because of the random nature of the motion blur associated with vibrating
imaging systems, conventional registration methods perform poorly, and as a
result, the performance of conventional motion-based SR methods suffers. In our
case, the randomness of the motion blur is actually a desired quality and no
estimation of the motion blur or image registration is needed, and images are
super-resolved fast, and all for the simple hardware requirement of adding another
(secondary) lower resolution CCD sensor.
When the imaging medium is the turbulent atmosphere, the effect can be modeled
as a time-variant, shift-variant point spread function (PSF). In §5.1 we discuss the
applicability of our method in this scenario.
This thesis is organized as follows. In chapter II, we introduce a novel approach to the
problem of multiframe super-resolution where the set of LR images is viewed as a basis
in terms of which the PPCs of the HR image can be represented. In addition, we
introduce the property of sampling diversity which reveals a tiny portion (a subpolyphase
component) of each one of the PPCs, using a reference PPC of different sampling. In
chapter III, we investigate different classical methods to solve for the expansion
coefficients of the PPCs in terms of the LR basis, using the subpolyphase components. In
chapter IV, we address the problem of estimating the reference PPC, which can only be
achieved using two sets of LR images captured by two different imaging sensors with
different sensor densities. Applications and experimental results are discussed in chapter
V, and the thesis is concluded in chapter VI.
CHAPTER II
2.1 Introduction
motion matrix of size M1M2 × M1M2, H_k^atm is the M1M2 × M1M2 matrix representation of
the k-th atmospheric blurring effect, H_k^cam is the M1M2 × M1M2 matrix representation of
the k-th camera blur, D is of size m1m2 × M1M2 and represents the decimation operation,
and η_k is the noise vector. The term ‘atmospheric blur’ shall refer to blurring due to
the atmosphere, and other types of blur (e.g. motion blur), that are not the direct result
of the limitations of the imaging system. The camera’s optical blur and CCD integrating
effect are represented by H_k^cam. Because H_k^atm and F_k can be represented with block
circulant matrices, they commute [14] and (2.1) can be re-written as

(2.2)    y_k = D H_k^cam H_k^atm F_k u + η_k = D H_k F_k u + η_k,    for k = 1, …, K,

where H_k = H_k^cam H_k^atm merges the blur effect in one matrix representation. See Figure
2.1 (b) for a graphical depiction of (2.2).
As mentioned in chapter I, typical classical SR reconstruction techniques assume F_k to
be known, and usually assume the blurring process to be known as well, viewing it as an
unwanted term. On the other hand, blur-based motionless SR takes advantage of the
known blurring process if it is different for each measured image, and it assumes F_k to be
the identity matrix [8]. The additive noise is usually assumed to be white Gaussian.
Combining the equations in (2.2), we get

(2.3)    Y = [y_1; y_2; …; y_K] = [S_1; S_2; …; S_K] u + [η_1; η_2; …; η_K] = S u + η,

where S_k = D H_k F_k and the semicolon denotes vertical stacking.
numerical algorithm to solve the problem efficiently. However, the speed of even the
fastest of these algorithms is limited by the fact that the number of the unknowns in (2.3)
is equal to the number of pixels in the HR image itself (e.g., 250,000 unknowns, for a HR
image of size 500x500).
Figure 2.1: The observation model. (a) the actual physical process of image acquisition. (b) equivalent
discrete observation model.
2.2 Low-Resolution Images as Basis Signals
The conventional matrix formulation (2.3) used for spatial domain SR methods can be
replaced with a much more efficient formulation if each LR image is a decimated version
of the HR image after going through a finite support linear shift invariant (LSI) transform
(e.g., a finite impulse response (FIR) filter or a point spread function (PSF))
where h_k is the lexicographical unwrapping of the k-th FIR filter coefficients of size
L1 × L2, and the vector u_{ℓ1,ℓ2} is the unwrapping (by column) of the (ℓ1, ℓ2)-th submatrix

(2.5)    u_{ℓ1,ℓ2}(k1, k2) = u(k1·I − L2 + ℓ1 + 1, k2·I − L1 + ℓ2 + 1),    for ℓ1 = 1, …, L2,  ℓ2 = 1, …, L1,

of the HR image u. If, however, L1 ≥ I and L2 ≥ I, then only I² of these submatrices are the
polyphase components. Let c be the column index of the image matrix, U. Then U_c (the c-th
column in U) can be estimated at a small computational cost via Û = Y Hᵀ(H Hᵀ)⁻¹. Note that
H Hᵀ has the small size of L1L2 × L1L2, and thus the computational cost depends mainly on
the size of the kernels.
Of course, the kernels are not always known, and using any algorithm to estimate them
entails substantial additional computation. Moreover, according to simulations, even when
the system matrix H is known and well-conditioned, adding small perturbations to it can
result in large errors. This means that solving (2.4) is sensitive to estimation errors of the
system matrix, H.
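For intuition only, the known-kernel case can be simulated in a few lines: if each LR image is a weighted I × I block decimation of the HR image, then each LR image is an exact mixture of the PPCs, and the PPCs are recovered with a pseudoinverse. All sizes and kernels below are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
I = 3                      # downsampling factor
M = 12                     # HR image side length (a multiple of I)
K = I * I                  # number of LR images = number of PPCs
u = rng.random((M, M))     # stand-in HR image

# polyphase components: U_n = u[p::I, q::I]
ppcs = [u[p::I, q::I] for p in range(I) for q in range(I)]

# each LR image: weighted I x I block integration -> exact mixture of the PPCs
kernels = rng.random((K, I, I))
lr = [sum(h[p, q] * u[p::I, q::I] for p in range(I) for q in range(I))
      for h in kernels]

# stack everything: Y = H U (rows are vectorized LR images / PPCs)
Y = np.stack([y.ravel() for y in lr])      # K x (M/I)^2
H = kernels.reshape(K, -1)                 # K x I^2 mixing matrix
U = np.stack([p.ravel() for p in ppcs])    # I^2 x (M/I)^2

U_hat = np.linalg.pinv(H) @ Y              # recover the PPCs
print(np.allclose(U_hat, U))               # True
```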
The idea that a LR image can be written as a linear combination of the PPCs is not
new (although, the matrix formulation (2.4) is novel). In fact, in [34], the authors
developed a motionless blur-based SR algorithm with computational complexity that is
mainly dependent on the size of the blurs rather than the size of the HR image (unlike in
[8] where the formulation (2.3) was still used, with the motion matrix set to identity).
Their contribution was to blindly estimate restoration filters to recover the PPCs, but
their algorithm is very sensitive to error. Similarly, solving (2.4) is sensitive to errors in
the system matrix, and therefore there is little motivation to try to estimate the kernels.
Moreover, even if we somehow could estimate the kernels quite accurately, the
assumption that the different kernels must be of the same finite size is quite restrictive.
Nevertheless, equations (2.4-2.6) are useful in answering the question as to when the
LR images can span a subspace for the PPCs. Specifically, these equations tell us that, for
the case of the same size LSI kernels, the LR images are linear mixtures of the PPCs
‘and’ other image sub-matrices (rearrangements of elements of the PPCs). Therefore,
when the LR images are mixtures of K′ submatrices (including the I² PPCs) then we need K′
mixtures (LR images) in order to be able to write the PPCs as linear combinations of LR
images.
Now suppose we have available the PPCs and we calculate their expansion
coefficients in terms of a set of different LR images that do not satisfy the assumptions
exactly (LSI, same finite support kernels and sufficient number of LR images) and then
using these expansion coefficients we reconstruct the PPCs. In another scenario, where
we have exact knowledge of the transform kernels, suppose we ‘approximate’ them (the
kernels) to fit our model (2.4) and then solve the problem. Which one of the two
scenarios is expected to give better results? Noting that in the first case there is no wrong
solution but rather a possibly incomplete one, we can easily expect the reconstructed
PPCs of the first scenario to be much better.
Essentially, equations (2.4-2.6) give insight (under the LSI assumption) as to how
many LR images might be enough to fully represent the PPCs but this does not mean that
the PPCs cannot be represented, at least partially, by any number of available LR images.
While formulations like (2.3, 2.4) are inverse problems, and as such are sensitive to
model errors, finding the expansion coefficients of the PPCs is simply a change of basis.
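The change-of-basis view can be made concrete with a toy numpy sketch: when a PPC lies in the span of the LR images, its expansion coefficients follow from a plain least-squares fit (all data below is synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
npix, K = 100, 6                   # pixels per LR image, number of LR images
Y = rng.random((npix, K))          # columns: vectorized LR images (the basis)
coef_true = rng.random(K)
U_n = Y @ coef_true                # a PPC lying in the span of the LR images

# change of basis: expansion coefficients via least squares
coef, *_ = np.linalg.lstsq(Y, U_n, rcond=None)
print(np.allclose(coef, coef_true))   # True
```

If the PPC is only partially representable by the LR images, the same fit returns the best approximation within their span rather than a "wrong" solution, which is the robustness argument made above.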
2.2.2 LSV Transforms
When the HR image undergoes a linear shift-variant (LSV) transformation that can be
approximated as a set of local LSI transforms3 (over different subregions of the HR
image) then the previous discussion can be readily extended to the case of LSV
transforms.
To be more precise, suppose the LSV transform can be approximated as r LSI kernels
over r different subregions of the HR image. One option is to treat these subregions as r
different HR images where we can reconstruct the PPCs of each one of them separately4.
Alternatively, we can reconstruct the PPCs of the whole HR image but with r times more
LR images5. This is because in the case of a LSV transform, a LR image can be viewed
as a linear local mixing of subregions of the PPCs and therefore to reconstruct each PPC
as a whole, we need r times more LR images than it is required in the LSI case.
For example, suppose a square HR image undergoes a LSV transform that can be
approximated as 4 LSI kernels over the 4 quadrants of the HR image, each with
approximately equal finite support of size 3 × 3. The linearly transformed HR image is
then downsampled by 3 × 3 to produce the LR images as shown below.
Figure 2.2: LR images obtained from a HR image via LSV transformation and downsampling.
In light of (2.4-2.6), we know that the ρ-th quadrant of the k-th LR image can be written
as a linear combination of the ρ-th quadrants of the 9 PPCs of the HR image. This means that
the whole of the k-th LR image can be written as
(2.7)    y_k = Σ_{ρ=1}^{r} Σ_{n=1}^{I²} α_{nρ} (U_n ⊙ Z_ρ),
_____________________________
3. Rotation of an image is an example of a linear transform that cannot be approximated as a set of local
LSI transforms.
4. Reconstruction of subregions of the HR image separately has a downside, as will be discussed in §2.3.
5. Although the LSV case does require more LR images, according to simulations, good results can be
achieved with a smaller number than recommended.
where ⊙ denotes the element-wise multiplication operator, U_n is the n-th PPC of the HR
image, Z_ρ is an all-zero matrix except for the elements corresponding to the ρ-th
subregion, and the α_{nρ} are the mixing coefficients; these are the elements of the ρ-th
LSI kernel. See Figure 2.3 for an illustration of equation (2.7). Naturally, since a LR
image is composed of rI² = 4 × 9 = 36 separate parts of the PPCs, then in order to be able
to write the PPCs as linear combinations of the LR images, in the LSV case, we will need
K = rI² LR images. Note that if the size of an LSI kernel is L1 × L2, L1 ≥ I and L2 ≥ I,
then we need K = rL1L2 for a complete basis.
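A minimal numpy sketch of the masked mixing in (2.7) for the quadrant example (r = 4, I = 3); the PPCs and coefficients here are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)
I, r = 3, 4                        # downsampling factor, number of subregions
m = 12                             # LR image side length (even, for quadrants)
ppcs = rng.random((I * I, m, m))   # stand-ins for the I^2 = 9 PPCs

# Z_rho: indicator masks of the r = 4 quadrants (all-zero except one quadrant)
Z = np.zeros((r, m, m))
h = m // 2
Z[0, :h, :h] = Z[1, :h, h:] = Z[2, h:, :h] = Z[3, h:, h:] = 1

alpha = rng.random((I * I, r))     # alpha[n, rho]: coefficients of rho-th kernel
# eq. (2.7): a LR image is a local (masked) mixing of the PPCs
y = sum(alpha[n, rho] * (ppcs[n] * Z[rho])
        for rho in range(r) for n in range(I * I))

# each quadrant of y mixes the matching quadrants of all 9 PPCs
top_left = sum(alpha[n, 0] * ppcs[n][:h, :h] for n in range(I * I))
print(np.allclose(y[:h, :h], top_left))   # True
```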
[Figure 2.3 body: the k-th LR image assembled quadrant by quadrant; e.g., its 2nd quadrant
equals α_{1,2} × (2nd quad. of 1st PPC) + α_{2,2} × (2nd quad. of 2nd PPC) + … +
α_{9,2} × (2nd quad. of 9th PPC), and its 4th quadrant likewise uses α_{1,4}, α_{2,4}, …, α_{9,4}.]
Figure 2.3: In the LSV case, a LR image is a linear combination of separate parts of the PPCs of the HR
image (a LR image can be viewed as a linear local mixing of subregions of the PPCs).
2.3 Sampling Diversity
In the previous section we explained that, under the assumption of linearity of the
transformations, the I² polyphase components (PPCs) of the HR image can be written as
linear combinations of the LR images, i.e.

(2.8)    {U_n}_{n=1}^{I²} ⊂ R(Y),
where R(Y) denotes the range (column space) of Y. Throughout the discussion in this
section, we make the assumption that we have available only one of the J² PPCs of the
HR image (corresponding to J × J downsampling), where I and J are two relatively prime
integers. In other words, we assume that we know the m-th PPC, U_m, for some m between 1
and J². Henceforth, we refer to this known PPC (of different sampling) as the reference PPC.
When I and J are relatively prime, the following property holds: any two PPCs
corresponding to I × I and J × J downsampling, respectively, share exactly6

(2.9)    M1M2 / (I²J²)

pixels between them. These are the elements of a PPC corresponding to IJ × IJ
downsampling. Said differently, if U_n is one of the I² PPCs corresponding to I × I, U_m is
one of the J² PPCs corresponding to J × J, and I and J are relatively prime, then U_q, one of
the (IJ)² PPCs corresponding to IJ × IJ, is shared between them as the j-th subpolyphase
component of U_n and the i-th subpolyphase component of U_m, where 1-1 mappings determine
q, j, and i, respectively. We refer to this property as the sampling diversity property. See
Table 2.1 for a more concise definition of this property.
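The count in (2.9) is easy to check directly along one dimension; the offsets a and e below are arbitrary (hypothetical) choices:

```python
I, J = 4, 5                          # relatively prime downsampling factors
M1, M2 = 40, 40                      # HR dimensions, integer multiples of I*J

a, e = 2, 3                          # offsets of the two PPCs (a < I, e < J)
rows_I = set(range(a, M1, I))        # HR rows sampled by the I x I PPC
rows_J = set(range(e, M1, J))        # HR rows sampled by the J x J PPC
shared_rows = rows_I & rows_J        # coincide once every I*J = 20 rows
print(len(shared_rows) == M1 // (I * J))    # True

# in 2-D the two PPCs therefore share M1*M2/(I^2 J^2) pixels, as in (2.9)
print((M1 // (I * J)) * (M2 // (I * J)))    # 4
```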
_____________________________
6. The number of common elements is exactly M1M2/(I²J²) when the dimensions of the HR image are
integer multiples of IJ.
Therefore, if we know one of the J² PPCs of the HR image, then we already know a small
part (a subpolyphase component) of each one of the I² PPCs.
                                      HR image
  downsampling:        I × I              IJ × IJ               J × J
  polyphase            {U_n},             {U_q},                {U_m},
  components           n = 1, …, I²       q = 1, …, I²J²        m = 1, …, J²
  further
  subsampling:         J × J                                    I × I
  subpolyphase         {U_{n,j}},                               {U_{m,i}},
  components           j = 1, …, J²                             i = 1, …, I²
Table 2.1: Sampling diversity: when I and J are relatively prime, there exist 1-1 mappings T, T_n, and T_m,
such that U_q = U_{n,j} = U_{m,i} for q = T(m,n), j = T_n(m), and i = T_m(n).
_____________________________
7. In §2.2.2 we discussed that one option to deal with the LSV case is to super-resolve subregions of the
HR image separately. The disadvantage of this approach is that the number of shared elements will be
smaller, since M1 and M2 in (2.9) will become smaller (the dimensions of subregions of the HR image).
Recall that any two PPCs corresponding to relatively prime downsampling factors must share
exactly a subpolyphase component.
So in this example the question is: which one of the sub PPCs of the 3rd 2 × 2 PPC
(U_{n=3}) is equal to which one of the sub PPCs of the 9th 3 × 3 PPC (U_{m=9})? By
examining Figure 2.5, it is easy to see that the answer is the 8th and the 3rd, respectively
(i.e. j = T_n(m) = 8, and i = T_m(n) = 3).
By examining many configurations, such as the one shown in Figure 2.5, for different8
J = I + 1, m and n, we derived the mapping functions T_n and T_m. Unfortunately, these do
not seem to have a simple analytical form. We provide a description of these functions
below.
Function T_n(m):

Let A_J be the J × J matrix containing the integers 1, …, J² in row-major order:

    A_J = [ 1         2         …   J
            J+1       J+2       …   2J
            ⋮                        ⋮
            J²−J+1    J²−J+2    …   J² ]

With the PPCs numbered column-wise, compute

    T_n^1 = J·⌊(n − 1)/I⌋ + mod(n − 1, I) + 1
    r_n^1 = ⌈T_n^1 / J⌉
    c_n^1 = T_n^1 − J·(r_n^1 − 1)
    B_J = circshift(A_J^T, [−c_n^1, −r_n^1])
    T_n = B_J(:)
    T_n(m) = T_n(J² − m + 1).
_____________________________
8. Instead of the more general case of I and J being relatively prime, we restrict our discussion to the
case of I and J being two consecutive integers (larger than 1), since this gives the largest possible number of
common elements between any two PPCs corresponding to I × I and J × J.
Function T_m(n):

Let A_I be the I × I matrix containing the integers 1, …, I² in row-major order (defined as
A_J above, with I in place of J), and compute

    e = mod(m − 1, J)
    f = ⌊(m − 1)/J⌋
    B_I = circshift(A_I^T, [mod(e, I), mod(f, I)])
    T_m = B_I(:)
    T_m(n) = T_m(n).
Note: circshift(A, [r, c]) is a function that circularly shifts down the rows in matrix A by
r, and circularly shifts its columns to the right by c. If r is negative the rows are shifted
upwards. If c is negative, the columns are shifted to the left.
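Under the column-wise PPC numbering used here and the restriction J = I + 1 of footnote 8, the two lookup tables reduce to modular offset arithmetic; the sketch below is a Python restatement checked against the example above (I = 2, J = 3, n = 3, m = 9):

```python
def offsets(p, S):
    """1-based PPC index -> 0-based (row, col) offsets, numbered column-wise."""
    return (p - 1) % S, (p - 1) // S

def T_n(n, m, I, J):
    """Index j of the sub-PPC of U_n (I x I) shared with U_m (J x J); J = I + 1."""
    a, b = offsets(n, I)
    e, f = offsets(m, J)
    c, d = (a - e) % J, (b - f) % J   # offsets of the shared pixels inside U_n
    return d * J + c + 1

def T_m(n, m, I, J):
    """Index i of the sub-PPC of U_m (J x J) shared with U_n (I x I); J = I + 1."""
    a, b = offsets(n, I)
    e, f = offsets(m, J)
    g, h = (a - e) % I, (b - f) % I   # offsets of the shared pixels inside U_m
    return h * I + g + 1

# the example of Figure 2.5: j = T_n(m) = 8 and i = T_m(n) = 3
print(T_n(3, 9, 2, 3), T_m(3, 9, 2, 3))   # 8 3
```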
In the previous sections we explained how the property of sampling diversity gives us
a small part (a subpolyphase component) of each one of the I I PPCs, when we know
a single J J PPC, and I and J are relatively prime. In chapter III we investigate how
to use these subpolyphase components to find the expansion coefficients (in terms of the
available LR images) of all the I 2 PPCs of the HR image. In chapter IV, we address the
problem of estimating a single J J PPC, which we refer to as the reference PPC.
As we will see in chapter IV, the estimation of the reference PPC is possible if we
have two imaging sensors (e.g. two CCD arrays) with different sensor densities
corresponding to I I and J J , respectively. We shall refer to the CCD array with
the higher sensor density, as the primary CCD sensor; the secondary CCD sensor is the
one with the lower density9.
These sensors must therefore be designed to satisfy the requirement of relatively
prime downsampling. In particular, if we want to reconstruct HR images of size M1 × M2,
where M1 and M2 are integer multiples of IJ, and J > I, then the primary CCD array
must have

    m1 × m2 = (M1/I) × (M2/I)

pixels, while the secondary CCD array must have

    m1^S × m2^S = (m1·I/J) × (m2·I/J) = (M1/J) × (M2/J)

pixels. Both arrays can capture the scene through a single lens by means of a beam splitter,
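For instance, with hypothetical values I = 4, J = 5 and a target HR size of 400 × 400 (a multiple of IJ = 20), the two sensor sizes work out as:

```python
I, J = 4, 5                     # consecutive (hence relatively prime), J > I
M1 = M2 = 400                   # target HR size, an integer multiple of I*J

m1, m2 = M1 // I, M2 // I       # primary (denser) CCD array
m1s, m2s = M1 // J, M2 // J     # secondary CCD array = (m1*I/J) x (m2*I/J)
print((m1, m2), (m1s, m2s))     # (100, 100) (80, 80)
```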
which is an optical device (a half-silvered mirror or a cube prism) that splits a beam of light in two: half of the light is transmitted through (to the primary CCD array) while the other half is reflected, at a right angle (towards the secondary CCD array). The only disadvantage of using a beam splitter is that the signal-to-noise ratio (SNR) will decrease by 6 dB, since only half the light reaches the sensors. Using a larger aperture allows more light in, at the expense of a loss of depth of field11. Another solution is to use a non-stationary 100% reflective mirror that moves into the optical path for only half of the imaging time, reflecting all the light towards the secondary sensor.
[Figure: the camera lens is followed by a beam splitter; half of the light passes through to the primary CCD array and half is reflected, at a right angle, towards the secondary CCD array.]
_____________________________
11
Depth of field is the portion of an image that appears sharp due to focusing at only one distance. The
loss of sharpness as we move away from the focus point is gradual and is proportional to the aperture size.
[Figure 2.5: An illustration of the property of sampling diversity. For I = 2, J = 3, n = 3 (the 3rd of the 4 2×2 PPCs) and m = 9 (the last of the 9 3×3 PPCs), the figure marks, on the HR pixel grid, the 1st pixel in the HR image, the 1st pixel in U_n, the 1st pixel in U_m, and the 1st pixels of the sub PPCs, U_{n,j} = U_{m,i}, i.e. the pixels the polyphase components U_n and U_m have in common.]
CHAPTER III
Solving for the Expansion Coefficients of the Polyphase Components
3.1 Introduction
In chapter II, we explained how the property of sampling diversity can be used to find portions (sub PPCs) of all the I×I PPCs of the HR image, with the help of a reference PPC of a different sampling. In addition, we noted that, under the assumption of linearity of the transforms, the LR images can be viewed as a basis spanning a subspace in which the PPCs exist.
Our goal, in this chapter, is to find the expansion coefficients of the PPCs in terms of
the LR basis, using their sub PPCs. The diagram in Figure 3.1 gives a pictorial summary
of how the PPCs are reconstructed.
Choosing any m between 1 and J², and using the corresponding J×J PPC as our reference PPC, we obtain a sub PPC of each one of the ↓I×I PPCs, {U_n}_{n=1}^{I²}. In other words, using the reference PPC, we obtain the I² sub PPCs, {U_{n,j}}_{n=1}^{I²}, for j = T_m(n). Because the reference PPC contains error, all the sub PPCs will be noisy as well. Namely, the j-th sub PPC, U_{n,j}, is related to the n-th PPC, U_n, via

(3.1)    U_{n,j} = D_j U_n + e,
where D_j is a ↓J×J matrix (performing shifting and decimation) that gives us the j-th sub PPC from the n-th PPC, and e is assumed to be zero-mean, white Gaussian noise, i.e.

    e ~ N(0, σ_e² I),

so the likelihood of U_{n,j} is

    p(U_{n,j}; x_n) = (2πσ_e²)^(−p/2) exp( −(1/(2σ_e²)) ‖U_{n,j} − D_j Y x_n‖² ).
The maximum likelihood (ML) estimator of the expansion coefficients is therefore given by solving the minimization

    min_{x_n} ‖U_{n,j} − D_j Y x_n‖².

Defining

(3.3)    A ≜ D_j Y,    b ≜ U_{n,j},    x ≜ x_n,

the problem becomes

(3.4)    min_x ‖Ax − b‖²,
which has the LS solution

(3.5)    x̂ = (A^T A)^(−1) A^T b.
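As a quick numerical sketch (with a random A and b standing in for D_j Y and U_{n,j}; all sizes are illustrative), the LS solution (3.5) is what np.linalg.lstsq computes; forming (A^T A)^(−1) explicitly is avoided in practice for numerical reasons:

```python
import numpy as np

rng = np.random.default_rng(0)
p, K = 50, 5                          # p pixels per sub PPC, K LR images (toy sizes)
A = rng.standard_normal((p, K))       # stands in for the sub data matrix D_j Y
x_true = rng.standard_normal(K)       # true expansion coefficients
b = A @ x_true + 0.01 * rng.standard_normal(p)  # noisy sub PPC

# LS solution (3.5); lstsq uses an SVD rather than the normal equations
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

# Equivalent, but numerically less stable, normal-equations form
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
assert np.allclose(x_hat, x_ne)
assert np.linalg.norm(x_hat - x_true) < 0.1
```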
[Figure 3.1: Pictorial summary of how the PPCs are reconstructed: a reference PPC of different sampling, together with the property of sampling diversity, yields the sub PPCs; the sub PPCs and the LR basis yield the expansion coefficients; and the expansion coefficients applied to the LR basis yield the PPCs.]
Note that

    x̂ ~ N( x, σ_e² (A^T A)^(−1) ),
and since it attains the Cramer-Rao lower bound (CRLB), it is the minimum variance unbiased estimator (MVUE). Another way to prove this classical result is via the Gauss–Markov theorem, which states that when the model is linear (3.1) and the noise is zero-mean, uncorrelated, and of equal variance, the LS solution is the best linear unbiased estimator (BLUE). If the noise is also assumed to
be Gaussian then the BLUE is the MVUE because a linear estimator requires only first
and second order statistics and these are sufficient statistics in the Gaussian case [30].
Any ML estimator is asymptotically Gaussian1, asymptotically unbiased, and asymptotically efficient, i.e. it attains the CRLB as the number of samples grows (larger LR images, see below). With the assumptions made at the beginning of this section, the LS solution is the ML estimator: it is unbiased and efficient with a Gaussian distribution (since it is a linear function of b = U_{n,j}), and it is unique when A has full column rank.
In our problem we typically have

(3.6)    p = M1M2/(I²J²) ≫ K,

where K is the number of LR images. In other words, in order for the problem to be overdetermined, p, the number of pixels in a sub PPC (which is the same as the number of pixels in the sub LR images reordered as columns of the sub data matrix A), must be larger than the number of LR images. This means the systems of equations we solve become more overdetermined by super-resolving larger LR images, which can lead to an even lower CRLB to be asymptotically (or exactly, with our assumptions) attained by the ML estimator. For example, obtaining an HR image that is 4×4 times larger than LR images of size 200×200 can give a lower variance estimate than super-resolving (by the same factor of 4×4) smaller LR images of size 100×100. In short,
it is preferable to super-resolve the HR image in its entirety rather than working on
subregions of it. Of course, if the LR images are too large then we might need to super-
resolve subregions of the HR image to lower memory requirements (and the
computational cost).
Finally, we note that by the invariance property of the MLE,

    Û_n = Y x̂

is also the ML estimator of the n-th PPC, U_n. It is also unbiased and efficient with Gaussian distribution
_____________________________
1
Knowledge of the (asymptotic) distribution of an estimator is useful for purposes of statistical
inference.
    Û_n ~ N( U_n, σ_e² Y (A^T A)^(−1) Y^T ).
Given the fact that the columns of the data matrix, Y, are assumed to be 'noise-free' LR images, we would expect the data submatrix A to be ill-conditioned. This is due to the fact that the LR images are highly correlated, so the columns of Y are barely linearly independent. Also, if Y has singular values γ_1 ≥ ⋯ ≥ γ_K and A has singular values σ_1 ≥ ⋯ ≥ σ_K, then by the interlacing theorem for singular values [26] we have

    σ_k ≤ γ_k for k = 1, …, K,
and therefore if Y is ill-conditioned, then so is A. In that case, the solution (3.5) is numerically unstable. To see this, let {w_k, σ_k, v_k}_{k=1}^{K} denote the singular triplets (left singular vectors, singular values and right singular vectors) of A; then equation (3.5) can be re-written as

    x̂ = Σ_{k=1}^{K} (w_k^T b / σ_k) v_k.

Therefore, when the last few singular values are very small (A is ill-conditioned), the LS solution will be unstable, resulting in noise magnification: 1/σ_k blows up for the small singular values, and the components of the noisy b in the direction of the corresponding w_k come to dominate the solution.
One remedy is to truncate the SVD expansion, retaining only the first r < K components (TSVD):

    x̂_TSVD = Σ_{k=1}^{r} (w_k^T b / σ_k) v_k.
Another classical remedy is Tikhonov regularization,

    x̂_Tik = (A^T A + λ² I)^(−1) A^T b = Σ_{k=1}^{K} ( σ_k/(σ_k² + λ²) ) (w_k^T b) v_k,

where λ is the regularization parameter. This is the solution to the minimization problem

    min_x ‖Ax − b‖² + λ² ‖x‖².
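Both remedies act on the SVD of A, so they can be compared directly. A sketch (the ill-conditioned toy A, the truncation level r, and λ are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
p, K = 40, 6
# Build an ill-conditioned A with strongly decaying singular values
U, _ = np.linalg.qr(rng.standard_normal((p, K)))
V, _ = np.linalg.qr(rng.standard_normal((K, K)))
s = np.array([1.0, 0.5, 0.2, 0.1, 1e-4, 1e-6])
A = U @ np.diag(s) @ V.T
b = A @ np.ones(K) + 0.01 * rng.standard_normal(p)

W, sig, Vt = np.linalg.svd(A, full_matrices=False)
coeffs = W.T @ b                       # w_k^T b

# TSVD: keep only the first r components of the expansion
r = 4
x_tsvd = Vt[:r].T @ (coeffs[:r] / sig[:r])

# Tikhonov: shrink component k by sigma_k^2 / (sigma_k^2 + lambda^2)
lam = 1e-2
x_tik = Vt.T @ (sig * coeffs / (sig**2 + lam**2))

x_ls = Vt.T @ (coeffs / sig)           # plain LS magnifies noise on small sigma_k
assert np.linalg.norm(x_tsvd) < np.linalg.norm(x_ls)
assert np.linalg.norm(x_tik) < np.linalg.norm(x_ls)
```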
The Tikhonov solution also has a Bayesian interpretation. Assuming a zero-mean, white Gaussian prior on x, the maximum a posteriori (MAP) estimate solves

    max_x p(b | x) p(x) = (2πσ_e²)^(−p/2) exp( −‖b − Ax‖²/(2σ_e²) ) · (2πσ_x²)^(−K/2) exp( −‖x‖²/(2σ_x²) )

                        = (1/c) exp( −(1/(2σ_e²)) [ ‖b − Ax‖² + (σ_e²/σ_x²) ‖x‖² ] ),

so the corresponding penalty weight is λ² = σ_e²/σ_x².
In addition, if we further assume that the expansion coefficients and the noise are independent, and thus x and b are jointly Gaussian (p(b, x) is Gaussian), then the MAP estimate is also the MMSE estimate of the expansion coefficients. Unlike Bayesian methods, penalized likelihood does not assume prior knowledge of the distribution of the parameters (expansion coefficients).
In the previous section, we made the assumption that the data matrix Y, and thus the data submatrix A (3.3), are noiseless, which is rarely the case. This means that the LS solution is not the ML estimator. Nevertheless, if we ignore the fact that A is noisy and apply the LS solution, then we do not need any regularization, as A is already well-conditioned (the smallest singular values will never be zero due to the presence of noise). However, ignoring the fact that A contains error leaves us with a biased solution, corresponding to the projection of b onto the wrong space (the columns of A are noisy).
The total least squares (TLS) approach generalizes the LS solution by accounting for the presence of noise in A. Specifically, the LS solution, which minimizes ‖Ax − b‖², is equivalent to solving the problem

    min_{b̂, x} ‖b̂ − b‖² subject to Ax = b̂.

That is, b̂ is the smallest possible perturbation of b which lies in the range of A. In other words, we perturb b just enough to ensure that the perturbed equation has a solution, and then solve this system of equations. Now, if A is also subject to noise, then why not perturb A as well as b? That is, seek Â and b̂ such that2 ‖[A b] − [Â b̂]‖_F² is as small as possible, subject to b̂ ∈ R(Â). Then Âx = b̂ has a solution, and any such solution is a TLS solution.
_____________________________
2  The notation ‖·‖_F² denotes the squared Frobenius norm of a matrix, i.e. the sum of the squares of all its elements.
3  For a matrix A ∈ ℝ^{p×K} with p ≫ K, it is sufficient (and more economical) to compute only the left singular vectors corresponding to the non-zero singular values.
Let the SVDs of A and of the augmented matrix [A b] be

(3.7)    A = W Σ V^T = Σ_{k=1}^{K} σ_k w_k v_k^T,

(3.8)    [A b] = W′ Σ′ V′^T = Σ_{k=1}^{K+1} σ′_k w′_k v′_k^T.

The TLS problem is then

(3.9)    min_{Â, b̂, x} ‖[A b] − [Â b̂]‖_F² subject to Âx = b̂.
The TLS problem (3.9) is non-convex. Nevertheless, an analytical solution does exist. We start by rewriting Ax = b as

    [A b] [x^T, −1]^T = 0.

By the Eckart–Young theorem, the nearest rank-K matrix to [A b] (in the Frobenius norm) is obtained by discarding the smallest singular value in (3.8),

(3.10)    [Â b̂] = Σ_{k=1}^{K} σ′_k w′_k v′_k^T,

and the constraint Âx = b̂ reads

(3.11)    [Â b̂] [x̂^T, −1]^T = 0.

Therefore

    [x̂^T, −1]^T = −(1/V′(K+1, K+1)) v′_{K+1},

    x̂ = −(1/V′(K+1, K+1)) [V′(1, K+1), …, V′(K, K+1)]^T,
Statistical Properties of the TLS Solution
A potential problem with the TLS solution, in our case, is due to the fact that the LR images are highly correlated, causing the gaps between the last few singular values of [A b] to be very narrow5. This means the solution of the TLS problem (3.11) is not unique: if

    σ′_ℓ ≈ σ′_{ℓ+1} ≈ ⋯ ≈ σ′_{K+1},
_____________________________
4  While, in terms of bias, the LS solution does not benefit much from increasing the overdeterminedness of the systems of equations (super-resolving larger LR images), especially at higher noise levels, its bias is significantly reduced by increasing the number of LR images. Indeed, adding noise to a complete basis renders it incomplete, and this is precisely why the solution becomes more biased with higher noise levels in the data matrix. In other words, adding noise to the available LR images effectively lowers their number. See the beginning of §2.2 and the end of §2.2.1 regarding using LR images as a basis set.
5  The smallest singular values correspond mostly to noise, and in the case of uncorrelated noise of equal variance, they tend to be equal in size.
then any linear combination of v′_{K+1}, v′_K, …, v′_ℓ solves the TLS problem, provided it results in a vector of the form [x̂^T, −1]^T [23, section 3.3.1].
The TLS solution (3.11) can also be written in closed form:

(3.12)    x̂ = (A^T A − σ′²_{K+1} I)^(−1) A^T b.
We review the simple proof here for convenience. First note that

    [A b]^T [A b] [x̂^T, −1]^T = σ′²_{K+1} [x̂^T, −1]^T;

also,

    [A b]^T [A b] [x̂^T, −1]^T = [ A^T A   A^T b
                                   b^T A   b^T b ] [x̂^T, −1]^T.

Equating the top row of the right-hand side of the last two equations, we get

    A^T A x̂ − A^T b = σ′²_{K+1} x̂,

which gives (3.12). Now, the interlacing theorem [26] implies that

    σ′_1 ≥ σ_1 ≥ σ′_2 ≥ ⋯ ≥ σ_K ≥ σ′_{K+1},

and realizing that the matrix A^T A − σ′²_{K+1} I has singular values {σ_k² − σ′²_{K+1}}_{k=1}^{K}, we notice that the TLS solution can be numerically unstable when the smallest singular values of [A b] are close to each other. In fact, TLS can be seen as an attempt to reverse the process that made A and b noisy, and compared to LS, it can be viewed as a de-regularization procedure [33].
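A numerical sketch of the TLS solution (all data and sizes are toy choices): the SVD route (3.11) and the closed form (3.12) should agree whenever σ′_{K+1} is well separated from σ_K:

```python
import numpy as np

rng = np.random.default_rng(2)
p, K = 60, 4
A0 = rng.standard_normal((p, K))
x_true = rng.standard_normal(K)
# noise in BOTH A and b, which is the TLS error model
A = A0 + 0.01 * rng.standard_normal((p, K))
b = A0 @ x_true + 0.01 * rng.standard_normal(p)

# TLS via the SVD of the augmented matrix [A b], eq. (3.11)
Ab = np.column_stack([A, b])
_, s, Vt = np.linalg.svd(Ab, full_matrices=False)
v_last = Vt[-1]                        # right singular vector for sigma'_{K+1}
x_tls = -v_last[:K] / v_last[K]

# Closed form (3.12): (A^T A - sigma'^2_{K+1} I)^{-1} A^T b
x_cf = np.linalg.solve(A.T @ A - s[-1] ** 2 * np.eye(K), A.T @ b)

assert np.allclose(x_tls, x_cf)
assert np.linalg.norm(x_tls - x_true) < 0.1
```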
31
One remedy for this potential instability is to regularize (3.12) rather than seek the solution with the minimum norm, i.e. to Tikhonov regularize the TLS solution (TRTLS). First note that problem (3.9) is equivalent to6

(3.13)    min_{Â, x} ‖[A b] − [Â Âx]‖_F².

Adding a Tikhonov penalty gives

(3.14)    min_{Â, x} ‖[A b] − [Â Âx]‖_F² + λ² ‖x‖².

Using a Lagrange multiplier formulation [31], the authors in [27] proved that (3.14) has the solution

(3.15)    x̂_TRTLS = (A^T A + (λ² − σ′²_{K+1}) I)^(−1) A^T b.

Note that for λ² = σ′²_{K+1}, we get the LS solution. In our case, A is rarely ill-conditioned, because it is a submatrix of the data matrix Y, which is always contaminated with noise. This precludes the need for increasing the regularization parameter beyond σ′²_{K+1}. In fact, the notion that a certain amount of error in the coefficient matrix might actually be beneficial is discussed, even within the context of super-resolution, in [28]. Therefore our choice of the regularization parameter should lie within

    0 ≤ λ² ≤ σ′²_{K+1},

where the lower limit achieves the TLS solution while the upper limit gives us the LS solution.
The idea of using the L1-norm to penalize the least squares solution was first presented in the context of linear regression [29] under the name Least Absolute Shrinkage and Selection Operator (LASSO). The use of the L1-norm was motivated by the desire to discard irrelevant features for easier interpretability. An L1-norm penalty has the property of concentrating on minimizing small residuals as opposed to large ones. Therefore, when the residuals are the elements of x, this gives us a sparse set
_____________________________
6
One way to prove the result (3.12) is using Lagrange multipliers for (3.13). See [27].
of expansion coefficients. This is in contrast to the L2-norm (Tikhonov) penalty, which forces the coefficients to be rather more similar to each other.
Typically, L1-norm minimization is used for robustness against outliers. In addition to noise, outliers represent an important source of error. For our problem, outliers are irrelevant LR images7 reordered as columns in the data matrix. Ideally, the expansion coefficient corresponding to an outlier LR image should be zero. Fortunately, as our problem is typically highly overdetermined (3.6), outliers, if present, should not affect the solution. Now, since x contains the expansion coefficients in terms of the set of LR images, adding an L1 penalty nonlinearly denoises the solution, partly by shrinking it and partly by discarding the least significant components. These small components likely correspond to noise, so discarding them is desirable.
Adding an L1 regularization term to the data fitting term (3.13), we get

(3.16)    min_{Â, x} ‖[A b] − [Â Âx]‖_F² + λ ‖x‖₁.
Like (3.13, 3.14), problem (3.16) is non-convex. Unlike (3.13, 3.14), however, problem (3.16) does not have an analytical solution. Consequently, we replace (3.16) with a convex surrogate problem. First note that (3.13) is equivalent to

    min_x ‖Âx − b̂‖²,

where Â and b̂ are as defined in (3.10). Now, consider the (convex) cost function

(3.17)    min_x ‖Âx − b̂‖² + λ ‖x‖₁,

and note that for λ = 0, we get the unregularized TLS solution, while for λ > 0, we get what we refer to as the L1-norm regularized TLS solution. Of course, (3.17) is not
equivalent to (3.16), and we do not know how well it approximates it. Nevertheless,
according to all our simulations, for the same data fitting error, solving (3.17) gives better
denoising performance compared to the TRTLS (3.14).
_____________________________
7
In our case, an outlier image is one that is either too distorted, too noisy or simply does not belong to
the LR basis.
Problem (3.17) can be reformulated as

(3.18)    min_x ‖x‖₁ subject to ‖Âx − b̂‖² ≤ ε²,

where ε² = ‖Â x̂_TRTLS − b̂‖². This, of course, requires evaluating (3.15), which takes only a fraction of the time needed to solve (3.18). By solving (3.18) we find the L1-regularized TLS solution to within the same error (data misfit) corresponding to the TRTLS solution. This is the easiest way to highlight the denoising performance of the L1-norm compared to the linear filtering effect of the L2-norm (Tikhonov) penalty.
Problem (3.18) can be cast in second-order cone programming (SOCP) form8 as

    min_{t, x} 1^T t subject to ‖Âx − b̂‖ ≤ ε, −t ≤ x ≤ t.
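The thesis solves the constrained form (3.18) exactly with an SOCP solver; as a simple illustrative alternative, the penalized surrogate (3.17) can be attacked with proximal gradient descent (ISTA). A sketch, with the toy data, λ, and iteration count all illustrative assumptions:

```python
import numpy as np

def ista_l1(A_hat, b_hat, lam, n_iter=500):
    """Proximal-gradient (ISTA) sketch for min ||A_hat x - b_hat||^2 + lam*||x||_1.
    Only an iterative stand-in for the exact SOCP solution of (3.18)."""
    L = 2 * np.linalg.norm(A_hat, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(A_hat.shape[1])
    for _ in range(n_iter):
        g = 2 * A_hat.T @ (A_hat @ x - b_hat)  # gradient of the data-fit term
        z = x - g / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(3)
A_hat = rng.standard_normal((30, 8))
x_sparse = np.zeros(8)
x_sparse[[1, 5]] = [2.0, -1.5]
b_hat = A_hat @ x_sparse + 0.01 * rng.standard_normal(30)

x_l1 = ista_l1(A_hat, b_hat, lam=0.5)
# the L1 penalty drives most coefficients to exactly zero
assert np.sum(np.abs(x_l1) > 1e-3) <= 4
assert np.linalg.norm(x_l1 - x_sparse) < 0.5
```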
Trimmed TLS
Expanding the TLS solution (3.12) in terms of the singular triplets of A gives

    x̂ = Σ_{k=1}^{K} ( σ_k/(σ_k² − σ′²_{K+1}) ) (w_k^T b) v_k.
k 1
_____________________________
8
We used the solver SDPT3 [60, 61], along with the interface CVX [58, 59], to obtain an exact solution to
(3.18) reformulated in the SOCP form. Of course, for larger problems, iterative methods become essential.
Obviously, the last few components of the solution are responsible for the numerical instability and noise magnification associated with the TLS solution. It is therefore rather intuitive to simply discard the highest order components of the solution. This is not to be confused with truncated TLS (TTLS), where regularization is achieved by finding the optimal linear combination of the last few right singular vectors of the augmented matrix [A b] [24]. It also differs from Tikhonov regularized TLS (TRTLS) in that, unlike TRTLS, the weights of the lower order components of the solution are not changed.
To the best of our knowledge, there is no reference in the literature to this type of
regularization of the TLS solution. Also, it appears there is no easy way to assess the
optimality of this method as the cost function it minimizes is unknown. The simulations,
however, point to the superiority of trimmed TLS (better bias-variance tradeoff)
compared to Tikhonov regularized TLS.
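A sketch of this trimming (the function name, the trim level r, and the toy data are illustrative): expand the TLS solution in the singular triplets of A and keep only the first r terms, with the retained weights untouched:

```python
import numpy as np

def trimmed_tls(A, b, r):
    """Trimmed TLS sketch: expand the TLS solution (3.12) in the singular
    triplets of A and discard the last K - r terms, leaving the weights of
    the retained (low order) components unchanged."""
    W, sig, Vt = np.linalg.svd(A, full_matrices=False)
    s_last = np.linalg.svd(np.column_stack([A, b]), compute_uv=False)[-1]
    gains = sig[:r] / (sig[:r] ** 2 - s_last ** 2)  # sigma_k/(sigma_k^2 - sigma'^2_{K+1})
    return Vt[:r].T @ (gains * (W[:, :r].T @ b))

rng = np.random.default_rng(4)
A = rng.standard_normal((50, 6))
b = A @ np.ones(6) + 0.05 * rng.standard_normal(50)

x_full = trimmed_tls(A, b, r=6)     # r = K reproduces the plain TLS solution
x_trim = trimmed_tls(A, b, r=4)     # illustrative trim level

# check that r = K matches the closed form (3.12)
s_last = np.linalg.svd(np.column_stack([A, b]), compute_uv=False)[-1]
x_cf = np.linalg.solve(A.T @ A - s_last**2 * np.eye(6), A.T @ b)
assert np.allclose(x_full, x_cf)
# trimming can only shrink the solution norm
assert np.linalg.norm(x_trim) <= np.linalg.norm(x_full) + 1e-9
```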
The Tikhonov regularized TLS solution should be appreciated, at least for its simplicity and the numerical stability it provides. However, in Bayesian terms, using a minimum energy penalty entails the assumption that the expansion coefficients we solve for are zero-mean, uncorrelated, of equal variance, and jointly Gaussian distributed. On the other hand, using L1-norm minimization corresponds to the assumption of a Laplacian distribution. Naturally, since the LR basis is highly correlated, the assumption that the expansion coefficients are uncorrelated is unrealistic. In addition, the assumption that the joint distribution of the expansion coefficients is Gaussian (or Laplacian) cannot be accurate, but it is somewhat more acceptable compared to some other methods, where a minimum energy penalty is used to stabilize the solution for the pixels of the HR image itself [12, 13].
Two popular regularization methods are based on the assumption that natural signals are smooth: the Markov random field (MRF) prior [52] and total variation (TV) norm minimization. TV is commonly used as a regularizer for denoising/deblurring of images [53, 54]. It penalizes the total amount of change in the image, as measured by the L1-norm of the magnitude of the gradient. In our case, however, what we solve for are the expansion coefficients, hence using MRF or TV to regularize the solution is inappropriate. In addition, even if we reformulate the regularization to be a function of the PPC, for example,

    min_{Â, x} ‖[A b] − [Â Âx]‖_F² + λ R(Yx),

where R(Yx) is the regularization term, and even if we could solve this non-convex
3.3 Mean and Covariance of an Estimated PPC
In this section we show that an estimated PPC will always be noisier than the LR
images, even if the estimated expansion coefficients have zero variance.
First, we assume that the data matrix is corrupted with additive noise,

    Y = Yo + V,

where Yo is the noise-free data matrix (the signal component of the data) and V is a noise matrix with entries that are uncorrelated, zero-mean, and of the same variance σ_v².
Let μ_w and R_w denote the mean and covariance, respectively, of the error, w, in the estimated expansion coefficients, x̂ = x + w. The corresponding estimated n-th PPC is thus

    Û_n = Y x̂ = Yo x + Yo w + V x̂.
Therefore,

    E[Û_n] = Yo x + Yo μ_w,

    Û_n − E[Û_n] = Yo (w − μ_w) + V x̂,

and

    Cov(Û_n − U_n) = Cov(Û_n) = E[ (Û_n − E[Û_n]) (Û_n − E[Û_n])^T ]
                   = Yo R_w Yo^T + E[V x̂ x̂^T V^T] + 2 E[Yo (w − μ_w) x̂^T V^T].

Now,

    E[V x̂ x̂^T V^T] = E[x̂^T x̂] σ_v² I_d = σ_v² ( ‖x‖² + ‖μ_w‖² + Tr(R_w) + 2 μ_w^T x ) I_d,

and

    E[Yo (w − μ_w) x̂^T V^T] = 0,

where Tr(·) denotes the trace of a matrix and I_d is the identity matrix of size d × d. Hence,

(3.19)    Cov(Û_n) = Yo R_w Yo^T + E[x̂^T x̂] σ_v² I_d
                   = Yo R_w Yo^T + σ_v² ( ‖x‖² + ‖μ_w‖² + Tr(R_w) + 2 μ_w^T x ) I_d,
and the mean square error (MSE) of Û_n is given by

(3.20)    MSE(Û_n) = total variance(Û_n) + ‖Bias(Û_n)‖²
                   = Tr(Yo R_w Yo^T) + d σ_v² ( ‖x‖² + ‖μ_w‖² + Tr(R_w) + 2 μ_w^T x ) + ‖Yo μ_w‖².

Equation (3.19) tells us that even if we knew the error-free expansion coefficients, x, in terms of the noiseless version of the data matrix, Yo, a reconstructed PPC would still be noisier than the LR images: setting w = 0 in (3.19) gives

(3.21)    Cov(Û_n) = ‖x‖² σ_v² I_d.
Consequently, it is obvious that even in the absence of error in estimating the expansion
coefficients, pre-denoising of the data matrix (§3.4) or post-denoising of the
reconstructed HR image (§3.6), or both, is a necessity when the noise in the data matrix is
moderately high. Also, equation (3.21) reveals that Û_n is inconsistent, regardless of the
estimation of the expansion coefficients, and therefore, given that the expansion
coefficients are known, the only way to benefit from an increased overdeterminedness of
the problem (super-resolving larger LR images) is if the pre-denoiser of the data does
benefit from super-resolving large LR images. As will be explained in the next section,
PCA denoising, which denoises by maximizing the SNR of the low order principal
components and discarding the ones with small SNR, performs better, at least
theoretically, when dealing with larger LR images.
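A quick Monte Carlo sketch of (3.21), with toy sizes and x and Yo held fixed so that the only randomness is the noise in the data matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
d, K, sigma_v = 20, 4, 0.1
Yo = rng.standard_normal((d, K))      # noise-free data matrix (fixed)
x = rng.standard_normal(K)            # error-free expansion coefficients (fixed)

# Many realizations of U_hat = (Yo + V) x with fresh noise V each time
n_trials = 20000
samples = np.empty((n_trials, d))
for t in range(n_trials):
    V = sigma_v * rng.standard_normal((d, K))
    samples[t] = (Yo + V) @ x

# (3.21) predicts Cov(U_hat) = ||x||^2 sigma_v^2 I_d when w = 0
predicted = np.dot(x, x) * sigma_v**2
empirical = samples.var(axis=0)
assert np.all(np.abs(empirical - predicted) < 0.2 * predicted)
```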
The MSE formula (3.20) contains three error parameters:
• σ_v², which is, as defined previously, the variance of the noise in the LR images.
• μ_w and R_w, the mean and covariance of the error in the estimated expansion coefficients, which depend on the noise level in the sub data matrix A (i.e. σ_v²), the bias caused by regularization (if any), and the bias of the estimated reference PPC. According to our experiments, at moderate values of σ_v² (e.g. at 30 dB SNR), the bias due to a noisy A is normally marginal, even using the LS estimator.
Although it might not be easily discernible from examining equation (3.20), according to our experiments the bias of Û_n can overshadow its variance (the reconstructed HR image appears much less noisy when it is blurred or aliased). As mentioned above, this can only partly be owing to the bias-variance tradeoff associated with estimating the expansion coefficients (regularization). In other words, a blurred reference PPC has the advantage of submerging the noisy appearance of the reconstructed HR image. However, the best way to control the enhanced noise manifestation (3.21) is to directly control the effect of the parameter σ_v² (3.19 - 3.21) by pre-denoising the data matrix. This, incidentally, also strips the TLS of its advantage of low bias compared to the LS solution, even at relatively low SNRs.
3.4 Pre-denoising the LR Images using PCA
In light of the last two sections, the goal of pre-denoising the LR images is clear:
reducing the noise enhancement effect associated with multiplying the LR images with
the expansion coefficients and obtaining less biased estimates of the expansion
coefficients.
Using first and second order statistics of a data set, principal component analysis (PCA) provides an optimal orthonormal basis (in the mean squared error (MSE) sense) for a reduced representation of the data [32], where the first few principal axes can capture, on average, a significant portion of a data point's energy, while the last few principal axes correspond mainly to insignificant features. In other words, it is the optimal linear minimum MSE (MMSE) compressor of the data, regardless of the distribution10. This property also makes PCA the optimal linear denoiser when the data is contaminated with additive zero-mean, equal variance, uncorrelated noise. Specifically, if we assume that the noisy LR images are realizations of a random vector
_____________________________
10
If the mean and covariance matrix are known, the distribution of the data is irrelevant to the
performance of PCA as a linear MMSE compressor.
    y = yo + v,

where v is a zero-mean noise vector with covariance matrix σ_v² I_d, statistically independent of yo, which is the underlying random vector generating the noiseless part of the LR images (the signal part), with mean μ and covariance matrix C, with eigen-decomposition

    C = E Λ E^T,

where the columns of E are the orthonormal eigenvectors of C and the diagonal matrix Λ contains the eigenvalues λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_d, then the covariance matrix of the random vector y is

    C_y = C + σ_v² I_d = E Λ̃ E^T,

where

    Λ̃ = Λ + σ_v² I_d.
The PCA basis vectors (the principal axes) are the columns of E, and the transformation

    z_k = E^T ỹ_k,

where ỹ_k is the k-th centered LR image, decorrelates the centered LR image and maximizes the variance of the lower order principal components (expansion coefficients in terms of the PCA basis) of the k-th centered LR image:

    E[z_k z_k^T] = E[E^T ỹ_k ỹ_k^T E] = E^T C_y E = Λ̃.

Noting that the principal components (PCs), i.e. the elements of the feature vector z_k, have variances

(3.22)    λ_ℓ + σ_v²  for ℓ = 1, …, d,
it becomes evident that the PCA also maximizes the SNR along the low order principal axes. Consequently, if we replace the q highest order PCs (the last q elements of z_k) with zeros, the resulting average distortion,

    E[(z_k − ẑ_k)^T (z_k − ẑ_k)] = Σ_{ℓ=d−q+1}^{d} (λ_ℓ + σ_v²),

would correspond mostly to noise. Therefore, we can denoise the LR images by centering them, PCA transforming them and then discarding the high order PCs, or we can simply retain only the low order principal axes (corresponding to the largest eigenvalues) and use them for denoising:

    ŷ_k = E_r E_r^T ỹ_k + μ,

where E_r is the reduced PCA basis and ŷ_k is the denoised k-th LR image.
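A sketch of this reduced-basis denoiser (empirical version, with the sample mean and sample covariance standing in for μ and C; the function name, the retained dimension r, and the synthetic data are illustrative):

```python
import numpy as np

def pca_denoise(Y, r):
    """Denoise the columns of Y (one image per column) by projecting the
    centered data onto the r leading principal axes, then adding back the mean.
    The mean and principal axes are estimated from Y itself."""
    mu = Y.mean(axis=1, keepdims=True)
    Yc = Y - mu
    # Eigenvectors of the sample covariance = left singular vectors of Yc
    E, _, _ = np.linalg.svd(Yc, full_matrices=False)
    Er = E[:, :r]                      # reduced PCA basis
    return Er @ (Er.T @ Yc) + mu

rng = np.random.default_rng(6)
d, n, r = 100, 40, 5
# Highly correlated 'signal' living in an r-dimensional subspace, plus white noise
signal = rng.standard_normal((d, r)) @ rng.standard_normal((r, n))
Y = signal + 0.1 * rng.standard_normal((d, n))

Y_hat = pca_denoise(Y, r)
# The retained subspace keeps the signal and drops most of the noise
assert np.linalg.norm(Y_hat - signal) < np.linalg.norm(Y - signal)
```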
3.4.1 The Sample Mean and Sample Covariance Matrix
Since we have no knowledge of the true mean and true covariance matrix, we can only
empirically estimate them from the data. The most commonly used estimators are the
sample mean and the sample covariance matrix, which are unbiased under the assumption
that the observations are i.i.d. If the data is also Gaussian distributed, the sample mean
and (a slightly differently scaled) sample covariance matrix are also the ML estimates of
the true mean and the true covariance matrix, respectively. The assumption of
independence of observations is unrealistic. Moreover, the distribution of the data is
hardly Gaussian and thus taking the eigenvectors of the sample covariance matrix as our
PCA basis is not optimal (the empirically derived PCA basis is not the linear MMSE
compressor, and thus it cannot be the optimal linear denoiser). For the scope of this
thesis, however, the sample mean and sample covariance shall suffice.
In our problem, the number of observations (LR images) is far smaller than their
dimensionality11. Under such circumstances, the sample covariance matrix provides a
poor estimate. A better strategy is to denoise sub LR images. This not only reduces the
_____________________________
11
Typically, the number of LR images is less than 1% of the number of variables (pixels within a LR
image).
number of parameters to be estimated (smaller covariance matrix), but it also provides more samples (observations), allowing for a larger denoising space where it is possible to discard many more high order PCs12. In particular, we use both the primary and secondary LR images13 (corresponding to the primary and secondary sensors, respectively) and downsample them by J×J and I×I, respectively, obtaining KJ² + K_S I² highly correlated sub LR images of the same size, where K_S is the number of secondary LR images. From these sub LR images we compute the sample mean and sample covariance, and then PCA denoise them using the eigenvectors of the empirically estimated covariance matrix. The sample mean of the sub LR images is given by

    μ̂ = (1/(KJ² + K_S I²)) Σ_{k=1}^{KJ² + K_S I²} y_k^sub,
where y_k^sub is the k-th sub LR image, reordered as a column vector. The sample covariance is defined as

    Ĉ_y = (1/(KJ² + K_S I² − 1)) Σ_{k=1}^{KJ² + K_S I²} (y_k^sub − μ̂)(y_k^sub − μ̂)^T ∈ ℝ^{p×p}.

Now, let D denote the matrix of the orthonormal eigenvectors of Ĉ_y corresponding to the largest r_o eigenvalues14. D is, therefore, the reduced PCA matrix, which we use to denoise the sub LR images as follows:

(3.23)    ŷ_k^sub = D D^T (y_k^sub − μ̂) + μ̂.
Now we list the reasons for our choice of the sub LR images to be obtained by downsampling the primary and secondary LR sets by J×J and I×I, respectively:
_____________________________
12  The more computationally expensive kernel PCA (nonlinear PCA) is known in the literature to be a far superior denoiser to the empirically derived linear PCA when the number of samples far exceeds their dimensionality [55]. However, this is not applicable in our case.
13  The primary LR images are normalized to have the same L2-norm, and the secondary LR images are normalized to have the L2-norm of a primary LR image scaled by I/J. This step is useful to ensure that no single LR image can dominate the analysis.
14  According to synthetic and real data experiments, at r_o = 0.3p there is virtually no loss of detail associated with denoising, and even at r_o = 0.1p there is only a slightly noticeable loss of detail. The default value we use for r_o is 0.2p.
1. By denoising the sub LR images as described above, we also directly denoise the
sub data matrices used for estimating the expansion coefficients.
2. The reason for choosing sub LR images to be downsampled versions of the LR
images, rather than subregions of them, is that subregions across the LR images are
not as highly correlated and thus more PCs would need to be retained to avoid
significant loss of detail, which translates to less denoising capability.
3. Of course, to lower the computational15 cost of finding the eigen-decomposition (or
SVD) of the sample covariance matrix, we could use even smaller sub LR images
by downsampling further. This also makes the corresponding sample covariance
matrix a better estimate since even more samples will be used to compute it. But on
the other hand, the denoising space will get smaller (a smaller covariance matrix
means a smaller number of eigenvectors, hence fewer can be discarded). Moreover,
this will result in a lower SNR along the low order axes since, theoretically, the noise level is constant along all axes (3.22) and of course does not get lower when dealing with smaller sub LR images, while the signal's variance is maximized along the low order axes and is proportional to its total energy. Hence, working with smaller sub LR images results in a smaller denoising space and lower
SNR in the retained PCs. We digress slightly here to note that PCA denoising, at
least theoretically16, can circumvent the inconsistency of the PPC estimator (3.21),
since working with larger LR images translates to a larger denoising space and
higher SNR in the retained PCs. Practically, however, and given that the number of
LR images is fixed, working with larger LR images means that the sample
covariance matrix estimate of the true covariance matrix of the sub LR images
becomes poorer, not to mention the higher computational cost of finding the eigen-
decomposition of the increased size sample covariance matrix (although when the
number of samples is smaller than their dimensionality, the computational cost is
primarily determined by the number of samples where the reduced SVD of the
matrix of samples, rather than the covariance matrix itself, is computed [32], which
_____________________________
15
For faster computation of the first ro singular vectors of the covariance matrix, we use the Matlab
code prepared by Mark Tygert, which is an implementation of the algorithm described in [62].
16
Assuming the true covariance matrix is known.
is expected to be the case if the problem involves super-resolving larger LR
images, given that the number of LR images is fixed).
4. Finally, as will be explained in chapter IV, the same reduced PCA matrix D (3.23)
will be also used in estimating the reference PPC, saving us the trouble of
calculating the eigen-decomposition of another covariance matrix.
Outlier LR images are those images irrelevant to the reconstruction of the PPCs. Since
we use the LR images as basis signals, given that the estimated reference PPC does not
have any components corresponding to outliers, the expansion coefficients in terms of the
outlier images should be exactly zero, and thus outliers should be of no concern to us.
However, since we pre-denoise the LR images using PCA, which is dependent on the
sample covariance matrix, the presence of outliers in the samples will make high order
PCs more representative of the signal’s energy [32] and thus we will have to retain more
PCs or risk significant loss of detail. Of course, retaining more PCs means retaining more noise, and therefore getting rid of outlier LR images becomes essential for better denoising.
Depending on the application, there is more than one suitable method for detection
and removal of outliers in the data. For example, trimming the data involves finding the
Mahalanobis distance of each LR image from the mean, and iteratively calculating a new
covariance matrix (and mean) [56]. Of course, the Mahalanobis distance involves finding the inverse of the sample covariance matrix of the LR images, which is singular since the number of LR images is far lower than their dimensionality.
Alternatively, and since our goal is to find a robust estimation of the covariance matrix of
the sub LR images, we could implement the minimum covariance determinant (MCD)
method. It works by finding the subset of samples whose covariance matrix has the
lowest determinant [57]. However, and regardless of the computational cost, this method
requires that the number of samples be much higher than their dimensionality which is
hardly the case in our problem, even when the samples are sub LR images.
Fortunately, while our problem is short on samples (relative to their dimensionality), it
is advantaged by the fact that the LR images are highly correlated. Therefore, outliers can
be defined as those images that are farthest from the mean. In order to identify outliers in
the secondary LR set, the mean of the primary LR set is reduced in size (via nearest-neighbor interpolation) to the size of a secondary LR image, and outlier secondary
LR images are thus those that are farthest from the resized mean. There are two reasons
we did not use the mean of the secondary LR set to identify outliers within this set:
The (same size) sub LR images, from both sets, are assumed to have the same mean and the same covariance matrix; therefore, using two different means to identify the outliers prior to the computation of the sample mean and sample covariance would be meaningless.
Ultimately, the secondary set of LR images is there only so we can estimate the
reference PPC in order to compute the expansion coefficients of the primary PPCs
in terms of the primary LR images. As a result, the relevance of an estimated
reference PPC, and by extension the secondary LR images used to construct it, is
determined by the available primary LR set. Namely, the ‘outlyingness’ of a
secondary LR image can only be measured in terms of the ensemble of the primary
LR set.
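The outlier-rejection rule above can be sketched as follows, with synthetic data, illustrative sizes, and one planted outlier; the number of outliers is assumed known here (this is revisited in chapter IV):

```python
import numpy as np

rng = np.random.default_rng(1)

# Primary LR images (20x20) and secondary LR images (16x16); sizes and the
# number of outliers are illustrative assumptions.
primary = [rng.standard_normal((20, 20)) for _ in range(10)]
base = rng.standard_normal((16, 16))
secondary = [base + 0.1 * rng.standard_normal((16, 16)) for _ in range(8)]
secondary[3] = base + 5.0 * rng.standard_normal((16, 16))   # planted outlier

# Resize the primary mean to the secondary size via nearest-neighbor indexing.
mean_p = np.mean(primary, axis=0)
rows = np.round(np.linspace(0, 19, 16)).astype(int)
d_p = mean_p[np.ix_(rows, rows)]

# Outliers are the n_out secondary images farthest from the resized mean.
n_out = 1                                   # assumed known
dist = [np.linalg.norm(y - d_p) for y in secondary]
outliers = np.argsort(dist)[-n_out:]
```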
Clearly, this simple method of rejecting outlier images assumes that the number of
outliers in the primary and secondary sets of LR images is already known. In chapter IV,
where the estimation of a reference PPC is highly affected by the presence of outliers, we
describe a simple intuitive way to obtain an approximate estimate of the number of
outliers.
The typical approach to processing color images is to simply super-resolve each of the
three color-band images separately (thus tripling the computational cost) while ignoring
the color artifacts present in the demosaiced17 LR images [46, 47]. Although none of the
authors of [35-37], who addressed the problem of single-frame super-resolution using
subspace learning methods, explained how they dealt with the case of color, we believe
they too ignored the color artifacts and assumed that the LR images are captured by a 3-
_____________________________
17 Single CCD color cameras use the Bayer (color) filter to obtain all 3 color band images using one CCD sensor, where each pixel senses only one of the 3 colors, according to the Bayer pattern, and then the three raw color band images are demosaiced to interpolate the missing pixels. This results in color artifacts that are normally negligible at high resolutions but easily noticeable in LR images.
CCD18 camera (one sensor per color-band), where there would be no color artifacts at
all. On the other hand, Farsiu et al. [48] considered joint demosaicing and super-
resolution of color images to reduce the color artifacts associated with single CCD color
cameras.
In our case, we also assume that the primary set of LR images is obtained by 3
primary CCD sensors. For the secondary set of lower resolution LR images, only one
sensor for the green (luminance) band19 is required since we need to estimate the set of
expansion coefficients only once. Recall that a LR image is assumed to be a linear mixing of the PPCs, and since each of the three HR color-band images undergoes the same transform, resulting in the corresponding LR color-band image within the same LR frame, the same set of expansion coefficients can be used to un-mix the PPCs of each HR color-band image. In other words, if we let X denote the matrix containing all the expansion coefficients computed using only the green primary and green secondary LR images, then
$$U_R = Y_R X,\qquad U_G = Y_G X,\qquad U_B = Y_B X,$$
where $Y_R$, $Y_G$ and $Y_B$ are the red, green and blue data matrices, containing the K red, K green and K blue LR images (unwrapped by column), respectively, and $U_R$, $U_G$ and $U_B$ are the red, green and blue image matrices containing the $I^2$ red, $I^2$ green and $I^2$ blue PPCs, respectively.
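The band-sharing relation above can be sketched as follows; the matrix sizes are illustrative assumptions, and X stands in for coefficients that would in practice be estimated from the green bands only:

```python
import numpy as np

rng = np.random.default_rng(2)
p, K, n_ppc = 64, 12, 16     # pixels per LR image, LR images, primary PPCs (I = 4)

# Expansion coefficients, estimated once (from the green band in the thesis).
X = rng.standard_normal((K, n_ppc))

# Data matrices: one unwrapped-by-column LR image per column, per color band.
Y_R = rng.standard_normal((p, K))
Y_G = rng.standard_normal((p, K))
Y_B = rng.standard_normal((p, K))

# The same X un-mixes the PPCs of every color band.
U_R, U_G, U_B = Y_R @ X, Y_G @ X, Y_B @ X
```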
Although we are using only the green primary and secondary LR images to estimate
the expansion coefficients, we might still want to pre-denoise the primary red and blue
LR images since multiplying noisy LR images with the expansion coefficients enhances
the noise (3.21) as we explained in §3.3. Of course, in this case, the sample covariance
matrix will be derived from the primary red and blue LR sets only, as there are no
secondary sets of red and blue LR images (we require only one lower resolution green
sensor for the secondary set of LR images).
_____________________________
18 A beam splitter is used to split the image into its red, green and blue components to be separately detected on 3 CCD sensors.
19 The green (luminance) band of a color image is approximately equivalent to its grayscale version.
3.6 Post-Processing the SR Image
TV Denoising
Post-denoising the super-resolved image is an option to reduce the noise further when
the PCA pre-denoising, on its own, is not sufficient.
Total variation (TV) is a well-known edge-preserving denoising method. The denoiser
solves the minimization [53]
$$\min_{u_d}\ \|\nabla u_d\|_1 + \frac{\lambda}{2}\,\|u_d - u\|_2^2, \tag{3.24}$$
where $u_d$ is the denoised version of the original image, u, and $\lambda$ is the parameter that
controls the fidelity to data (the original noisy image). We use the code written by Pascal
Getreuer which is an implementation of the algorithm described in [63] for iteratively
solving the minimization problem (3.24). The code also handles color images by jointly
denoising using the vectorial generalization of the TV, implementing the algorithm in
[64] which is a generalization of the algorithm in [63].
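A minimal sketch of TV denoising in the spirit of (3.24); this toy solver uses plain gradient descent on a smoothed TV objective rather than the algorithms of [63, 64], and all parameter values are illustrative assumptions:

```python
import numpy as np

def tv_norm(u):
    """Anisotropic total variation (sum of absolute pixel differences)."""
    return np.abs(np.diff(u, axis=0)).sum() + np.abs(np.diff(u, axis=1)).sum()

def tv_denoise(f, lam=0.5, step=0.02, iters=300, eps=1e-2):
    """Gradient descent on a smoothed TV objective:
    sum sqrt(|grad u|^2 + eps) + lam/2 * ||u - f||^2."""
    u = f.copy()
    for _ in range(iters):
        ux = np.diff(u, axis=1, append=u[:, -1:])   # forward differences,
        uy = np.diff(u, axis=0, append=u[-1:, :])   # replicated boundary
        mag = np.sqrt(ux ** 2 + uy ** 2 + eps)
        px, py = ux / mag, uy / mag
        # negative divergence of p approximates the gradient of the TV term
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        u -= step * (-div + lam * (u - f))
    return u

rng = np.random.default_rng(3)
clean = np.zeros((32, 32)); clean[:, 16:] = 1.0              # step edge
noisy = clean + 0.2 * rng.standard_normal(clean.shape)
denoised = tv_denoise(noisy)
```

Larger values of the fidelity parameter keep the result closer to the noisy input, matching the role of $\lambda$ in (3.24).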
Unsharp Masking
The super-resolved image can be blurred, mainly because the estimation of the
reference PPC is biased to some degree (chapter IV). Also, as we shall explain in chapter
V, the CCD sensor causes additional blur as well. Unsharp masking (UM) is a generic and very simple sharpening technique [72]. In UM, a blurred version of the original
image is subtracted from it and the result is scaled and then added to the original image.
We use MATLAB’s unsharp masking with default settings.
After deblurring using the unsharp masking, the processed image usually contains
what looks like impulsive noise around the edges. This could probably be due to the fact
that we estimate the HR image by estimating its PPCs separately and then interlacing,
which might cause some subtle irregularities in pixel intensity levels, especially around
the edges, that become more pronounced after sharpening. This problem is easily dealt
with by using a simple 2x2 median filter.
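The sharpen-then-median-filter step can be sketched as follows; the 3x3 box blur and the scaling factor are illustrative stand-ins for MATLAB's unsharp masking with default settings, not its exact kernel:

```python
import numpy as np

def unsharp_mask(u, alpha=0.7):
    """Sharpen by adding back a scaled high-pass: u + alpha * (u - blur(u)).
    The 3x3 box blur is an illustrative stand-in, not MATLAB's kernel."""
    pad = np.pad(u, 1, mode='edge')
    blur = sum(pad[i:i + u.shape[0], j:j + u.shape[1]]
               for i in range(3) for j in range(3)) / 9.0
    return u + alpha * (u - blur)

def median2x2(u):
    """2x2 median filter (median of each 2x2 block, replicated border)."""
    pad = np.pad(u, ((0, 1), (0, 1)), mode='edge')
    stack = np.stack([pad[:-1, :-1], pad[:-1, 1:], pad[1:, :-1], pad[1:, 1:]])
    return np.median(stack, axis=0)

rng = np.random.default_rng(4)
img = rng.random((16, 16))
sharp = median2x2(unsharp_mask(img))
```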
3.7 Summary
In practice, the estimated reference PPC is usually blurred, and it is therefore the main
source of bias in the super-resolved image. Also, some additional edge-preserving
denoising might be desired. For these reasons, we use TV denoising, followed by unsharp
masking and median filtering.
The following list shows some of the errors that images captured by digital cameras are usually corrupted with:
- Camera sensor readout noise (zero-mean, white Gaussian, independent of signal). Cause: electronics.
- Shot noise (Poisson distribution, signal dependent). Cause: fluctuation of photon counts. It becomes negligible and more Gaussian-like in distribution with more photons (good light conditions, and larger pixels).
- Impulsive noise (Laplacian or heavy-tailed distribution). Cause: long exposure time, A/D errors, and transmission errors (rare).
- Compression artifacts. This depends on the user-defined compression level.
Throughout this chapter, we assumed the errors are uncorrelated and Gaussian
distributed, which is generally a reasonable assumption. Depending on the application,
however, other types of noise might dominate and must be addressed accordingly. In
particular, since we use PCA as a pre-denoiser of the LR images, it is essential for us to
consider other forms of PCA in accordance with the application at hand. In addition, we
might need to consider data-fitting terms other than the L2-norm (LS solution). For
example, we could use weighted LS if the reference PPC contains colored noise or an L1-
norm data-fitting term for impulsive noise.
Variants of PCA
Assuming that the error and the signal parts of the data are independent, if the
covariance matrix of error, Rv , is known and the data’s covariance matrix, C y , is known
as well, then the PCA basis, $\{e_\ell\}_{\ell=1}^{d}$, that maximizes the SNR of the PCs, subject to their being uncorrelated20 with respect to the error's covariance matrix, is given by [32]
$$\max_e \frac{e^T C_y e}{e^T R_v e} \quad\text{subject to}\quad e_q^T R_v e = 0 \ \text{for}\ 1 \le q < \ell,\ \ell > 1, \tag{3.25}$$
which is equivalent to
$$\max_e\ e^T C_y e \quad\text{subject to}\quad e^T R_v e = 1,\qquad e_q^T R_v e = 0 \ \text{for}\ 1 \le q < \ell,\ \ell > 1.$$
Clearly, if $R_v = \sigma_v^2 I_d$, then $\{e_\ell\}_{\ell=1}^{d}$ are the eigenvectors of $C_y$, which is the conventional PCA basis. Otherwise, the non-convex problem (3.25) can be solved by solving the eigen problem
$$C_y e = \lambda R_v e.$$
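The generalized eigenproblem above can be sketched by whitening with a Cholesky factor of $R_v$, which turns it into an ordinary symmetric eigenproblem; the covariance matrices below are synthetic stand-ins for the assumed-known $R_v$ and $C_y$:

```python
import numpy as np

rng = np.random.default_rng(5)
d = 6

# Synthetic stand-ins: colored-noise covariance R_v and data covariance C_y.
M = rng.standard_normal((d, d)); R_v = M @ M.T + d * np.eye(d)
S = rng.standard_normal((d, d)); C_y = S @ S.T + np.eye(d)

# Whitening: with R_v = L L^T (Cholesky), C_y e = lam * R_v e becomes an
# ordinary symmetric eigenproblem for L^{-1} C_y L^{-T}.
L = np.linalg.cholesky(R_v)
Linv = np.linalg.inv(L)
lam, V = np.linalg.eigh(Linv @ C_y @ Linv.T)
E = Linv.T @ V[:, ::-1]      # columns solve C_y e = lam * R_v e
lam = lam[::-1]              # generalized eigenvalues, SNR-descending
```

The columns of `E` are uncorrelated with respect to $R_v$, which is the constraint in (3.25).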
Other Post-Processing Options
The post-processing techniques mentioned in §3.6 are admittedly very generic, and we might therefore want to consider more sophisticated options. For example, if the leftover noise is significant, using TV with a low enough data fidelity parameter would smooth out textured areas of the image, and hence an adaptive TV method [65] would be a better option. Also, we might get better results by jointly deblurring and denoising [66]. In addition, there are other alternatives for the data-fitting term in the minimization problem (3.24) for handling non-Gaussian error, such as impulsive noise [67] or Poisson noise [68]. Of course, the literature on denoising and deblurring is huge,
but these examples are particularly attractive since they involve edge-preserving
processing.
CHAPTER IV
4.1 Introduction
At the end of chapter II, we mentioned that in order to be able to estimate the
reference PPC, two sets of LR images must be obtained from two image sensors with
different sensor densities. We refer to the set of the LR images acquired by the primary
sensor (Figure 2.4) as the primary LR set (corresponding to the primary downsampling
factor, I). The LR images acquired by the secondary sensor are referred to as the
secondary LR set (corresponding to the secondary downsampling factor, J). The $I \times I$ PPCs, $\{U_n\}_{n=1}^{I^2}$, and the $J \times J$ PPCs, $\{U_m\}_{m=1}^{J^2}$, are referred to as the primary and secondary PPCs, respectively, where the index m of the reference (secondary) PPC is between 1 and $J^2$.
As explained in chapter II, under the assumption of linearity, a set of LR images can
span PPCs of the same resolution level (corresponding to the same downsampling factor).
Therefore we assume that
$$\{U_n\}_{n=1}^{I^2} \subset \mathcal{R}(Y),\qquad \{U_m\}_{m=1}^{J^2} \subset \mathcal{R}(Y^S), \tag{4.1}$$
where Y and $Y^S$ contain the primary and secondary LR images, respectively. According
to the property of sampling diversity we have
$$U_{n,j} = U_{m,i},\qquad j = T_n(m),\quad i = T_m(n), \tag{4.2}$$
for any n and m. Recall that $U_{n,j}$ and $U_{m,i}$ are the j-th and i-th sub PPCs of $U_n$ and $U_m$, respectively, and they are equal for $j = T_n(m)$ and $i = T_m(n)$. Refer to §2.3 for details.
downsampling matrices corresponding to the j-th and i-th sub PPCs, respectively, and in
light of (4.1), equation (4.2) can be re-written as
$$Ax = 0, \tag{4.6}$$
where
$$A = \begin{bmatrix} A_1 & -A_2 \end{bmatrix},\qquad x = \begin{bmatrix} x_1^T, & x_2^T \end{bmatrix}^T.$$
_____________________________
1 In §4.4 we explain that equation (4.3) is not unique for any arbitrary choice of m and n, and we describe how this fact should be dealt with.
An obvious approach to solving equation (4.6) is to minimize the L2-norm of Ax, subject
to avoiding the trivial zero solution,
$$\min_x \|Ax\|_2^2 = x^T A^T A x \quad\text{subject to}\quad \|x\|_2 = 1. \tag{4.7}$$
Problem (4.7) is non-convex (because of the quadratic equality constraint) but it has a
well-known analytical solution. First, let
$$A = W \Sigma V^T = \sum_{k=1}^{N} \sigma_k w_k v_k^T \tag{4.8}$$
denote the SVD of A. The solution is then the right singular vector corresponding to the smallest singular value,
$$\hat{x} = v_N. \tag{4.9}$$
Note that (4.7) is equivalent to
$$\min_{x_1, x_2} \|A_1 x_1 - A_2 x_2\|_2^2 \quad\text{subject to}\quad \|x_1\|_2^2 + \|x_2\|_2^2 = 1, \tag{4.10}$$
which simply finds the two vectors in $\mathcal{R}(A_1)$ and $\mathcal{R}(A_2)$ with the minimum Euclidean distance between them.
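The analytical solution (4.8)-(4.9) can be sketched as follows, with random stand-ins for the sub data matrices $A_1$ and $A_2$; the minus sign on $A_2$ reflects the difference form $A_1 x_1 - A_2 x_2$ in (4.10):

```python
import numpy as np

rng = np.random.default_rng(6)
p, K, K_S = 50, 8, 6              # rows; columns of A1 and A2 (illustrative)

A1 = rng.standard_normal((p, K))
A2 = rng.standard_normal((p, K_S))
A = np.hstack([A1, -A2])          # so that A @ x = A1 @ x1 - A2 @ x2

# (4.8)-(4.9): the minimizer of ||A x|| over unit vectors x is the right
# singular vector of A associated with the smallest singular value.
_, s, Vt = np.linalg.svd(A, full_matrices=False)
x_hat = Vt[-1]
x1, x2 = x_hat[:K], x_hat[K:]
```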
Solving (4.5) by solving (4.10) is based on the assumption that the two vectors in
R A1 and R A2 , that best approximate Un, j and Um,i , respectively, have the minimum
Euclidean distance between them. But how accurate is this assumption? Note that (4.3),
and thus (4.5), implicitly assume noise-free and complete primary and secondary LR
basis, in which case the minimum Euclidean distance (4.10) is equal to zero and thus
solving (4.10) solves (4.5) exactly. Of course, the LR images are always noisy and they
do not exactly fully represent the PPCs, and therefore solving (4.10) is not necessarily the
_____________________________
2 This result can be easily derived using Lagrange multipliers [22]. Note that $p = M_1 M_2 / (I^2 J^2) \ge N - 1$ is a necessary condition for a unique solution. This supersedes (3.6).
best option. In fact, the Euclidean distance, as a dissimilarity measure, is known to be
sensitive to errors (noise, outlier LR images and the incompleteness of the LR basis, in
our case). The (squared) Euclidean distance is simply the sum of the square of differences
between pixels, and since the pixels are highly correlated, errors will greatly bias the
decision as to which two vectors in R A1 and R A2 are closest to each other. In §4.3,
we suggest a better alternative to solving (4.5). Moreover, besides bias, the problem setup
of (4.10) can be numerically unstable as we shall see next.
Small gaps between the last few of the N singular values of matrix A in problem (4.7),
which is exactly equivalent to (4.10), result in a similar numerical instability as that of the
TLS solution we discussed in chapter III. Since the columns of A are sub LR images
(unwrapped by column) obtained from the primary and secondary LR sets, these columns
can be highly correlated causing the gaps between the last few of the N singular values to
be small3. Specifically, if we partition the matrix A as follows
$$A = \begin{bmatrix} Z & z_N \end{bmatrix},$$
where Z is a submatrix of A containing all the columns of A except the last column which
we denote zN , then in light of §3.2, the solution (4.9) can be rewritten as
$$\hat{x} = v_N = c \begin{bmatrix} -\left(Z^T Z - \sigma_N^2 I\right)^{-1} Z^T z_N \\ 1 \end{bmatrix}, \tag{4.11}$$
where c is the last element in $v_N$, and $\sigma_N$ is the smallest singular value of A (4.8).
Therefore, if the last singular values of A are close to each other, then, by the interlacing theorem, its submatrix Z will have its last singular values close to each other and to $\sigma_N$ as well.
_____________________________
3 Recall that the last few of the N singular values cannot be zero due to the presence of (white) noise.
Denoising
Equation (4.11) reveals that the components of the solution x̂ can be large. In order to
regularize, one might consider adding a regularization constraint to the non-convex
problem (4.7). For example, we could limit the L1-norm of the solution to a certain
threshold, but in this case, an analytical solution to the new non-convex problem does not
exist and we would have to solve it approximately (using the convex-concave procedure,
for example).
A simple and effective method to denoise $\hat{x}_2$, which contains the last $K_S$ elements of $\hat{x}$, is inspired by the TSVD discussed in §3.1. First, let $B_2$ denote the matrix containing the left singular vectors of $A_2$ corresponding to its $K_S$ (non-zero) singular values; then
$$A_2 \hat{x}_2 \in \mathcal{R}(B_2).$$
Equation (4.11) suggests that the highest order components of $A\hat{x}$, and thus of $A_2\hat{x}_2$, could be very noisy. Therefore, we could represent $A_2\hat{x}_2$ in terms of a reduced basis matrix, $\tilde{B}_2$, which excludes the left singular vectors corresponding to the smallest q singular values. This is equivalent to removing the highest order components of $A_2\hat{x}_2$. We then perform a change of coordinates to get back a denoised version of $\hat{x}_2$, which we denote $\hat{x}_2^d$:
$$\hat{x}_2^d = \left(A_2^T A_2\right)^{-1} A_2^T \tilde{B}_2 \tilde{B}_2^T A_2 \hat{x}_2. \tag{4.12}$$
The corresponding estimate of the reference PPC is then
$$\hat{U}_m = Y^S \hat{x}_2^d. \tag{4.13}$$
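The TSVD-style denoising step (4.12) can be sketched as follows, with random stand-ins for $A_2$ and $\hat{x}_2$ and an illustrative choice of q:

```python
import numpy as np

rng = np.random.default_rng(7)
p, K_S, q = 60, 8, 2              # illustrative sizes; drop the q smallest modes

A2 = rng.standard_normal((p, K_S))
x2_hat = rng.standard_normal(K_S)

# B2: left singular vectors of A2; the reduced basis drops the last q of them.
B2, s, _ = np.linalg.svd(A2, full_matrices=False)
B2_red = B2[:, :K_S - q]

# (4.12): project A2 @ x2_hat onto the reduced basis, then change coordinates
# back to get the denoised coefficient vector x2_d.
proj = B2_red @ (B2_red.T @ (A2 @ x2_hat))
x2_d = np.linalg.lstsq(A2, proj, rcond=None)[0]
```

Since `proj` lies in the range of `A2`, the least-squares step recovers `x2_d` exactly, matching the normal-equations form of (4.12).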
decision bias. Nevertheless, we still get better results by solving the problem as described
next.
f and g are with minimal dissimilarity, gives a less biased decision. Therefore, we could
having the highest variances, which gives them the greatest weight in the choice of the
pair f and g with minimal dissimilarity. The underlying assumption here is that the PCs
with high variance represent significant features. Moreover, the fact that the SNR of the
low order PCs is maximized means smaller decision bias (although maximizing the SNR
does not address error due to incompleteness of the basis). Hence, (4.10) is replaced with
$$\min_{x_1, x_2} \left\| D^T \left( A_1 x_1 - A_2 x_2 \right) \right\|_2^2 \quad\text{subject to}\quad \|x_1\|_2^2 + \|x_2\|_2^2 = 1,$$
which is equivalent to
$$\min_x \|D^T A x\|_2^2 = x^T A^T D D^T A x \quad\text{subject to}\quad \|x\|_2 = 1, \tag{4.14}$$
where
______________________________________________________________________
5 A smaller amount of (white) noise means larger gaps between the last singular values and less noise to be magnified.
$$A = \begin{bmatrix} A_1 & -A_2 \end{bmatrix},\qquad x = \begin{bmatrix} x_1^T, & x_2^T \end{bmatrix}^T,$$
$A_1$ and $A_2$ are obtained from the PCA pre-denoised data, and D is the reduced PCA matrix used to denoise the data as described in §3.4.1. Hence, the same matrix D used to denoise the LR images (by denoising sub LR images) is also used to decorrelate $f \in \mathcal{R}(A_1)$ and $g \in \mathcal{R}(A_2)$. The solution of problem (4.14) is the last right singular vector of $D^T A$.
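The decorrelated problem (4.14) differs from (4.9) only in that the SVD is taken of $D^T A$; a sketch follows, with a random orthonormal matrix standing in for the reduced PCA matrix D of §3.4.1:

```python
import numpy as np

rng = np.random.default_rng(8)
p, K, K_S, r = 200, 8, 6, 20      # illustrative; r >= K + K_S for a unique solution

A1 = rng.standard_normal((p, K))
A2 = rng.standard_normal((p, K_S))
A = np.hstack([A1, -A2])

# Stand-in for the reduced PCA matrix D: any orthonormal p-by-r basis works
# for the algebra; in the thesis D holds the leading PCs of the sub LR images.
D, _ = np.linalg.qr(rng.standard_normal((p, r)))

# Solution of (4.14): the last right singular vector of D^T A.
_, s, Vt = np.linalg.svd(D.T @ A, full_matrices=False)
x_hat = Vt[-1]
```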
In chapter II, and at the beginning of this chapter, we explained that any secondary
PPC shares a sub PPC with any primary PPC (4.2). In chapter III, we have seen how this
fact is used to estimate the expansion coefficients of the primary PPCs given their sub
PPCs which are derived from a reference (secondary) PPC. In this chapter, we also use
the property of sampling diversity to estimate the expansion coefficients of the reference
PPC as demonstrated by (4.3). However, equation (4.3) is not unique for any arbitrary
choice of m and n. For example, suppose the primary downsampling factor, I = 4, and the
secondary downsampling factor, J = 5, and we want to estimate the 13-th (out of 25)
secondary PPC, U m13 , as our reference PPC, using its sub PPC shared with the first (out
of 16) primary PPC, U n1 . According to the sampling diversity property (see §2.3.2)
$$j = T_{n=1}(13) = 19,\qquad i = T_{m=13}(1) = 11,\qquad U_{n=1,\,j=19} = U_{m=13,\,i=11}.$$
In other words, the 19-th (out of 25) sub PPC of the first primary PPC is equal to the 11-
th (out of 16) sub PPC of the 13-th secondary PPC. However, the 19-th sub PPC of the
second primary PPC is equal to the 11-th sub PPC of the 14-th secondary PPC. Also, the
19-th sub PPC of the 11-th primary PPC is equal to the 11-th sub PPC of the 25-th
secondary PPC. In fact, we have
$$\begin{aligned}
U_{n=1,\,j=19} &= U_{m=13,\,i=11} & U_{n=2,\,j=19} &= U_{m=14,\,i=11} & U_{n=3,\,j=19} &= U_{m=15,\,i=11} \\
U_{n=5,\,j=19} &= U_{m=18,\,i=11} & U_{n=6,\,j=19} &= U_{m=19,\,i=11} & U_{n=7,\,j=19} &= U_{m=20,\,i=11} \\
U_{n=9,\,j=19} &= U_{m=23,\,i=11} & U_{n=10,\,j=19} &= U_{m=24,\,i=11} & U_{n=11,\,j=19} &= U_{m=25,\,i=11}.
\end{aligned}$$
This has one consequence: equation (4.3) is not unique. Because, for example, while the
11-th sub PPC component of the 13-th secondary PPC is not equal to the 11-th sub PPC
of the 20-th secondary PPC, any 11-th sub PPC is spanned by the same set of sub LR
images (of the secondary set). Similarly, while the 19-th sub PPC component of the first
primary PPC is not equal to the 19-th sub PPC of the 7-th primary PPC, any 19-th sub
PPC is spanned by the same set of sub LR images (of the primary set). Namely, we have
to solve the same equation
regardless of whether our goal is to estimate the 13-th, 14-th, 15-th, 18-th, 19-th, 20-
th, 23-th, 24-th, or the 25-th secondary PPC, as our reference PPC. In other words,
solving (4.15) will give us the expansion coefficients, xm , of a reference PPC, without
knowing which one (which m) it is from among the list above. In fact, (4.3) is unique
only for the following choices of n and m
For example, for I = 4, and J = 5, only the 7-th sub PPC of the first (n = 1) primary PPC
is equal to the first sub PPC of the 25-th (m = 25) secondary PPC.
So which secondary PPC component (out of $J^2$) should we estimate as our reference PPC? And using which sub PPC (out of $I^2$)? Should we only limit ourselves to the four
possible choices (4.16) for which equation (4.3) is unique? The fact is these four choices
do not necessarily give the best estimation of a reference PPC. The procedure for
estimating the reference PPC, and determining which secondary PPC it is, is as follows.
- Pick m = m*, which is the middle value between 1 and $J^2$. For example, initially assume that we are estimating the 13-th (m* = 13) secondary PPC out of 25 (J = 5).
- Find $I^2$ different estimates of the reference PPC (the m*-th secondary PPC) based on all $I^2$ possible sub PPCs ($n = 1, \dots, I^2$). In other words, solve (4.3) $I^2$ times for a fixed m*.
- Since a PPC is expected to have large high frequency components, due to aliasing, we pick n = n*, for which the estimated reference PPC has significant energy content in the high frequency band, relative to its total energy, thus discarding smooth estimates of the reference PPC. Namely,
$$n^* = \arg\max_n \frac{\left\|\Delta ** \hat{U}_m^{(n)}\right\|_F^2}{\left\|\hat{U}_m^{(n)}\right\|_F^2} \quad \text{subject to} \quad \frac{\left\|\Delta ** \hat{U}_m^{(n)}\right\|_F^2}{\left\|\hat{U}_m^{(n)}\right\|_F^2} \le ub,$$
where ub is the upper bound6 (~1%) on the energy of the high frequency components of the reference PPC, relative to its total energy, ** denotes 2-D convolution, and $\Delta$ is the differentiator
$$\Delta = \delta - \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}, \tag{4.17}$$
with $\delta$ denoting the unit impulse.
- We then find all the pairs (n, m) that give the same pair of sub data matrices as (n*, m*),
$$\left\{ (n, m) \right\} = \left\{ (q, \ell) \in \{1, \dots, I^2\} \times \{1, \dots, J^2\} : T_q(\ell) = T_{n^*}(m^*) \ \text{and} \ T_\ell(q) = T_{m^*}(n^*) \right\}. \tag{4.18}$$
This gives the set of candidate values of m, from which we find the most suitable one to be assigned to the estimated reference PPC as described below.
_____________________________
6 Recall that we solve (4.3) by solving (4.14), where the decision, as to which pair of vectors in the feature subspace are closest to each other, is mainly determined by the low order PCs, and that the decision is independent of the mean. Therefore, depending on the sub data matrices, these low order PCs can bias the decision towards a solution with greatly emphasized high frequency contents.
- Using the estimated reference PPC, we estimate the HR image (by estimating its $I^2$ primary PPCs) for each value of m from the set defined in (4.18). Since misassignment of the estimated reference PPC (assigning the wrong value of m to the estimated reference PPC) results in a rough HR image, we pick the value of m from the set (4.18) for which the reconstructed HR image has the smallest high frequency components,
$$\min_m \left\| \Delta ** \hat{u}_m \right\|_F^2,$$
where $\hat{u}_m$ is the estimated HR image using the estimated reference PPC as being the m-th secondary PPC.
Figure 4.1: For I = 4, J = 5, m* = 13 and n = 1,...,16, the highlighted (white) blocks represent all the pairs
(n, m) that give the same pairs of sub data matrices given by (n, m*). For example, the green dotted blocks
represent the pairs (n, m) that share the same equation (4.3) corresponding to (n = 1, m* = 13).
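The high-frequency-energy criteria above can be sketched as follows; the test images, the bound ub, and the implementation of $\Delta$ as the residual of the smoothing kernel in (4.17) are illustrative assumptions:

```python
import numpy as np

g = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0   # smoothing kernel of (4.17)

def conv2same(u, k):
    """2-D 'same' convolution of u with a 3x3 kernel, replicated borders."""
    pad = np.pad(u, 1, mode='edge')
    out = np.zeros_like(u)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * pad[i:i + u.shape[0], j:j + u.shape[1]]
    return out

def hf_ratio(u):
    """Relative high-frequency energy ||Delta ** u||_F^2 / ||u||_F^2, with
    Delta ** u computed as the high-pass residual u - g ** u."""
    return np.sum((u - conv2same(u, g)) ** 2) / np.sum(u ** 2)

rng = np.random.default_rng(9)
smooth = np.cumsum(np.cumsum(rng.random((16, 16)), 0), 1)  # smooth estimate
aliased = smooth + 2.0 * rng.standard_normal((16, 16))     # strong HF content

ub = 1.0            # illustrative; the text suggests ~1% for real reference PPCs
ratios = np.array([hf_ratio(smooth), hf_ratio(aliased)])
n_star = int(np.argmax(np.where(ratios <= ub, ratios, -np.inf)))
```

Among candidates whose relative high-frequency energy does not exceed the bound, the one with the most high-frequency content is kept, discarding overly smooth estimates.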
4.5 An Intuitive Alternative to Estimating the Reference PPC
Since the secondary LR images that are farthest from the (downsized) mean of the primary LR set, $d_P$, are the least relevant to the reconstruction of the primary PPCs of the HR image (refer to §3.4.2), we make sure that we do not pick an 'outlier' secondary LR image. For choosing the 'best' secondary LR image as our reference PPC, we use
$$\max_k \frac{\left\| \Delta ** y_k^S \right\|_F^2}{\left\| y_k^S - d_P \right\|_F^2}, \tag{4.19}$$
where $\Delta$ is the differentiator defined in (4.17), and $y_k^S$ is the k-th secondary LR image.
Using the chosen secondary LR image (4.19), we determine which of the secondary
PPCs it best represents (determine the most suitable m) by estimating the HR image
for $m = 1, \dots, J^2$. Then we assign, to the chosen LR image, the value of m, for which
the reconstructed HR image is the smoothest, i.e.
_____________________________
7 In fact, even for the case of pure translational motions, a LR image is blurred because of the CCD averaging effect as shall be explained in chapter V.
$$\min_m \left\| \Delta ** \hat{u}_m \right\|_F^2.$$
When there are outlier images, estimating the reference PPC is greatly affected as it
involves solving an equation of the form, A1 x1 A2 x2 , and thus outlier elements can be
on both sides of the equation. In §3.4.2 we described a simple method to get rid of
outliers from both the secondary and the primary sets of LR images, for better PCA pre-denoising. The same method can be used for a better estimation of the reference PPC in the presence of outliers. Of course, if we choose a secondary LR image as our reference PPC,
as described above, outliers will have no effect as their corresponding expansion
coefficients should be zero, since the chosen secondary LR cannot be an outlier. This fact
can be used to estimate the number of outlier primary LR images. Specifically, using a
secondary LR image as a reference PPC, if we average the squared expansion coefficients
of all primary PPCs in terms of the primary LR set, we can estimate the number of
irrelevant (outlier) primary LR images by counting the number of averaged squared
coefficients that are close to zero. The number of outlier secondary LR images will be
also the same if the primary and secondary sensors see the same scene at the same time,
by using a beam splitter. Otherwise, the number of outlier secondary LR images has to be
guessed.
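The outlier-count estimate described above can be sketched as follows, with synthetic coefficients and two planted outlier (irrelevant) images; the near-zero threshold is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(10)
K, n_ppc = 10, 16     # primary LR images, primary PPCs (I = 4); illustrative

# Expansion coefficients of the 16 primary PPCs in terms of the 10 primary LR
# images; rows 7 and 9 play the role of outlier images (coefficients ~ zero).
X = rng.standard_normal((K, n_ppc))
X[7, :] = 1e-8 * rng.standard_normal(n_ppc)
X[9, :] = 1e-8 * rng.standard_normal(n_ppc)

# Average the squared coefficients over all PPCs, per LR image, and count the
# near-zero averages to estimate the number of outlier primary LR images.
avg_sq = np.mean(X ** 2, axis=1)
n_outliers = int(np.sum(avg_sq < 1e-6 * avg_sq.max()))
```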
Finally, we would like to reiterate that removing outlier LR images should be
considered only for better PCA pre-denoising performance and if we are going to
estimate the reference PPC (rather than simply choosing the best secondary LR image as
our reference PPC).
CHAPTER V
5.1 Applications
5.1.1 Introduction
The Airy radius1 is given by
$$r = 1.22\, \frac{\lambda f}{a},$$
where $\lambda$ is the wavelength of light, f is the focal length, and a is the diameter of the
aperture. This means that any imaging system can benefit from the resolution
enhancement2 via (signal processing) SR methods, at least when imaging wide areas.
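For a sense of scale, the Airy radius for an illustrative (assumed) set of optics, green light with a 50 mm focal length and a 10 mm aperture:

```python
# Airy radius r = 1.22 * wavelength * f / a, with illustrative values.
wavelength = 550e-9   # green light, m (assumed)
f = 50e-3             # focal length, m (assumed)
a = 10e-3             # aperture diameter, m (assumed)
r_airy = 1.22 * wavelength * f / a   # roughly 3.4 micrometers
```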
In the following sections, we discuss some of the applications where our proposed SR
method can be implemented.
_____________________________
1 The Airy radius is the smallest resolvable distance between two point objects. The larger the diffraction of light, the larger the radius.
2 When the sensor has a pixel density of 2 pixels per Airy radius, the sensor is said to be diffraction-limited, which means that higher pixel densities cannot enhance the resolution.
5.1.2 The Case of Approximately Pure Translations
In some applications, the relative scene motion can be modeled as pure translations.
For example, a video camera recording a video sequence of a static scene while moving
with slight translations, or a scanner scanning the same document several times with
slightly different initial points [7]. Several papers were completely devoted to treating this classical SR problem, for example [3-7]. Unlike previous work, our fast blind reconstruction method does not require registration.
Ground-based astronomical imaging and satellite imaging of the Earth are two
applications that require imaging through the atmosphere. Unfortunately, the turbulent
nature of the imaging medium (the atmosphere), distorts the images. The distortion can
65
be modeled as convolving the image with a speckle3 PSF. The size, shape and location of
the PSF are time-variant (different from frame to frame). In addition, in the case of wide-
area-imaging, the distortion is space-variant as well, which means that different regions,
within the same frame, are distorted differently. This is known as anisoplanatic distortion,
as opposed to isoplanatic distortion which is associated with a space-invariant PSF.
Typically, all imaging through the atmosphere is subject to the anisoplanatic type of
distortion unless the FOV is very narrow [50].
In short, imaging through the atmosphere can be modeled as a linear shift-variant
(LSV) transform that is different from frame to frame. This means that our method can
benefit from these randomly transformed frames to achieve super-resolution. However, it
is well known that atmospheric distortion can be severe for long-exposure imaging (few
frames per second). In addition, far-field imaging increases the severity of distortions. In
our case, a certain amount of (time-variant) distortions is useful or in fact, necessary to
achieve SR but according to the discussion in §2.2, large size PSFs (corresponding to
severe blurring) require too many LR frames, and we cannot use too many LR images,
even if we had a lot of them, since we need to keep our systems of equations
overdetermined. Namely, only a moderate amount of atmospheric distortion can be useful
for our method to give reasonable results. This means that the method is best suited for
near-field, short-exposure imaging under reasonable atmospheric conditions. There are
two applications that fit these requirements:
- Lunar imaging4.
- Satellite imaging of the earth.
In the case of lunar (and planetary) imaging, a high frame rate reduces the severity of the distortions, but it also lowers the SNR, which makes it difficult to deblur these images since deblurring magnifies the noise. Stacking is a method aimed at
preparing the images in such a way that they can be added together without increasing the
blur while enhancing the SNR. The stacked image is then deblurred using one of the
sharpening tools. Typically, hundreds of frames are used for stacking and the process is a
lengthy one. While the purpose of stacking is deblurring, our goal is primarily removing
_____________________________
3 Speckle PSFs have very irregular shapes.
4 Obviously, the moon is a lot closer to Earth than any planet or star (near-field imaging) and it is a lot brighter, which allows for much shorter exposure without the images getting too dim.
aliasing by increasing the pixel density. It is rather interesting to note that in the absence of atmospheric distortions, stacking is unnecessary, while in our case, SR is impossible.
When it comes to satellite imaging of objects on Earth, the distortions due to
atmospheric turbulence are much smaller because the Earth’s surface is in contact with
the turbulent imaging medium (the atmosphere). This is similar to when an object behind
a diffuse glass is observed. When the object is very close to the diffuse glass it appears
much clearer than when it is far from it. Therefore, even when the conditions of the atmosphere are somewhat bad, satellite imaging of objects on Earth is still expected to be only moderately distorted, which makes our SR method particularly well-suited and potentially useful for this type of application5.
To the best of our knowledge, no one has tried to super-resolve images distorted by the atmosphere6. This is probably because atmospheric distortion contains both warping and blurring elements. Blur-based methods7 are not designed to work with warps, and motion-based techniques might fail because the prior step of image registration is sensitive to the randomness of the blur from frame to frame. And while there are attempts to handle the case of random motion blur [38-39, 45], the case of super-resolution of atmospherically distorted images has not been addressed before.
In this section we present the results we obtained from working with both synthetic
and real data. Before we proceed, we would like to discuss the integrating effect of the
CCD sensor. In particular, the LR images are related to the transformed
(warped/distorted) HR images via downsampling by integration of pixels of the HR
image. This can be modeled as an averaging PSF convolved with the transformed HR
images followed by decimation.
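The integration model just described can be sketched in a few lines. This is a minimal Python sketch (the thesis computations were done in MATLAB), and the helper names `gaussian_psf` and `downsample_by_integration` are hypothetical; for a block-aligned, symmetric PSF, convolving and then decimating reduces to a weighted sum over each factor-by-factor block:

```python
import numpy as np

def gaussian_psf(size, var=1.0):
    """Isotropic Gaussian kernel on a size x size grid, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * var))
    return k / k.sum()

def downsample_by_integration(hr, factor, var=1.0):
    """Model the CCD integration: an averaging PSF of support factor x factor
    applied to the (transformed) HR image, followed by decimation by `factor`."""
    psf = gaussian_psf(factor, var)
    H, W = hr.shape
    out = np.zeros((H // factor, W // factor))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # weighted integration of the HR pixels within one LR pixel
            block = hr[i*factor:(i+1)*factor, j*factor:(j+1)*factor]
            out[i, j] = (block * psf).sum()
    return out
```

For example, an HR image downsampled with `factor=4` uses a 4x4 Gaussian of variance one, matching the primary-sensor CCD PSF assumed in the experiments below.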
Except for two of our experiments, we used primary LR images corresponding to
↓4x4 and secondary LR images corresponding to ↓5x5. For ↓4x4, the CCD PSF was
_____________________________
⁵ Although satellite surveillance usually uses high-resolution imaging systems, for this type of application, being able to zoom out to cover larger areas, without aliasing, is an extremely useful feature that can be delivered using super-resolution.
⁶ By super-resolve, we primarily mean removal of aliasing.
⁷ Blur-based SR is very sensitive to model errors (for example, due to inaccurate estimates of the PSFs, when not known).
assumed to be a 4x4 Gaussian with variance equal to one [7, 12-14], and we used a 5x5
Gaussian PSF with the same variance for ↓5x5. This is reasonable since only a portion of
the LR CCD pixel is active, which means that the HR pixels (within an LR pixel) should
not all have the same integration weights. See Figures 5.1 and 5.2 for an illustration of the
integration effect of the LR CCD arrays for ↓4x4 and ↓5x5, respectively.
For the remaining two experiments (Experiments 5 and 7), to obtain an easily appreciable
aliasing effect, the primary and secondary LR images correspond to downsampling by
↓8x8 and ↓10x10, respectively. For ↓8x8 and ↓10x10 downsampling, the CCD PSFs we
used were scaled and resized versions of the 4x4 and 5x5 Gaussian PSFs mentioned
above, respectively.
Note that the CCD PSF introduces the same additional distortion to all the frames, and
thus its effect cannot be alleviated with more LR images. Specifically, if the HR image is
distorted by different PSFs and then by the same averaging blur, the overall effect is that
what we solve for is a blurred version of the HR image. This is another reason why post-
processing (via unsharp masking, for example) is necessary since our method is non-
parametric and the solution cannot account for the common CCD averaging effect. In
short, the CCD PSF is an additional source of bias, over which we have no control and
cannot address except via post-processing.
In Chapter III, we discussed the bias of the super-resolved image under the assumption
that (the noiseless version of) the LR images form a complete basis. However, the
incompleteness of (the noiseless version of) the LR basis adds more bias to the solution.
According to our experiments, this additional bias takes the form of both aliasing and
blur when

(5.1)  K ≤ rI²,

where r is the number of LSI kernels that approximate the LSV transform undergone by
the HR image (r = 1 in the LSI case; refer to §2.2.2). However, if

(5.2)  rI² < K ≤ rL₁L₂,

where L₁×L₂ is the size of an LSI kernel and L₁ ≥ I and L₂ ≥ I (§2.2.1), then the bias due
to the incompleteness of (the noiseless version of) the LR images takes the form of blur
only.
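The case analysis above can be captured in a small helper. This is an illustrative sketch under the assumption that condition (5.1) reads K ≤ rI² and condition (5.2) reads rI² < K ≤ rL₁L₂ (the inequalities are reconstructed from the surrounding discussion); the function name is hypothetical:

```python
def bias_regime(K, r, I, L1, L2):
    """Classify the bias added by an incomplete (noiseless) LR basis.
    K: number of LR images; r: number of LSI kernels approximating the
    LSV transform (r = 1 in the LSI case); I: resolution gain;
    L1 x L2: LSI kernel size, with L1 >= I and L2 >= I."""
    assert L1 >= I and L2 >= I
    if K <= r * I**2:          # condition (5.1): too few LR images
        return "aliasing + blur"
    elif K <= r * L1 * L2:     # condition (5.2): incomplete only w.r.t. distortions
        return "blur only"
    return "complete (no incompleteness bias)"
```

For instance, with r = 1, I = 4 and a 5x5 kernel, 16 or fewer LR images put us in the aliasing-plus-blur regime, while 17 to 25 images yield blur only.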
In short, if the (noiseless version of) the LR set is incomplete only with respect to the
extent of the distortions, then this will add bias in the form of blur only, which is far more
tolerable than aliasing. The same can be said regarding estimating the reference PPC,
which is more sensitive to the incompleteness of the basis (and to errors in general) since it
involves solving an equation of the form A₁x₁ = A₂x₂.
Figure 5.1: An illustration of the integration effect of the primary LR CCD array corresponding to ↓4x4.
The gray shaded areas represent the active portions of the LR pixels. The small blue squares represent the
active portions of the pixels of the HR CCD array.

Figure 5.2: An illustration of the integration effect of the secondary LR CCD array corresponding to
↓5x5. The gray shaded areas represent the active portions of the LR pixels. The small blue squares
represent the active portions of the pixels of the HR CCD array.
Miscellaneous
In all (but one of) these experiments, we PCA pre-denoised the data matrices, using a
PCA matrix containing 10-30% of the eigenvectors⁸ of the sample covariance matrix of
the sub LR images (§3.4). As mentioned previously, our method involves the solution of
a few systems of linear equations where the number of unknowns is equal to the number
of LR images. However, the PCA pre-denoising step considerably slows down⁹ the
overall solution, as it involves finding the eigenvectors of the sample covariance matrix.
_____________________________
⁸ A larger number of eigenvectors must be retained when using a lot of LR images, since the number of retained eigenvectors must exceed the total number of LR images or else (4.14) will not have a unique solution.
⁹ All computations were performed using MATLAB running on a 1.5 GHz Intel Core Duo CPU with 2GB RAM.
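The PCA pre-denoising step described above can be sketched as follows. This is a Python sketch rather than the MATLAB actually used; `pca_denoise` is a hypothetical name, and the 10-30% retention fraction is exposed as a parameter:

```python
import numpy as np

def pca_denoise(X, frac=0.2):
    """PCA pre-denoising sketch. X is (n_pixels, n_images), one vectorized
    sub LR image per column. Project the centered data onto the leading
    eigenvectors of the sample covariance matrix and reconstruct."""
    mu = X.mean(axis=1, keepdims=True)
    Xc = X - mu
    C = Xc @ Xc.T / (X.shape[1] - 1)      # sample covariance (pixels x pixels)
    w, V = np.linalg.eigh(C)              # eigenvalues in ascending order
    k = max(1, int(frac * X.shape[0]))    # retain a fraction of the eigenvectors
    Vk = V[:, -k:]                        # leading eigenvectors
    return Vk @ (Vk.T @ Xc) + mu          # denoised reconstruction
```

Note that the eigendecomposition of the pixel-domain covariance is what dominates the runtime, which is the slowdown the text refers to.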
We compare some of our results to those obtained by the “iterative L1” solution,
which is an implementation of equation (22) in [14], using the software in [49]. To be
more specific, the authors in [14] propose solving equation (2.3) with an L1-norm data-
fitting term and bilateral total variation for regularization. The two main advantages of
their method are robustness to error (e.g. registration errors) and relative speed.
In the following experiments, our method proved to be at most ~10 times slower than
bicubic interpolation and at least ~20 times faster than the iterative L1 algorithm.
Moreover, our method works without motion/blur/distortion estimation and therefore it
has an advantage over any model-based solution.
The overall computation time (including pre-denoising the red and blue LR images)
was 14.5 seconds (of which 4.66 seconds was for post-processing). Bicubic interpolation
took 2.89 seconds.
_____________________________
¹¹ Again, since we downsample by averaging according to the CCD PSF, the super-resolved image will always be blurred and post-processing is needed at least to address the CCD blurring effect.
(a) Bicubic interpolation. Comp. time = 2.89 sec.

(a) Bicubic interpolation. Comp. time = 2.83 sec.
5.2.2 Real Data Experiments
Since we do not have cameras with two different¹² density sensors, we used real-world
distorted HR image sequences and then downsampled them (by integrating the HR
pixels) to get the two sets of primary and secondary LR images. In other words, in these
experiments, the only simulated part of the degradation process is the downsampling.
For Experiments 3, 4, and 6, all the images were captured using the same camera, a
SONY Cyber-shot DSC-L1. For Experiment 5, a Canon EOS DIGITAL REBEL XT was
used.
Experiment 3: Approximately Pure Translations

The HR test sequence of images used for this experiment was obtained using a hand-
held camera taking multiple monochromatic shots, of size 480×640, of the same scene¹³,
“Outdoors”. However, the camera moved slightly every time a picture was taken, thus
approximating the pure-translations case. A total of 108 shots were taken. The first half of
these images was downsampled by ↓5x5 and the other half by ↓4x4,
producing the secondary and primary sets of LR images, respectively. This simulates the
case where the two sensors are either placed in two different cameras or in the same
camera, using a fully reflective mirror positioned in the optical path for half of the
imaging time (refer to the discussion in §2.3.3).
We used only the 35 primary LR images closest to the mean. Similarly, only the 35
secondary LR images closest to the (resized) mean of the primary set were kept
(§3.4.2). Then we pre-denoised these images using PCA. The HR image was
reconstructed using the 35 primary LR images as a basis for its primary PPCs, and for a
reference PPC, we used a single secondary LR image, chosen according to the procedure
described in §4.5.
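Keeping only the frames closest to the mean can be sketched as below. This is a hypothetical Python helper, not the thesis code, and it handles the simple closest-to-own-mean case (the secondary set is compared against the resized mean of the primary set, which would just swap in a different reference image):

```python
import numpy as np

def closest_to_mean(frames, keep):
    """Keep the `keep` frames closest (in Euclidean distance) to the
    mean frame. frames: array of shape (n_frames, H, W)."""
    mean = frames.mean(axis=0)
    d = np.linalg.norm(frames.reshape(len(frames), -1) - mean.ravel(), axis=1)
    idx = np.argsort(d)[:keep]           # indices of the closest frames
    return frames[np.sort(idx)]          # preserve temporal order
```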
As noted in Chapter IV, choosing a single secondary LR image for our reference PPC
is expected to give better results, in the case of approximately pure translations, than
estimating the reference PPC. This is because the translational motion does not cause any
_____________________________
¹² The different densities should correspond to downsampling factors that are relatively prime (or, more usefully, consecutive integers).
¹³ This was a page from the AAA Living magazine, May/June 2005 issue.
blur. We used UM for post-processing mainly to reduce the blur due to the CCD
averaging effect.
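Unsharp masking (UM) of the kind used for post-processing can be sketched as follows. This is a generic sketch using a simple box blur as the lowpass; the thesis used MATLAB's and Photoshop's unsharp masking tools, whose exact kernels and parameters differ:

```python
import numpy as np

def unsharp_mask(img, amount=1.0, radius=1):
    """Unsharp masking sketch: sharpened = img + amount * (img - blurred),
    with a box blur of half-width `radius` as the lowpass filter."""
    pad = np.pad(img, radius, mode='edge')
    H, W = img.shape
    k = 2 * radius + 1
    blurred = np.zeros_like(img, dtype=float)
    for di in range(k):                  # accumulate the k x k box average
        for dj in range(k):
            blurred += pad[di:di+H, dj:dj+W]
    blurred /= k * k
    return img + amount * (img - blurred)
```

Flat regions pass through unchanged, while edges are boosted (with the characteristic overshoot on either side).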
Figure 5.5 (a) shows the main portion of the first primary LR image, resized (↑4x4)
using bicubic interpolation. Figure 5.5 (b) shows the main portion of the super-resolved
image after post-processing (UM+MD). It took 1.03 seconds to perform the bicubic
interpolation, while the super-resolved image was computed in only 10.88 seconds¹⁴.
Figure 5.6 (a), (c) and Figure 5.7 (a), (c) show two different detail areas of the images
shown in Figure 5.5.
Finally, for comparison, we reconstructed the HR image using the iterative L1 method
[14, 49]. This took about 4 minutes (using 40 iterations, 0.001 regularization factor, and
the shift & add image for the initial guess). The same two detail areas (of the dog’s face
and text) are shown in Figure 5.6 (b) and Figure 5.7 (b), respectively. Comparing Figure
5.6 (b) to Figure 5.6 (c), we notice that our method outperforms the iterative L1 method.
However, by examining Figure 5.7 (b) and Figure 5.7 (c), we observe that the iterative
L1’s result is better. In other words, the two methods have an overall comparable
performance when it comes to this experiment (although the blind SR method is much
faster).
_____________________________
¹⁴ We note here that there was virtually no need for pre-denoising, but we pre-denoised to learn how much time this would cost for this experiment.
(a) Bicubic interpolation. Comp. time = 1.03 sec.

(a) Bicubic interpolation. (b) Iterative L1.
Figure 5.6: Approximately pure translations. Details: dog’s face. (# of LRs = 35).
(a) Bicubic interpolation.

Experiment 4: Approximately Pure Translations—Video

In this experiment we used a video of an HR static scene, “Watch”, of size 480×640,
displayed on a laptop screen. The video’s temporal resolution was 30 frames/second. The
video contained periodic streaks, which normally result from very close-range shooting of
an LCD screen. The camera was moving slightly while recording. This approximately
corresponds to the pure translational motion case.
We downsampled the first frame by ↓5x5 and used it as our reference PPC. Then, we
downsampled every other frame in the next 100 frames by ↓4x4, of which we kept only
the 30 frames closest to the mean. In other words, we used only 30 frames, which we
pre-denoised using PCA and then used as our primary LR basis set. The super-resolved
image was then post-processed using TV, UM and MD.
Figure 5.8 shows the main portion of the super-resolved image compared to the
corresponding area of the bicubic interpolated (↑4x4) first primary LR frame. The
iterative L1 result is shown in Figure 5.9, for comparison (number of iterations was 20,
the regularization factor was 0.001 and the shift & add image was used as an initial
guess).
(a) Bicubic interpolation. Comp. time = 3 sec.
Figure 5.9: Approximately pure translations—video: Iterative L1. (# of LRs = 30).
Experiment 5: Random Vibrations

A digital camera was mounted on a tripod and placed on a vibrating table. The
captured images, of the black and white “Michigan Seal”, were thus randomly motion-
blurred¹⁵. We used only the first 35 images. These motion-blurred images were of very
high resolution (a large number of pixels). We cropped¹⁶ them to size 960x960 and then
downsampled them by ↓8x8 and ↓10x10 to obtain the primary and secondary sets of LR
images, with easily noticeable aliasing, respectively. Then we super-resolved to size¹⁷
480x480.
Figure 5.10 (a) shows the first primary LR image, resized (↑4x4) using bicubic
interpolation. The reference PPC was first estimated in the pixel domain, by solving
problem (4.7) without pre-denoising. Also, we ignored denoising the expansion
_____________________________
¹⁵ The vibrations were produced by continuously pounding on the table in different random locations while the camera was taking separate shots with a lowered shutter speed (exposure time = 1 second).
¹⁶ We cropped the blank ‘wall’ space in the images.
¹⁷ Note that given the dimensions of the primary and secondary LR images we can only super-resolve with a resolution gain of 4x4, since the ratio of their dimensions is 5/4. Refer to §2.3.3.
coefficients as per (4.12). Due to noise magnification associated with estimating the
expansion coefficients of the reference PPC, in the pixel domain (§4.2.2), the estimated
reference PPC was extremely noisy which resulted in the noisy super-resolved image
shown in Figure 5.10 (b).
The LR images were pre-denoised using PCA and then one of the secondary LR
images was chosen as the reference PPC, according to the procedure in §4.5. The
corresponding super-resolved image is shown in Figure 5.11 (a). Note that since the
frames are motion blurred, even the best secondary LR image (that is closest to the mean)
is slightly blurred and thus the corresponding super-resolved image is blurred as well.
Figure 5.11 (b) shows the super-resolved image based on an estimation of the
reference PPC in the feature space (4.14) as described in §4.4. The result is clearly
sharper than the super-resolved image based on a chosen secondary LR image.
In this experiment, there is some translational motion but most of the distortion is
random blur. Moreover, motion estimation, because of the randomness of the blur, is
inaccurate, and thus the iterative L1 solution¹⁸ performed poorly, as shown in Figure 5.12.
This experiment demonstrates the advantage of our non-parametric approach to the
solution of the SR problem.
Experiment 6: Rhythmic Vibrations

We obtained a color video sequence of size 480×640×70 (with a temporal resolution of
30 frames/second) of the image “Life” (a page from National Geographic magazine,
featuring life’s diversity and DNA, May 2010 issue). The camera was placed
approximately 1.5 feet from the page and the zoom-in function was used so as to avoid
empty wall space. Vibrations were produced mechanically by attaching a vibrating device
to the table on which we placed the camera. The vibrations were rhythmic in nature,
resulting in both global motion and motion blur.
The 70 frames were downsampled by ↓4x4 and ↓5x5 to produce the primary and
secondary sets of LR images, respectively. These LR images were not pre-denoised¹⁹ and
the reference PPC was taken to be one of the secondary LR images.
_____________________________
¹⁸ Number of iterations was 20, regularization factor was 0.001, and the initial guess was the shift & add image.
¹⁹ The TV post-processing could take care of the noise magnification on its own.
(a) Bicubic interpolation. Comp. time = 0.83 sec.
(b) Blind SR + post-processed (TV+UM+MD). Ref. PPC was estimated in the pixel domain.
Figure 5.10: Random vibrations: estimating the ref. PPC in the pixel domain. No denoising. (# LRs = 35).
(a) Blind SR + post-processed (TV+UM+MD). A single sec. LR image was used as a ref. PPC. Comp. time
= 7.22 sec.
(b) Blind SR + post-processed (TV+UM+MD). Ref. PPC was estimated (in the feature subspace).
Comp. time = 6.9 sec.
Figure 5.11: Random vibrations: using a single sec. LR image vs. estimating the ref. PPC. (# LRs = 35).
Figure 5.12: Random vibrations: Iterative L1 + sharpened (UM). (# LRs = 35).
The super-resolved image was then post-processed using TV and UM²⁰. The total
processing time was 13.57 seconds (of which 8.6 seconds were for TV post-processing!).
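TV post-processing in the spirit of the ROF model [53] can be sketched as a gradient descent on a smoothed total-variation objective. This is an illustrative sketch, not the solver actually used (which, as noted, accounts for most of the runtime); `tv_denoise` and its parameters are hypothetical:

```python
import numpy as np

def tv_denoise(img, lam=0.1, n_iter=100, step=0.2, eps=1e-6):
    """Minimize 0.5*||u - img||^2 + lam*TV(u) by gradient descent on a
    smoothed TV term (ROF-style), starting from the noisy image."""
    u = img.astype(float).copy()
    for _ in range(n_iter):
        # forward differences with replicated borders
        ux = np.diff(u, axis=1, append=u[:, -1:])
        uy = np.diff(u, axis=0, append=u[-1:, :])
        mag = np.sqrt(ux**2 + uy**2 + eps)    # smoothed gradient magnitude
        px, py = ux / mag, uy / mag
        # divergence of the normalized gradient field (backward differences)
        div = (np.diff(px, axis=1, prepend=px[:, :1])
               + np.diff(py, axis=0, prepend=py[:1, :]))
        # gradient step: data-fidelity pull plus TV smoothing
        u -= step * ((u - img) - lam * div)
    return u
```

The data term keeps the result close to the input while the TV term suppresses small oscillations, which is why TV is effective against residual noise magnification.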
The reason we needed more images for this experiment, despite its being
representative of the LSI case, is that the rhythmic distortions did not allow for
much change in the captured images within a small time frame. In fact, because the
associated blur was not very random and there were more global motion shifts,
compared to the previous experiment, the iterative L1 method did relatively well,
although there were still noticeable artifacts around the edges due to registration errors
caused by the presence of (less random) motion blur.
Figures 5.13-5.15 show portions of the bicubic interpolated (and sharpened) first
primary LR image and the corresponding portions of the (sharpened) iterative L1 SR
image²¹, along with the matching parts of the SR image according to our method.
_____________________________
²⁰ For this experiment we used Photoshop’s unsharp masking, as MATLAB does not provide much freedom with its unsharp masking tool.
²¹ Number of iterations was 50, regularization factor was 0.0015, and the initial guess was the shift & add image.
(a) Bicubic interpolation + sharpened (UM). Comp. time = 2.65 sec. (b) Iterative L1 + sharpened (UM).
Comp. time = 15+ minutes. (c) Blind SR + post-processed (TV+UM). Comp. time = 13.57 sec.
Figure 5.13: Rhythmic vibrations. Details part I. (# of LRs = 70).

(a) Bicubic interpolation + sharpened (UM). (b) Iterative L1 + sharpened (UM). (c) Blind SR + post-processed (TV+UM).

(a) Bicubic interpolation + sharpened (UM). (b) Iterative L1 + sharpened (UM). (c) Blind SR + post-processed (TV+UM).
Experiment 7: Atmospheric Turbulence
The original high resolution AVI sequence of the Moon, used for this experiment, is
courtesy of Dr. Joseph M. Zawodny, NASA Langley Center. It was shot in coastal²²
Virginia at an angular (spatial) sampling of 0.34 arcsecond/pixel. The resolution (in terms of
pixel density) almost met the diffraction limit at 1.7 pixels/Airy radius. The temporal
resolution was 30 frames/second.
The sequence contains 1300 frames of size 768x1024, of which we used only 100
frames²³. To obtain easily noticeable aliasing we downsampled them by ↓8x8 and
↓10x10 to obtain the primary and secondary sets of LR images, respectively. These were
pre-denoised using PCA.
The first LR image from the primary set was resized (↑4x4) using bicubic
interpolation and then sharpened, as shown in Figure 5.16 (a). Figure 5.16 (b) shows the
sharpened²⁴ and median filtered super-resolved image corresponding to choosing one of
the secondary LR images as a reference PPC, which is slightly better than the super-
resolved image corresponding to estimating the reference PPC, shown in Figure 5.17 (b).
This suggests that we should always obtain two estimates of the HR image: one
corresponding to estimating the reference PPC and one corresponding to choosing a
secondary LR image as the reference PPC. Of course, the PCA pre-denoising step need
not be repeated.
Finally, Figure 5.17 (a) shows the reconstructed HR image using the iterative L1
method²⁵, after sharpening. The aliasing and other artifacts are due to the fact that the
warping effect is LSV while the motion estimation methods included in the software [49]
can only handle the global motion case, not to mention that the randomness of the blur
negatively affects the performance of motion estimation.
_____________________________
²² The effect of the atmospheric turbulence is larger at lower altitudes.
²³ We appended zeros to the HR frames to have dimensions of 800x1040, which are integer multiples of 80. Refer to the discussion related to equation (2.9).
²⁴ We used Photoshop’s unsharp masking, instead of MATLAB’s, for more deblurring freedom.
²⁵ Number of iterations was 20, regularization factor was 0.001, and the initial guess was the shift & add image.
(a) Bicubic interpolation + sharpened. Comp. time = 0.83 sec.
(b) Blind SR + post-processed (UM+MD). A single sec. LR image was used as a ref. PPC. Comp. time =
9.31 sec.
(a) Iterative L1 + sharpened (UM).
(b) Blind SR + post-processed (UM+MD). The ref. PPC was estimated. Comp. time = 10.9 sec.
Figure 5.17: Atmospheric turbulence: Blind SR vs. Iterative L1. (# of LRs = 100).
CHAPTER VI
6.1 Summary
In effect, our proposed method veers away from the major limitations associated with
typical model-based solutions of the SR problem. Specifically, our SR method is fast, does
not require any estimation of the degradation process, and is robust in the sense that the
only ‘model’ we use is in fact completely accurate: portraying sub PPCs as shifted and
decimated versions of the PPCs. Besides the trivial hardware requirement, the
completeness of the LR basis is the only key assumption we make, and its invalidity has
only one consequence: the PPCs will be partially reconstructed.
Finally, in certain applications where typical multiframe SR performs poorly (e.g. in
the case of random vibrations), our method not only provides a much faster solution, it
actually benefits from the random nature of distortions.
larger (or denser) imaging chips, as there are always physical limits that can only be
beaten using super-resolution techniques.
A special application of interest is satellite imaging of the Earth. Driven by the
success of our experiment involving super-resolving lunar images corrupted with
random atmospheric distortion, we would like to pay special attention to super-
resolving satellite images, which are also affected (to a lesser degree) by the
atmosphere.
Can we extend our method to handle the case of dynamic SR? We believe the
answer could be yes, depending on the temporal resolution of the video sequence.
To be more specific, we could use each secondary frame as a reference PPC, thus
obtaining a sequence of SR images that are, in essence, HR versions of the
secondary LR images. This, however, would probably require a temporal resolution
high enough for a valid assumption of the rigidity of the scene within reasonably
short time windows.
An active field of research is learning-based super-resolution, where SR methods
are designed to reconstruct a HR image from a single LR frame. The success of the
reconstruction process is heavily dependent on the training set of images carefully
chosen to be within the same class as the HR image. The best example of this is face
hallucination [35, 36], where a HR face image can be reconstructed from its LR
version, given a database of HR face images. As a future research direction, it would be
interesting to investigate whether such methods might benefit from the idea of
applying the property of sampling diversity, where the single (distortion-free) LR
frame plays the role of the reference PPC. In other words, instead of estimating the
HR image directly from the LR image, estimating the PPCs using the LR image as a
reference PPC might be advantageous, since signals at lower resolutions have more
in common. That is to say, it might be easier to train a basis to reconstruct low
resolution signals (PPCs), and as such, the sampling diversity idea could be extended
to single-frame SR without the additional requirement of a secondary sensor.
BIBLIOGRAPHY
[6] S. H. Rhee and M. G. Kang, “Discrete cosine transform based regularized high-
resolution image reconstruction algorithm,” Opt. Eng., vol. 38, no.8, pp. 1348-1356,
1999.
[7] M. Elad and Y. Hel-Or, “A fast super-resolution reconstruction algorithm for pure
translational motion and common space-invariant blur,” IEEE Trans. IP, vol. 10, pp.
1187-1193, 2001.
[11] S. Chaudhuri and J. Manjunath, Motion-free super-resolution, Springer-Verlag,
New York, 2005.
[14] S. Farsiu, M. D. Robinson, M. Elad, and P. Milanfar, “Fast and robust multiframe
super resolution,” IEEE Trans. IP, vol. 13, pp. 1327-1344, 2004.
[16] H. Stark and P. Oskoui, “High resolution image recovery from image plane arrays,
using convex projections,” J. Opt. Soc. Amer. A, vol. 6, pp. 1715-1726, 1989.
[19] L. Brown, “A survey of image registration techniques,” ACM Comput. Surv., vol.
24, pp. 325-376, 1992.
[22] G. H. Golub and C. F. Van Loan, Matrix Computations: Third Edition, Johns
Hopkins University Press, Baltimore, MD, 1996.
[23] S. Van Huffel and J. Vandewalle, The Total Least Squares Problem—
Computational Aspects and Analysis, SIAM, Philadelphia, PA, 1991.
[26] R. C. Thompson, “Principal submatrices IX: interlacing inequalities for singular
values of submatrices,” Linear Algebra Appl., vol. 5, pp. 1-12, 1972.
[28] G. H. Costa and J. C. M. Bermudez, “Are registration errors always bad for super-
resolution?” ICASSP, vol.1, pp. 569-572, 2007.
[29] R. Tibshirani, “Regression shrinkage and selection via the LASSO,” Journal of
Royal Statistical Society, vol. 58, pp. 267-288, 1996.
[33] I. Markovsky and S. Van Huffel, “Overview of total least-squares methods,” Signal
Processing, vol. 87, pp. 2283-2302, 2007.
[35] J. Yang, H. Tang, Y. Ma, and T. Huang, “Face hallucination via sparse coding,”
ICIP, pp. 1264-1267, 2008.
[36] B. G. V. Kumar and R. Aravind, “A 2D model for face superresolution,” ICPR, pp.
1-4, 2008.
[41] K. A. Parulski, L. J. D’Luna, B. L. Benamati, and P. R. Shelley, “High performance
digital color video camera,” J. Electron. Imaging, vol. 1, pp. 35–45, 1992.
[42] A. Zomet, A. Rav-Acha, and S. Peleg, “Robust super resolution,” CVPR, vol. 1, pp.
645–650, 2001.
[44] R. C. Hardie, K. J. Barnard, and E. E. Armstrong, “Joint MAP registration and high-
resolution image estimation using a sequence of undersampled images,” IEEE
Trans. IP, vol. 6, pp. 1621-1633, 1997.
[50] M. C. Roggemann, and B. Welsh, Imaging Through Turbulence, CRC Press, Boca
Raton, Florida, 1996.
[51] R. Paxman, T. Schulz, and J. Fienup, “Joint estimation of object and aberrations by
using phase diversity,” J. Opt. Soc. Amer. A, vol. 9, pp. 1072–1085, 1992.
[52] R. Kindermann and J. L. Snell, Markov Random Fields and Their Applications,
American Math. Soc., Providence, RI, 1980.
[53] L. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal
algorithms,” Physica D, vol. 60, pp. 259–268, 1992.
[54] C. Vogel and M. Oman, “Fast, robust total variation based reconstruction of noisy,
blurred images,” IEEE Trans. IP, vol. 7, pp. 813–824, 1998.
[55] S. Mika, B. Schölkopf, A.J. Smola, K. R. Müller, M. Scholz, and G. Rätsch, “Kernel
PCA and De-Noising in Feature Spaces,” Advances in Neural Information
Processing Systems II, M. S. Kearns, S. A. Solla, and D. A. Cohn, eds., pp. 536-542,
MIT Press, Cambridge, MA, 1999.
[57] P. Rousseeuw and K. Van Driessen, “A fast algorithm for the Minimum Covariance
Determinant estimator,” Technometrics, vol. 41, pp. 212–223, 1999.
[58] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming
(web page and software). http://stanford.edu/~boyd/cvx, June 2009.
[59] M. Grant and S. Boyd, “Graph implementations for nonsmooth convex programs,”
Recent Advances in Learning and Control (a tribute to M. Vidyasagar), V. Blondel,
S. Boyd, and H. Kimura, editors, pp. 95-110, Lecture Notes in Control and
Information Sciences, Springer, 2008. http://stanford.edu/~boyd/graph_dcp.html.
[63] A. Chambolle, “An algorithm for total variation minimization and applications,” J.
Math. Imaging and Vision, vol. 20, pp. 89-97, 2004.
[64] X. Bresson and T. F. Chan, “Fast minimization of the vectorial total variation norm
and applications to color image processing,” CAM Report 07-25.
[68] T. Le, R. Chartrand and T. Asaki, “A variational approach to reconstructing images
corrupted by Poisson noise,” J. Math. Imaging and Vision, vol. 27, pp. 257-263,
2007.
[71] M. Schuermans, I. Markovsky, P. Wentzell, and S. Van Huffel, “On the equivalence
between total least squares and maximum likelihood PCA,” Anal. Chim. Acta, vol.
544, pp. 254–267, 2005.