
Diffuser Cam: Lensless Imaging Algorithms

Camille Biscarrat and Shreyas Parthasarathy


Advisors: Nick Antipa, Grace Kuo, Laura Waller

December 10, 2018

1 Introduction
This guide is meant as a tutorial for the lensless image reconstruction algorithms used in DiffuserCam.
It provides a brief overview of the optics involved and how they were used to develop the most current version of the algorithm.
See our other document (“How to build a (Pi) DiffuserCam”) for information on how to actually build and
calibrate DiffuserCam.

1.1 Why Diffusers?

For most 2D imaging applications, lens-based systems have been optimized in design and fabrication
to be the best option. However, lensless imaging systems have not been investigated nearly as much.
DiffuserCam is a lensless system that replaces the lens element with a diffuser (a thin, transparent, lightly
scattering material). See Figure 1 below.

Figure 1: Cartoon schematic of DiffuserCam

Possible advantages include:


• Lensless systems are lightweight. Most of the weight and size of imaging systems comes from the
physical constraints of lens design. Replacing lenses with a thin, flat material can allow for smaller,
lighter imaging systems.
• Diffusers require less precise fabrication. We demonstrated that DiffuserCams (of varying quality) can
be created from household scatterers such as Scotch tape. Since the structure of a diffuser is naturally
random, you can create a DiffuserCam yourself without access to precise fabrication tools.

• Possibility of 3D imaging/microscopy. We’ve also shown that lensless cameras can capture 3D images
and are robust to missing or dead pixels (see this paper), both of which are promising in the field of
microscopy.

1.2 DiffuserCam

Every diffuser has a “focal plane”. Instead of mapping a faraway point source to a point in this plane
(as lenses do), the diffuser maps a point source to a “caustic pattern” (see Fig. 2a) over the entire plane. So,
replacing the lens in a camera with a diffuser of the same focal length creates a system that maps points in
the scene to many points on the sensor (see Fig. 2b).

Figure 2: The 3 important steps in DiffuserCam’s operation. (a) Caustic image of a single point source. (b) Sensor reading of a hand. (c) Reconstructed image of a hand.

The key to DiffuserCam’s operation is that, while light information is spread out over the sensor, none
of that information is lost. You can see in Fig. 2b that the sensor reading won’t look like the object.
However, we can recover the object image using a reconstruction algorithm that requires a single calibration
measurement of the caustic produced by a point source. This measurement, called a point spread function
(PSF), completely characterizes the scattering behavior of the diffuser (under certain assumptions).

1.3 Imaging Systems

To derive the algorithm and understand where these assumptions come from, it’s helpful to think of the
imaging system as a function that maps objects in the real world to images on the sensor. More precisely, it
is a function f that maps a 2D array v of light intensity values (the scene) to a 2D array of pixel values b on
the sensor. Recovering the scene v from a sensor reading b is equivalent to inverting this function (though
sometimes the function isn’t invertible):

f(v) = b ⟹ v = f^{-1}(b)

First, we need to describe f mathematically. In computational imaging, characterizing f (usually
through a theoretical model of the optics involved) is known as constructing a “forward model,” and inverting
it efficiently is known as the corresponding “inverse problem.” This tutorial covers DiffuserCam algorithms
in roughly that order. Note that f is not always invertible, but that is usually because many v’s can map
to the same b. So, we often introduce priors, or assumptions that constrain the possible v’s in order to
construct an estimate for the scene.

2 Problem Specification

2.1 Forward Model

Roughly speaking, f is the composition of everything that happens to light as it travels from the object
scene to the sensor. Each ray from a point in the scene propagates a certain distance to the diffuser and is
locally refracted by the diffuser surface, then propagated again to the sensor plane. Whether or not the ray
hits the sensor depends on how it was bent – we will start by ignoring this issue and addressing the finite
sensor size after constructing the rest of the model.
We make the following approximations:
• Shift invariance: A lateral shift of the point source causes a lateral translation of the sensor reading.

Figure 3: As the point source shifts to the right, the image on the sensor shifts to the left

• Linearity: Scaling the intensity of a point source corresponds to scaling the intensity of the sensor
reading by the same amount. Also, the pattern due to two point sources is the sum of their individual
contributions. These two assumptions amount to having incoherent light sources and a sensor that
responds to light intensity linearly. Both of these conditions are often satisfied.

Figure 4: Each point source creates a pattern on the sensor. When two point sources are present, the sensor reads the superposition of the patterns created by each individual point source. (a) Point source on axis. (b) Point source off axis. (c) Superposition of both point sources.

In short, the diffuser system is assumed to be linear shift-invariant (LSI). We assume that v can be
represented as the sum of many point sources of varying intensity and position. By the LSI property of the
system, the output f (v) corresponding to the input v can be represented as a 2D convolution with a single
PSF h:
f (v) = h ∗ v

Since f is linear, it is conceptually helpful to think of it as a matrix. However, matrices operate on
vectors, not 2D images like v and b. We can get around this by vectorizing the images – creating a vector
that contains the same information as the image by stacking all of the columns on top of each other. Thus
our mathematical model can consistently treat these images as 1-dimensional vectors. For example, an m×n
sensor reading would now be an mn-length vector. This trick allows us to represent our 2D convolution as a
matrix M, where h ∗ v ⇐⇒ Mv. For all the following derivations, we will reserve lowercase letters
for images, and bolded lowercase letters for the corresponding vectorized images. Function notation (with
parentheses or braces denoting arguments) will be used to denote linear operators, and bolded uppercase
letters will be used to denote the matrix representations of these operators.
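To make this convention concrete, here is a minimal NumPy sketch of the vectorization trick (the column-stacking order, NumPy’s order='F', is one choice of convention; any fixed ordering works as long as it is applied consistently):

import numpy as np

# A toy m-by-n image, vectorized by stacking columns (order='F'),
# then restored; the assert confirms no information is lost.
m, n = 3, 4
image = np.arange(m * n).reshape(m, n)
vec = image.flatten(order="F")           # mn-length vector
restored = vec.reshape(m, n, order="F")  # undo the vectorization
assert np.array_equal(restored, image)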
Now that we’ve constructed a model for how the light propagates to the sensor plane, we need to account
for the sensor’s finite size. While all of the light rays hit the sensor plane, not all of them hit the physical
sensor. So while the output of the diffuser system is a convolution, only part of that convolution is recorded
on the sensor. In other words, the 2D sensor reading is a cropped convolution: f (v) = crop(h ∗ v). The
equivalent vectorized formulation is
crop(h ∗ v) ⇐⇒ CMv
f (v) ⇐⇒ Av
where C is a matrix representation of cropping. We use A as shorthand for CM. This equation serves as
our forward model.
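As a rough illustration, here is a sketch of this forward model using SciPy’s FFT-based convolution; the function name forward and the centered crop window are our assumptions for illustration, not the exact implementation in the notebooks:

import numpy as np
from scipy.signal import fftconvolve

def forward(v, h, sensor_shape):
    # f(v) = crop(h * v): convolve the scene with the PSF over the
    # whole plane, then keep only the region the physical sensor
    # sees (assumed here to be the central sensor_shape window).
    full = fftconvolve(v, h, mode="full")
    r0 = (full.shape[0] - sensor_shape[0]) // 2
    c0 = (full.shape[1] - sensor_shape[1]) // 2
    return full[r0:r0 + sensor_shape[0], c0:c0 + sensor_shape[1]]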

2.2 Inverse Problem

A first approach to solving for v, which ignores the crop, would be to try Wiener deconvolution. This
method is a common way to reverse convolution, but it relies on diagonalizing the measurement matrix, and
cannot model the cropping behavior at all (see our ADMM Jupyter notebook for an explanation of
diagonalization). While Wiener deconvolution would work if A were convolutional, i.e. A = M, adding in the crop
makes A too complex to invert analytically.
Instead, we must find an efficient numerical way to “invert” f . In general, f isn’t invertible at all:
multiple v’s can be mapped to the same b. We can see A isn’t invertible for two reasons:
• Information is lost in the crop operation, so C is not an invertible matrix.
• Convolving with a fixed function, e.g. h, is not always invertible, so M is not necessarily invertible.
The typical approach to solving Av = b for non-invertible A is to formulate it as an optimization problem,
which has the same form regardless of whether A is convolutional or not:

v∗ = argmin_v (1/2)‖Av − b‖₂²

At the minimizer v∗, the objective ‖Av − b‖₂² is as small as possible, so Av∗ matches b as closely as the model allows (with equality only when b lies in the range of A).
It is worth noting that A is extremely large, and scales with the area of the sensor. Our sensor has ∼10⁶
pixels, so A would have on the order of 10⁶ × 10⁶ = 10¹² entries. While A is useful mathematically, it’s
computationally infeasible to ever load/store it in memory. Whichever algorithm we choose to solve the
minimization problem has to avoid ever loading A in memory. Our general approach to addressing this issue
will be to make sure the algorithm can be implemented in terms of the linear operators that make up f:
crop and convolution. Both of these operations have fast implementations on 2D images that don’t require
loading their corresponding matrices.

3 Solving for v

3.1 Gradient Descent

Gradient descent is an iterative algorithm that finds the minimum of a convex function by following the
slope “downhill” until it reaches a minimum. To solve the minimization problem

minimize g(x),

we find the gradient of g wrt x, ∇x g, and use the property that the gradient always points in the direction
of steepest ascent. In order to minimize g, we go the other direction:

x_0 = initial guess
x_{k+1} ← x_k − α_k ∇g(x_k),

where α is a step size that determines how far in the descent direction we go at each iteration.
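As a toy illustration of this update rule (the quadratic objective and step size below are arbitrary choices of ours):

import numpy as np

# Minimize g(x) = (1/2)||x - c||^2, whose gradient is (x - c).
c = np.array([3.0, -1.0])    # arbitrary target for illustration
x = np.zeros(2)              # x_0 = initial guess
alpha = 0.5                  # constant step size
for _ in range(50):
    x = x - alpha * (x - c)  # x_{k+1} = x_k - alpha * grad g(x_k)
# x is now very close to c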
Applied to our problem:

g(v) = (1/2)‖Av − b‖₂²
∇_v g(v) = A^H(Av − b),

where A^H is the adjoint of A. Again, we want to write A as a composition of linear operators that are easy
to implement, so we never have to deal with A itself. For a product of arbitrary linear matrices FG, the
adjoint is (FG)^H = G^H F^H. In our case:

Av = CMv
A^H v = M^H C^H v

We’ve reduced the problem of finding the adjoint of A to finding the adjoints of M and C.
Finding the adjoint of M: The adjoint of M, a convolution, can be found by writing the operation using
Fourier transforms. The convolution theorem states:

Mv ⇐⇒ h ∗ v = F^{-1}(F(h) · F(v)),

where the · denotes pointwise multiplication, and F denotes the 2D Fourier transform operator. This
theorem is also known as “convolution of two signals in real space is multiplication in Fourier space.” Next,
we vectorize the previous statement by recognizing that 2D Fourier transforms are linear operators, so we
have the equivalence F(v) ⇐⇒ Fv. To fully write M as a product of matrices, we must also convert the
pointwise multiplication to a matrix multiplication:

F(h) · F(v) ⇐⇒ diag(Fh) Fv.

Also, F^H = F^{-1} by “unitarity” of the Fourier transform. Finally, the adjoint of a diagonal matrix is formed
by taking the complex conjugate of its entries.
In summary,

M^H v = (F^{-1} diag(Fh) F)^H v
      = F^H diag(Fh)^H (F^{-1})^H v
      = F^{-1} diag(Fh)^* F v,

where ^* denotes complex conjugation.
Finding the adjoint of C: Finally, we note that the adjoint of cropping, C^H, is zero-padding (see section
2.4 of the appendix).

Plugging in to the formula for A^H, we find

A = C F^{-1} diag(Fh) F        ⇐⇒  f(v) = crop(F^{-1}{F(h) · F(v)})
A^H = F^{-1} diag(Fh)^* F C^H  ⇐⇒  f^H(x) = F^{-1}{F(h)^* · F(pad[x])},

where we have written A in its matrix formulation (left) and the corresponding way it is implemented
in code (right). Note that we converted efficient operations like pointwise multiplication to matrices purely
for the derivation. See the GD Jupyter notebook for the actual implementation of these operators.
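For concreteness, a minimal NumPy sketch of this operator pair might look like the following, assuming h and v live on a common padded grid and the sensor sees the central region (the names make_operators, crop, and pad are ours; the actual code is in the GD Jupyter notebook):

import numpy as np

def make_operators(h, sensor_shape):
    # F(h), precomputed once; ifftshift assumes the PSF is stored
    # with its center in the middle of the array.
    H = np.fft.fft2(np.fft.ifftshift(h))

    def crop(x):
        # C: keep the central sensor region of the full grid
        r0 = (x.shape[0] - sensor_shape[0]) // 2
        c0 = (x.shape[1] - sensor_shape[1]) // 2
        return x[r0:r0 + sensor_shape[0], c0:c0 + sensor_shape[1]]

    def pad(x):
        # C^H: zero-pad a sensor-sized image back to the full grid
        out = np.zeros(h.shape)
        r0 = (h.shape[0] - x.shape[0]) // 2
        c0 = (h.shape[1] - x.shape[1]) // 2
        out[r0:r0 + x.shape[0], c0:c0 + x.shape[1]] = x
        return out

    def A(v):
        # f(v) = crop(F^{-1}{F(h) . F(v)})
        return crop(np.real(np.fft.ifft2(H * np.fft.fft2(v))))

    def A_adj(x):
        # f^H(x) = F^{-1}{F(h)^* . F(pad[x])}
        return np.real(np.fft.ifft2(np.conj(H) * np.fft.fft2(pad(x))))

    return A, A_adj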

3.1.1 GD Implementation

The iterative reconstruction of v looks like:

v_0 = anything
v_{k+1} ← v_k − α_k A^H(Av_k − b)
Repeat forever

F(h) can be precomputed (because h is measured beforehand), and the action of diag(Fh)^H can be implemented as pointwise multiplication with the conjugate F(h)^*. Since all the other operations involve only Fourier transforms, every operation in the gradient calculation can be efficiently calculated. For implementation details, see the GD Jupyter notebook.
In our problem, we need to keep in mind the physical interpretation of v. Since it represents an image,
it must be non-negative. We can add this constraint into the algorithm by “projecting” v onto the space
of non-negative images. In short, we zero all negative pixel values in the current image estimate at every
iteration.
One thing to keep in mind is the step size, αk . We want it to be large at first – “coarse” jumps to get
closer to the minimum quickly. As we get closer, large steps will cause the estimate to “bounce around” the
minimum, overshooting it each time. Ideally we would want to decrease the step size with each iteration
at a rate that would ensure continual progress. While varying step size might yield a faster convergence, it
requires hand tuning and can be time consuming. A constant but sufficiently small step size is guaranteed to
converge, with no parameter tuning necessary. In our case, it is possible to calculate the largest constant step
size that guarantees convergence in terms of A: 0 < α < 2/‖A^H A‖₂, where ‖A^H A‖₂ is the maximum singular
value of A^H A (see this page for why). The GD Jupyter notebook shows how we actually approximate this
singular value (using M instead).
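One standard way to approximate this quantity is power iteration on A^H A; the sketch below is our illustration (function name and iteration count are our choices), using the A and A_adj operator pair from the earlier sketch:

import numpy as np

def max_eigenvalue(A, A_adj, full_shape, num_iters=100):
    # Power iteration: repeatedly apply A^H A to a random image and
    # renormalize; the norm of the result converges to the largest
    # eigenvalue of A^H A, i.e. ||A^H A||_2.
    x = np.random.rand(*full_shape)
    for _ in range(num_iters):
        x = A_adj(A(x))
        x = x / np.linalg.norm(x)
    return np.linalg.norm(A_adj(A(x)))

# alpha = 1.8 / max_eigenvalue(A, A_adj, full_shape)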
Lastly, all convergence guarantees are for an infinite number of iterations: “repeat forever”. In practice,
after a certain number of iterations (which varies by application) the updates are too small to change the
estimate significantly. In our case, after incorporating the speedup techniques below, most of the progress is
seen in the first 150-200 iterations. Sharper, more detailed images may require a few hundred more.
We also need to supply an initial “guess” of our image. It doesn’t actually matter what we use for this.
Currently, we are using a uniform image of half intensity, but you could initialize with all 0’s or a random
image.
Incorporating all of these details, we have:

v_0 = I/2
for k = 0 to num_iters:
    v′_{k+1} ← v_k − (1.8/‖A^H A‖₂) A^H(Av_k − b)
    v_{k+1} ← proj_{v≥0}(v′_{k+1})
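Translated into NumPy, the loop above might look like the following sketch (reusing the A and A_adj operator pair from earlier; b is the measured sensor image):

import numpy as np

def grad(v, A, A_adj, b):
    # gradient of (1/2)||Av - b||^2: A^H (A v - b)
    return A_adj(A(v) - b)

def projected_gd(A, A_adj, b, full_shape, alpha, num_iters=200):
    v = 0.5 * np.ones(full_shape)   # v_0 = I/2, uniform half intensity
    for _ in range(num_iters):
        v = v - alpha * grad(v, A, A_adj, b)
        v = np.maximum(v, 0)        # project onto non-negative images
    return v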

3.1.2 Gradient Descent Speedup

Gradient descent as written above works, but in practice, people always add a “momentum term” that
incorporates the old descent direction into the calculation of the new descent direction. This guards against
changing the descent direction too much and too often, which can be counterproductive. We implement
momentum by introducing µ, a factor that determines how much the new descent direction is determined
by the old descent direction. Typically µ = 0.9 is a good place to start. Another common practice is to
use “Nesterov” momentum, which involves an intermediate update p. We call this method, along with the
projection step, “accelerated projected gradient descent”.

v_0 = I/2, µ = 0.9, p_0 = 0
for k = 0 to num_iters:
    p_{k+1} ← µp_k − α_k grad(v_k)
    v′_{k+1} ← v_k − µp_k + (1 + µ)p_{k+1}
    v_{k+1} ← proj_{v≥0}(v′_{k+1})

See this page for more details on parameter updates using momentum terms.
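A sketch of this accelerated update in NumPy, mirroring the pseudocode above (reusing grad from the plain gradient descent sketch):

import numpy as np

def accelerated_gd(A, A_adj, b, full_shape, alpha, mu=0.9, num_iters=200):
    v = 0.5 * np.ones(full_shape)   # v_0 = I/2
    p = np.zeros(full_shape)        # p_0 = 0
    for _ in range(num_iters):
        p_new = mu * p - alpha * grad(v, A, A_adj, b)
        v = v - mu * p + (1 + mu) * p_new   # Nesterov-style combination
        v = np.maximum(v, 0)                # projection onto v >= 0
        p = p_new
    return v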

3.1.3 FISTA

Another way to speed up gradient descent is the Fast Iterative Shrinkage-Thresholding Algorithm
(FISTA). This also computes the accelerated projected gradient descent, but is more flexible about what the
projection step (or more generally the “proximal” step p_L) does. For example, one can show that doing
accelerated descent with ℓ₁-regularization only requires exchanging the projection step with a soft-thresholding
step. Enforcing sparsity in other domains (for instance, on the gradient of the image rather than the image
itself) can be achieved via soft-thresholding transformations of the image. This algorithm is very useful for
solving linear inverse problems in image processing.
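For reference, the soft-thresholding operator mentioned above is only a few lines; this sketch uses the standard definition T_κ(x) = sign(x) · max(|x| − κ, 0):

import numpy as np

def soft_threshold(x, kappa):
    # T_kappa(x): shrink every entry toward zero by kappa
    return np.sign(x) * np.maximum(np.abs(x) - kappa, 0)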
Each iteration is as follows (see this paper for a derivation and explanation of each term):

v_0 = I/2, t_1 = 1, x_0 = v_0
for k = 1 to num_iters:
    x_k ← p_L(v_k)
    t_{k+1} ← (1 + √(1 + 4t_k²))/2
    v_{k+1} ← x_k + ((t_k − 1)/t_{k+1})(x_k − x_{k−1})
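A NumPy sketch of this loop follows; here we take p_L to be a gradient step followed by the non-negativity projection (for ℓ₁ problems it would instead be a gradient step followed by soft_threshold), which is our reading of the proximal step in the referenced paper:

import numpy as np

def fista(A, A_adj, b, full_shape, alpha, num_iters=200):
    v = 0.5 * np.ones(full_shape)   # v_0 = I/2
    x_prev = v.copy()               # x_0 = v_0
    t = 1.0                         # t_1 = 1
    for _ in range(num_iters):
        # x_k = p_L(v_k): gradient step, then projection
        x = np.maximum(v - alpha * grad(v, A, A_adj, b), 0)
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        v = x + ((t - 1) / t_next) * (x - x_prev)
        x_prev, t = x, t_next
    return x_prev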

3.2 ADMM

Although gradient descent is a reliable algorithm that is guaranteed to converge, it is still slow. If we
want to process larger sets of data (e.g. 3D imaging), have a live feed of DiffuserCam, or just want to process
images more quickly, we need to tailor the algorithm more closely to the optical system involved. While this
introduces more tuning parameters (“knobs” to turn), speed of reconstruction can be drastically improved.
Here we present (without proof) the result of using the alternating direction method of multipliers (ADMM) to
reconstruct the image.
We will only briefly motivate the use of ADMM and then provide the derivation of the update steps
specific to our problem. For background on ADMM, please refer to sections 2 and 3 of Prof. Boyd’s
ADMM tutorial. To understand this document, background knowledge from Chapters 5 (Duality) and 9
(Unconstrained minimization) from his textbook on optimization may be necessary.
Recall the original minimization problem:
v̂ = argmin_{v≥0} (1/2)‖b − Av‖₂²    (1)

where 2D images are interpreted as vectors. We seek to split the single minimization over the vector v into
separable minimizations – for example:
v̂ = argmin_{v, x, w≥0} (1/2)‖b − Cx‖₂²    (2)
s.t. x = Mv, w = v,

where we have decomposed the action of DiffuserCam A = CM into the convolution M followed by a crop
C. The primary reason is to make the expression more amenable to the ADMM algorithm, which adds a
set of “update steps” for each additional constraint. If we don’t find a nice decomposition, some of these
updates will be inefficient to calculate.
In addition, because of these parallel update steps, we can add constraints (prior information) easily. A
common useful prior we add is to encourage the gradient of the image to be sparse – most natural images
can be approximated by piecewise constant intensities. Typically, gradient sparsity is enforced through “total
variation” regularization, where we include the ℓ₁-norm of the gradient in our objective function:
v̂ = argmin_{v, u, x, w≥0} (1/2)‖b − Cx‖₂² + τ‖u‖₁    (3)
s.t. x = Mv, u = Ψv, w = v,

where Ψ is a derivative (difference) operator.


The next step is to form the augmented Lagrangian (see section 2.3 in the ADMM reference), which can
be directly read off from the constraints and objective function:
L({u, x, w, v}, {ξ, η, ρ}) = (1/2)‖b − Cx‖₂² + τ‖u‖₁
    + (µ₁/2)‖Mv − x‖₂² + ξᵀ(Mv − x)
    + (µ₂/2)‖Ψv − u‖₂² + ηᵀ(Ψv − u)    (4)
    + (µ₃/2)‖v − w‖₂² + ρᵀ(v − w)
    + 1₊(w),

where the 1₊(w) term arises from the implicit constraint w ≥ 0:

1₊(w) = ∞ if w < 0, and 0 if w ≥ 0.

The Lagrangian dual approach to minimizing the objective function is to solve the following optimization
problem:
max_{ξ,η,ρ} min_{u,x,w,v} L({u, x, w, v}, {ξ, η, ρ})    (5)

The min above indicates that, ideally, we would want to jointly minimize over all the primal variables
(u, x, w, v) first, before performing the outer maximization over the dual variables (ξ, η, ρ). The ADMM
algorithm is a specific way of iteratively finding this optimal point. In reality, we only have estimates of
each of the variables, so each iteration updates our estimates of the minimizing primal variables and then
takes one ascent step on the dual variables.
Based on this paradigm, we can write down all the intermediate updates that take place in one “global”
update step:

Primal updates:
    u_{k+1} ← argmin_u L({u, x_k, w_k, v_k}, {ξ_k, η_k, ρ_k})
    x_{k+1} ← argmin_x L({u_{k+1}, x, w_k, v_k}, {ξ_k, η_k, ρ_k})
    w_{k+1} ← argmin_w L({u_{k+1}, x_{k+1}, w, v_k}, {ξ_k, η_k, ρ_k})
    v_{k+1} ← argmin_v L({u_{k+1}, x_{k+1}, w_{k+1}, v}, {ξ_k, η_k, ρ_k})

Dual updates:
    ξ_{k+1} ← ξ_k + µ₁(Mv_{k+1} − x_{k+1})
    η_{k+1} ← η_k + µ₂(Ψv_{k+1} − u_{k+1})
    ρ_{k+1} ← ρ_k + µ₃(v_{k+1} − w_{k+1})

Notice that each dual update step tries to solve the maximization problem via gradient ascent. In each
global iteration, we make one step in the ascent direction.
Next, for each primal variable, the individual optimization problem only depends on the terms in the
Lagrangian corresponding to that variable. For example, in the u-update, we only need to include the terms
τ‖u‖₁, (µ₂/2)‖Ψv − u‖₂², and ηᵀ(Ψv − u); all the other terms are constant with respect to u. So, we have:

u_{k+1} ← argmin_u τ‖u‖₁ + (µ₂/2)‖Ψv_k − u‖₂² + η_kᵀ(Ψv_k − u)
x_{k+1} ← argmin_x (1/2)‖b − Cx‖₂² + (µ₁/2)‖Mv_k − x‖₂² + ξ_kᵀ(Mv_k − x)
w_{k+1} ← argmin_w (µ₃/2)‖v_k − w‖₂² + ρ_kᵀ(v_k − w) + 1₊(w)
v_{k+1} ← argmin_v (µ₁/2)‖Mv − x_{k+1}‖₂² + ξ_kᵀMv + (µ₂/2)‖Ψv − u_{k+1}‖₂² + η_kᵀΨv + (µ₃/2)‖v − w_{k+1}‖₂² + ρ_kᵀv

ξ_{k+1} ← ξ_k + µ₁(Mv_{k+1} − x_{k+1})
η_{k+1} ← η_k + µ₂(Ψv_{k+1} − u_{k+1})
ρ_{k+1} ← ρ_k + µ₃(v_{k+1} − w_{k+1})

The primal minimization updates can be solved using standard convex optimization techniques, which
are worked out in the DiffuserCam Derivations Supplement. The results are:

u_{k+1} ← T_{τ/µ₂}(Ψv_k + η_k/µ₂)
x_{k+1} ← (CᵀC + µ₁I)⁻¹(ξ_k + µ₁Mv_k + Cᵀb)
w_{k+1} ← max(ρ_k/µ₃ + v_k, 0)
v_{k+1} ← (µ₁MᵀM + µ₂ΨᵀΨ + µ₃I)⁻¹ r_k
ξ_{k+1} ← ξ_k + µ₁(Mv_{k+1} − x_{k+1})
η_{k+1} ← η_k + µ₂(Ψv_{k+1} − u_{k+1})
ρ_{k+1} ← ρ_k + µ₃(v_{k+1} − w_{k+1})

where T_κ denotes soft-thresholding with threshold κ and

r_k = (µ₃w_{k+1} − ρ_k) + Ψᵀ(µ₂u_{k+1} − η_k) + Mᵀ(µ₁x_{k+1} − ξ_k)
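To show how these updates compose, here is a hedged structural sketch of one ADMM iteration in NumPy. It assumes M and Ψ are circular convolutions diagonalized by the 2D FFT (M_hat = F(h)), passes Ψ in generically as the functions Psi/Psi_adj, and reuses soft_threshold and pad from the earlier sketches; CtC (the 0/1 sensor mask, the diagonal of CᵀC) and v_denom_hat = µ₁|F(h)|² + µ₂|F(Ψ)|² + µ₃ are assumed precomputed. The real implementation is in the ADMM Jupyter notebook.

import numpy as np

def admm_step(u, x, w, v, xi, eta, rho, b, M_hat, mu1, mu2, mu3, tau,
              pad, Psi, Psi_adj, CtC, v_denom_hat):
    Mv = np.real(np.fft.ifft2(M_hat * np.fft.fft2(v)))

    # u-update: soft-thresholding T_{tau/mu2}
    u = soft_threshold(Psi(v) + eta / mu2, tau / mu2)
    # x-update: (C^T C + mu1 I)^{-1} is a pointwise division, since
    # C^T C is a diagonal 0/1 mask selecting the sensor pixels
    x = (xi + mu1 * Mv + pad(b)) / (CtC + mu1)
    # w-update: pointwise projection onto w >= 0
    w = np.maximum(rho / mu3 + v, 0)
    # v-update: solve the convolutional normal equations in the
    # Fourier domain using the precomputed denominator v_denom_hat
    r = (mu3 * w - rho) + Psi_adj(mu2 * u - eta) \
        + np.real(np.fft.ifft2(np.conj(M_hat) * np.fft.fft2(mu1 * x - xi)))
    v = np.real(np.fft.ifft2(np.fft.fft2(r) / v_denom_hat))

    # dual ascent steps
    Mv = np.real(np.fft.ifft2(M_hat * np.fft.fft2(v)))
    xi = xi + mu1 * (Mv - x)
    eta = eta + mu2 * (Psi(v) - u)
    rho = rho + mu3 * (v - w)
    return u, x, w, v, xi, eta, rho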
