
Subspace Pursuit For Compressive Sensing Signal Reconstruction


arXiv:0803.0811v3 [cs.NA] 8 Jan 2009
Subspace Pursuit for Compressive Sensing Signal
Reconstruction
Wei Dai and Olgica Milenkovic
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
Abstract: We propose a new method for reconstruction of sparse signals with and without noisy perturbations, termed the subspace pursuit algorithm. The algorithm has two important characteristics: low computational complexity, comparable to that of orthogonal matching pursuit techniques when applied to very sparse signals, and reconstruction accuracy of the same order as that of LP optimization methods. The presented analysis shows that in the noiseless setting, the proposed algorithm can exactly reconstruct arbitrary sparse signals provided that the sensing matrix satisfies the restricted isometry property with a constant parameter. In the noisy setting and in the case that the signal is not exactly sparse, it can be shown that the mean squared error of the reconstruction is upper bounded by constant multiples of the measurement and signal perturbation energies.

Index Terms: Compressive sensing, orthogonal matching pursuit, reconstruction algorithms, restricted isometry property, sparse signal reconstruction.
I. INTRODUCTION
Compressive sensing (CS) is a sampling method closely connected to transform coding, which has been widely used in modern communication systems involving large scale data samples. A transform code converts input signals, embedded in a high dimensional space, into signals that lie in a space of significantly smaller dimension. Examples of transform coders include the well known wavelet transforms and the ubiquitous Fourier transform.

Compressive sensing techniques perform transform coding successfully whenever applied to so-called compressible and/or $K$-sparse signals, i.e., signals that can be represented by $K \ll N$ significant coefficients over an $N$-dimensional basis. Encoding of a $K$-sparse, discrete-time signal $x$ of dimension $N$ is accomplished by computing a measurement vector $y$ that consists of $m \ll N$ linear projections of the vector $x$. This can be compactly described via
$$y = \Phi x.$$
Here, $\Phi$ represents an $m \times N$ matrix, usually over the field of real numbers. Within this framework, the projection basis is assumed to be incoherent with the basis in which the signal has a sparse representation [1].
Although the reconstruction of the signal $x \in \mathbb{R}^N$ from the possibly noisy random projections is an ill-posed problem, the strong prior knowledge of signal sparsity allows for recovering $x$ using $m \ll N$ projections only. One of the outstanding results in CS theory is that the signal $x$ can be reconstructed using optimization strategies aimed at finding the sparsest signal that matches the $m$ projections. In other words, the reconstruction problem can be cast as an $l_0$ minimization problem [2]. It can be shown that to reconstruct a $K$-sparse signal $x$, $l_0$ minimization requires only $m = 2K$ random projections when the signal and the measurements are noise-free. Unfortunately, the $l_0$ optimization problem is NP-hard. This issue has led to a large body of work in CS theory and practice centered around the design of measurement and reconstruction algorithms with tractable reconstruction complexity.

(This work is supported by NSF Grants CCF 0644427, 0729216 and the DARPA Young Faculty Award of the second author. Wei Dai and Olgica Milenkovic are with the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801-2918 USA; e-mail: weidai07@uiuc.edu, milenkov@uiuc.edu.)
The work by Donoho and Candès et al. [1], [3], [4], [5] demonstrated that CS reconstruction is, indeed, a polynomial time problem, albeit under the constraint that more than $2K$ measurements are used. The key observation behind these findings is that it is not necessary to resort to $l_0$ optimization to recover $x$ from the under-determined inverse problem; a much easier $l_1$ optimization, based on Linear Programming (LP) techniques, yields an equivalent solution, as long as the sampling matrix satisfies the so-called restricted isometry property (RIP) with a constant parameter.

While LP techniques play an important role in designing computationally tractable CS decoders, their complexity is still highly impractical for many applications. In such cases, the need for faster decoding algorithms, preferably operating in linear time, is of critical importance, even if one has to increase the number of measurements. Several classes of low-complexity reconstruction techniques were recently put forward as alternatives to linear programming (LP) based recovery, which include group testing methods [6], and algorithms based on belief propagation [7].
Recently, a family of iterative greedy algorithms received significant attention due to their low complexity and simple geometric interpretation. They include the Orthogonal Matching Pursuit (OMP), the Regularized OMP (ROMP) and the Stagewise OMP (StOMP) algorithms. The basic idea behind these methods is to find the support of the unknown signal sequentially. At each iteration of the algorithms, one or several coordinates of the vector $x$ are selected for testing based on the correlation values between the columns of $\Phi$ and the regularized measurement vector. If deemed sufficiently reliable, the candidate column indices are subsequently added to the current estimate of the support set of $x$. The pursuit algorithms iterate this procedure until all the coordinates in the correct support set are included in the estimated support set. The computational complexity of OMP strategies depends on the number of iterations needed for exact reconstruction: standard OMP always runs through $K$ iterations, and therefore its reconstruction complexity is roughly $O(KmN)$ (see Section IV-C for details). This complexity is significantly smaller than that of LP methods, especially when the signal sparsity level $K$ is small. However, the pursuit algorithms do not have provable reconstruction quality at the level of LP methods. For OMP techniques to operate successfully, one requires that the correlation between all pairs of columns of $\Phi$ is at most $1/(2K)$ [8], which by the Gershgorin Circle Theorem [9] represents a more restrictive constraint than the RIP. The ROMP algorithm [10] can reconstruct all $K$-sparse signals provided that the RIP holds with parameter $\delta_{2K} \le 0.06/\sqrt{\log K}$, which strengthens the RIP requirements for $l_1$-linear programming by a factor of $\sqrt{\log K}$.
The main contribution of this paper is a new algorithm, termed the subspace pursuit (SP) algorithm. It has provable reconstruction capability comparable to that of LP methods, and it exhibits the low reconstruction complexity of matching pursuit techniques for very sparse signals. The algorithm can operate both in the noiseless and the noisy regime, allowing for exact and approximate signal recovery, respectively. For any sampling matrix $\Phi$ satisfying the RIP with a constant parameter independent of $K$, the SP algorithm can recover arbitrary $K$-sparse signals exactly from its noiseless measurements. When the measurements are inaccurate and/or the signal is not exactly sparse, the reconstruction distortion is upper bounded by a constant multiple of the measurement and/or signal perturbation energy. For very sparse signals with $K \le \mathrm{const}\cdot\sqrt{N}$, which, for example, arise in certain communication scenarios, the computational complexity of the SP algorithm is upper bounded by $O(mNK)$, but it can be further reduced to $O(mN\log K)$ when the nonzero entries of the sparse signal decay slowly.

The basic idea behind the SP algorithm is borrowed from coding theory, more precisely, the A* order-statistic algorithm [11] for additive white Gaussian noise channels. In this decoding framework, one starts by selecting the set of $K$ most reliable information symbols. This highest-reliability information set is subsequently hard-decision decoded, and the metric of the parity checks corresponding to the given information set is evaluated. Based on the value of this metric, some of the low-reliability symbols in the most reliable information set are changed in a sequential manner. The algorithm can therefore be seen as operating on an adaptively modified coding tree. If the notion of "most reliable symbol" is replaced by "column of the sensing matrix exhibiting the highest correlation with the vector $y$," and the notion of "parity-check metric" by "residual metric," then the above method can be easily adapted for use in CS reconstruction. Consequently, one can perform CS reconstruction by selecting a set of $K$ columns of the sensing matrix with the highest correlation that span a candidate subspace for the sensed vector. If the distance of the received vector to this space is deemed large, the algorithm incrementally removes and adds new basis vectors according to their reliability values, until a sufficiently close candidate word is identified. SP employs a search strategy in which a constant number of vectors is expurgated from the candidate list. This feature is mainly introduced for simplicity of analysis: one can easily extend the algorithm to include adaptive expurgation strategies that do not necessarily operate on fixed-size lists.
In compressive sensing, the major challenge associated with sparse signal reconstruction is to identify in which subspace, generated by not more than $K$ columns of the matrix $\Phi$, the measured signal $y$ lies. Once the correct subspace is determined, the non-zero signal coefficients are calculated by applying the pseudoinversion process. The defining character of the SP algorithm is the method used for finding the $K$ columns that span the correct subspace: SP tests subsets of $K$ columns in a group, for the purpose of refining at each stage an initially chosen estimate for the subspace. More specifically, the algorithm maintains a list of $K$ columns of $\Phi$, performs a simple test in the spanned space, and then refines the list. If $y$ does not lie in the current estimate for the correct spanning space, one refines the estimate by retaining reliable candidates, discarding the unreliable ones while adding the same number of new candidates. The reliability property is captured in terms of the order statistics of the inner products of the received signal with the columns of $\Phi$, and the subspace projection coefficients.

As a consequence, the main difference between ROMP and the SP reconstruction strategy is that the former algorithm generates a list of candidates sequentially, without backtracking: it starts with an empty list, identifies one or several reliable candidates during each iteration, and adds them to the already existing list. Once a coordinate is deemed to be reliable and is added to the list, it is not removed from it until the algorithm terminates. This search strategy is overly restrictive, since candidates have to be selected with extreme caution. In contrast, the SP algorithm incorporates a simple method for re-evaluating the reliability of all candidates at each iteration of the process.

At the time of writing this manuscript, the authors became aware of the related work by J. Tropp, D. Needell and R. Vershynin [12], describing a similar reconstruction algorithm. The main difference between the SP algorithm and the CoSaMP algorithm of [12] is in the manner in which new candidates are added to the list. In each iteration, in the SP algorithm, only $K$ new candidates are added, while the CoSaMP algorithm adds $2K$ vectors. This makes the SP algorithm computationally more efficient, but the underlying analysis more complex. In addition, the restricted isometry constant for which the SP algorithm is guaranteed to converge is larger than the one presented in [12]. Finally, this paper also contains an analysis of the number of iterations needed for reconstruction of a sparse signal (see Theorem 6 for details), for which there is no counterpart in the CoSaMP study.

The remainder of the paper is organized as follows. Section II introduces relevant concepts and terminology for describing the proposed CS reconstruction technique. Section III contains the algorithmic description of the SP algorithm, along with a simulation-based study of its performance when compared with OMP, ROMP, and LP methods. Section IV contains the main result of the paper pertaining to the noiseless setting: a formal proof of the guaranteed reconstruction performance and the reconstruction complexity of the SP algorithm. Section V contains the main result of the paper pertaining to the noisy setting. Concluding remarks are given in Section VI, while the proofs of most of the theorems are presented in the Appendix of the paper.
II. PRELIMINARIES
A. Compressive Sensing and the Restricted Isometry Property
Let $\mathrm{supp}(x)$ denote the set of indices of the non-zero coordinates of an arbitrary vector $x = (x_1, \dots, x_N)$, and let $|\mathrm{supp}(x)| = \|x\|_0$ denote the support size of $x$, or equivalently, its $l_0$ norm.$^1$ Assume next that $x \in \mathbb{R}^N$ is an unknown signal with $|\mathrm{supp}(x)| \le K$, and let $y \in \mathbb{R}^m$ be an observation of $x$ via $m$ linear measurements, i.e.,
$$y = \Phi x,$$
where $\Phi \in \mathbb{R}^{m\times N}$ is henceforth referred to as the sampling matrix.

$^1$ We interchangeably use both notations in the paper.
We are concerned with the problem of low-complexity recovery of the unknown signal $x$ from the measurement $y$. A natural formulation of the recovery problem is within an $l_0$ norm minimization framework, which seeks a solution to the problem
$$\min \|x\|_0 \quad \text{subject to} \quad y = \Phi x.$$
Unfortunately, the above $l_0$ minimization problem is NP-hard, and hence cannot be used for practical applications [3], [4].

One way to avoid using this computationally intractable formulation is to consider an $l_1$-regularized optimization problem,
$$\min \|x\|_1 \quad \text{subject to} \quad y = \Phi x,$$
where
$$\|x\|_1 = \sum_{i=1}^{N} |x_i|$$
denotes the $l_1$ norm of the vector $x$.
The main advantage of the $l_1$ minimization approach is that it is a convex optimization problem that can be solved efficiently by linear programming (LP) techniques. This method is therefore frequently referred to as $l_1$-LP reconstruction [3], [13], and its reconstruction complexity equals $O\!\left(m^2 N^{3/2}\right)$ when interior point methods are employed [14]. See [15], [16], [17] for other methods to further reduce the complexity of $l_1$-LP.
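As a small, hedged illustration of how the $l_1$-LP decoder can be posed as a standard linear program (this is a sketch assuming NumPy and SciPy are available, not the authors' implementation), one may split $x = u - v$ with $u, v \ge 0$ and minimize the sum of the slack variables:

```python
# Minimal basis-pursuit sketch: min ||x||_1 s.t. Phi x = y, written as an LP
# by splitting x = u - v with u, v >= 0 (illustrative; names are ours).
import numpy as np
from scipy.optimize import linprog

def l1_lp_reconstruct(Phi, y):
    m, N = Phi.shape
    c = np.ones(2 * N)                    # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([Phi, -Phi])         # equality constraint: Phi (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * N))
    u, v = res.x[:N], res.x[N:]
    return u - v
```

This split-variable formulation is the textbook reduction of basis pursuit to an LP; dedicated $l_1$ solvers are typically faster in practice.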
The reconstruction accuracy of the $l_1$-LP method is described in terms of the restricted isometry property (RIP), formally defined below.

Definition 1 (Truncation): Let $\Phi \in \mathbb{R}^{m\times N}$, $x \in \mathbb{R}^N$ and $I \subset \{1,\dots,N\}$. The matrix $\Phi_I$ consists of the columns of $\Phi$ with indices $i \in I$, and $x_I$ is composed of the entries of $x$ indexed by $i \in I$. The space spanned by the columns of $\Phi_I$ is denoted by $\mathrm{span}(\Phi_I)$.
Definition 2 (RIP): A matrix $\Phi \in \mathbb{R}^{m\times N}$ is said to satisfy the Restricted Isometry Property (RIP) with parameters $(K, \delta)$ for $K \le m$, $0 \le \delta \le 1$, if for all index sets $I \subset \{1,\dots,N\}$ such that $|I| \le K$ and for all $q \in \mathbb{R}^{|I|}$, one has
$$(1-\delta)\|q\|_2^2 \le \|\Phi_I q\|_2^2 \le (1+\delta)\|q\|_2^2.$$
We define $\delta_K$, the RIP constant, as the infimum of all parameters $\delta$ for which the RIP holds, i.e.,
$$\delta_K := \inf\left\{ \delta : (1-\delta)\|q\|_2^2 \le \|\Phi_I q\|_2^2 \le (1+\delta)\|q\|_2^2,\ \forall\, |I| \le K,\ \forall\, q \in \mathbb{R}^{|I|} \right\}.$$
Remark 1 (RIP and eigenvalues): If a sampling matrix $\Phi \in \mathbb{R}^{m\times N}$ satisfies the RIP with parameters $(K, \delta_K)$, then for all $I \subset \{1,\dots,N\}$ such that $|I| \le K$, it holds that
$$1-\delta_K \le \lambda_{\min}\!\left(\Phi_I^{*}\Phi_I\right) \le \lambda_{\max}\!\left(\Phi_I^{*}\Phi_I\right) \le 1+\delta_K,$$
where $\lambda_{\min}\!\left(\Phi_I^{*}\Phi_I\right)$ and $\lambda_{\max}\!\left(\Phi_I^{*}\Phi_I\right)$ denote the minimal and maximal eigenvalues of $\Phi_I^{*}\Phi_I$, respectively.
Remark 2 (Matrices satisfying the RIP): Most known families of matrices satisfying the RIP property with optimal or near-optimal performance guarantees are random. Examples include:

1) Random matrices with i.i.d. entries that follow either the Gaussian distribution, the Bernoulli distribution with zero mean and variance $1/m$, or any other distribution that satisfies certain tail decay laws. It was shown in [13] that the RIP for a randomly chosen matrix from such ensembles holds with overwhelming probability whenever
$$K \le C\,\frac{m}{\log(N/m)},$$
where $C$ is a function of the RIP constant.

2) Random matrices from the Fourier ensemble. Here, one selects $m$ rows from the $N \times N$ discrete Fourier transform matrix uniformly at random. Upon selection, the columns of the matrix are scaled to unit norm. The resulting matrix satisfies the RIP with overwhelming probability, provided that
$$K \le C\,\frac{m}{(\log N)^6},$$
where $C$ depends only on the RIP constant.
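As a rough numerical companion to Remarks 1 and 2 (a sketch under the assumption that NumPy is available; the function names are ours), the following generates a Gaussian sampling matrix with variance-$1/m$ entries and probes the eigenvalue characterization of Remark 1 on randomly drawn index sets. Note that this only gives a lower estimate of $\delta_K$, since the true constant maximizes over all $\binom{N}{K}$ subsets.

```python
# Sketch for Remarks 1 and 2: i.i.d. Gaussian sampling matrix plus a Monte
# Carlo probe of the RIP constant via extreme eigenvalues of Phi_I^T Phi_I.
import numpy as np

def gaussian_sampling_matrix(m, N, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    return rng.normal(scale=1.0 / np.sqrt(m), size=(m, N))

def empirical_delta(Phi, K, trials=2000, seed=0):
    # Lower estimate of delta_K: the true constant maximizes over all subsets.
    rng = np.random.default_rng(seed)
    N = Phi.shape[1]
    worst = 0.0
    for _ in range(trials):
        I = rng.choice(N, size=K, replace=False)
        eig = np.linalg.eigvalsh(Phi[:, I].T @ Phi[:, I])
        worst = max(worst, 1.0 - eig[0], eig[-1] - 1.0)
    return worst

Phi = gaussian_sampling_matrix(128, 256)
print(empirical_delta(Phi, K=10))
```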
There exists an intimate connection between the LP reconstruction accuracy and the RIP property, first described by Candès and Tao in [3]. If the sampling matrix $\Phi$ satisfies the RIP with constants $\delta_K$, $\delta_{2K}$, and $\delta_{3K}$, such that
$$\delta_K + \delta_{2K} + \delta_{3K} < 1, \qquad (1)$$
then the $l_1$-LP algorithm will reconstruct all $K$-sparse signals exactly. This sufficient condition (1) can be improved to
$$\delta_{2K} < \sqrt{2} - 1, \qquad (2)$$
as demonstrated in [18].
For subsequent derivations, we need two results summarized in the lemmas below. The first part of the claim, as well as a related modification of the second claim, also appeared in [3], [10]. For completeness, we include the proof of the lemma in Appendix A.

Lemma 1 (Consequences of the RIP):

1) (Monotonicity of $\delta_K$) For any two integers $K \le K'$,
$$\delta_K \le \delta_{K'}.$$

2) (Near-orthogonality of columns) Let $I, J \subset \{1,\dots,N\}$ be two disjoint sets, $I \cap J = \emptyset$. Suppose that $\delta_{|I|+|J|} < 1$. For arbitrary vectors $a \in \mathbb{R}^{|I|}$ and $b \in \mathbb{R}^{|J|}$,
$$|\langle \Phi_I a, \Phi_J b\rangle| \le \delta_{|I|+|J|}\,\|a\|_2\|b\|_2,$$
and
$$\|\Phi_I^{*}\Phi_J b\|_2 \le \delta_{|I|+|J|}\,\|b\|_2.$$

The lemma implies that $\delta_K \le \delta_{2K} \le \delta_{3K}$, which consequently simplifies (1) to $\delta_{3K} < 1/3$. Both (1) and (2) represent sufficient conditions for exact reconstruction.

In order to describe the main steps of the SP algorithm, we introduce next the notion of the projection of a vector and its residue.
Definition 3 (Projection and Residue): Let $y \in \mathbb{R}^m$ and $\Phi_I \in \mathbb{R}^{m\times|I|}$. Suppose that $\Phi_I^{*}\Phi_I$ is invertible. The projection of $y$ onto $\mathrm{span}(\Phi_I)$ is defined as
$$y_p = \mathrm{proj}(y, \Phi_I) := \Phi_I \Phi_I^{\dagger} y,$$
where
$$\Phi_I^{\dagger} := (\Phi_I^{*}\Phi_I)^{-1}\Phi_I^{*}$$
denotes the pseudo-inverse of the matrix $\Phi_I$, and $^{*}$ stands for matrix transposition.

The residue vector of the projection equals
$$y_r = \mathrm{resid}(y, \Phi_I) := y - y_p.$$
We find the following properties of projections and residues of vectors useful for our subsequent derivations.

Lemma 2 (Projection and Residue):

1) (Orthogonality of the residue) For an arbitrary vector $y \in \mathbb{R}^m$, and a sampling matrix $\Phi_I \in \mathbb{R}^{m\times K}$ of full column rank, let $y_r = \mathrm{resid}(y, \Phi_I)$. Then
$$\Phi_I^{*} y_r = 0.$$

2) (Approximation of the projection residue) Consider a matrix $\Phi \in \mathbb{R}^{m\times N}$. Let $I, J \subset \{1,\dots,N\}$ be two disjoint sets, $I \cap J = \emptyset$, and suppose that $\delta_{|I|+|J|} < 1$. Furthermore, let $y \in \mathrm{span}(\Phi_I)$, $y_p = \mathrm{proj}(y, \Phi_J)$ and $y_r = \mathrm{resid}(y, \Phi_J)$. Then
$$\|y_p\|_2 \le \frac{\delta_{|I|+|J|}}{1-\delta_{\max(|I|,|J|)}}\,\|y\|_2, \qquad (3)$$
and
$$\left(1 - \frac{\delta_{|I|+|J|}}{1-\delta_{\max(|I|,|J|)}}\right)\|y\|_2 \le \|y_r\|_2 \le \|y\|_2. \qquad (4)$$

The proof of Lemma 2 can be found in Appendix B.
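A minimal numerical sketch of Definition 3 and of part 1 of Lemma 2 (assuming NumPy; the helper names are ours) computes the projection via a least-squares solve and checks the orthogonality of the residue:

```python
# Sketch of Definition 3 and Lemma 2, part 1: projection onto span(Phi_I),
# its residue, and the orthogonality check Phi_I^T y_r = 0.
import numpy as np

def proj(y, Phi_I):
    # Phi_I applied to the pseudo-inverse solution (Phi_I^* Phi_I)^{-1} Phi_I^* y
    return Phi_I @ np.linalg.lstsq(Phi_I, y, rcond=None)[0]

def resid(y, Phi_I):
    return y - proj(y, Phi_I)

rng = np.random.default_rng(1)
Phi = rng.normal(scale=1 / np.sqrt(32), size=(32, 128))
y = rng.normal(size=32)
I = np.array([3, 17, 40])
print(np.allclose(Phi[:, I].T @ resid(y, Phi[:, I]), 0))   # True, up to round-off
```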
III. THE SP ALGORITHM
The main steps of the SP algorithm are summarized below.$^2$

Algorithm 1 Subspace Pursuit Algorithm

Input: $K$, $\Phi$, $y$.

Initialization:
1) $T^0 = \{K$ indices corresponding to the largest magnitude entries in the vector $\Phi^{*} y\}$.
2) $y_r^0 = \mathrm{resid}\!\left(y, \Phi_{T^0}\right)$.

Iteration: At the $\ell$-th iteration, go through the following steps:
1) $\tilde{T}^{\ell} = T^{\ell-1} \cup \{K$ indices corresponding to the largest magnitude entries in the vector $\Phi^{*} y_r^{\ell-1}\}$.
2) Set $x_p = \Phi_{\tilde{T}^{\ell}}^{\dagger} y$.
3) $T^{\ell} = \{K$ indices corresponding to the largest magnitude elements of $x_p\}$.
4) $y_r^{\ell} = \mathrm{resid}\!\left(y, \Phi_{T^{\ell}}\right)$.
5) If $\left\|y_r^{\ell}\right\|_2 > \left\|y_r^{\ell-1}\right\|_2$, let $T^{\ell} = T^{\ell-1}$ and quit the iteration.

Output:
1) The estimated signal $\hat{x}$, satisfying $\hat{x}_{\{1,\dots,N\}-T^{\ell}} = 0$ and $\hat{x}_{T^{\ell}} = \Phi_{T^{\ell}}^{\dagger} y$.
A schematic diagram of the SP algorithm is depicted in Fig. 1(b). For comparison, a diagram of OMP-type methods is provided in Fig. 1(a). The subtle, but important, difference between the two schemes lies in the approach used to generate $T^{\ell}$, the estimate of the correct support set $T$. In OMP strategies, during each iteration the algorithm selects one or several indices that represent good partial support set estimates and then adds them to $T^{\ell}$. Once an index is included in $T^{\ell}$, it remains in this set throughout the remainder of the reconstruction process. As a result, strict inclusion rules are needed to ensure that a significant fraction of the newly added indices belongs to the correct support $T$. On the other hand, in the SP algorithm, an estimate $T^{\ell}$ of size $K$ is maintained and refined during each iteration. An index that is considered reliable in some iteration but shown to be wrong at a later iteration can be added to or removed from the estimated support set at any stage of the recovery process. The expectation is that the recursive refinements of the estimate of the support set will lead to subspaces with strictly decreasing distance from the measurement vector $y$. A compact code sketch of Algorithm 1 is given below.
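The following is a compact sketch of Algorithm 1 (assuming NumPy and a generic least-squares solve for the projections; it is illustrative and not the authors' reference implementation).

```python
# Compact sketch of Algorithm 1 (Subspace Pursuit); names are illustrative.
import numpy as np

def _resid(y, Phi_I):
    # residue of y after projection onto span(Phi_I), as in Definition 3
    coeffs = np.linalg.lstsq(Phi_I, y, rcond=None)[0]
    return y - Phi_I @ coeffs

def subspace_pursuit(K, Phi, y, max_iter=100, tol=1e-12):
    # Initialization: K largest correlations, then the corresponding residue.
    T = np.argsort(np.abs(Phi.T @ y))[-K:]
    y_r = _resid(y, Phi[:, T])
    for _ in range(max_iter):
        # Step 1: merge T with the K indices most correlated with the residue.
        T_tilde = np.union1d(T, np.argsort(np.abs(Phi.T @ y_r))[-K:])
        # Step 2: project y onto the merged set of columns.
        x_p = np.linalg.lstsq(Phi[:, T_tilde], y, rcond=None)[0]
        # Step 3: keep the K entries of x_p with the largest magnitudes.
        T_new = T_tilde[np.argsort(np.abs(x_p))[-K:]]
        # Step 4: recompute the residue.
        y_r_new = _resid(y, Phi[:, T_new])
        # Step 5: stop if the residue norm no longer decreases.
        if np.linalg.norm(y_r_new) > np.linalg.norm(y_r):
            break
        T, y_r = T_new, y_r_new
        if np.linalg.norm(y_r) <= tol:   # exact recovery in the noiseless case
            break
    # Output: estimate supported on T with least-squares coefficients.
    x_hat = np.zeros(Phi.shape[1])
    x_hat[T] = np.linalg.lstsq(Phi[:, T], y, rcond=None)[0]
    return x_hat
```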
We performed extensive computer simulations in order to compare the accuracy of different reconstruction algorithms empirically. In the compressive sensing framework, all sparse signals are expected to be exactly reconstructed as long as the level of the sparsity is below a certain threshold. However, the computational complexity of testing this uniform reconstruction ability is $O\!\left(\binom{N}{K}\right)$, which grows exponentially with $K$. Instead, for empirical testing, we adopt the simulation strategy described in [5], which calculates the empirical frequency of exact reconstruction for the Gaussian random matrix ensemble.

$^2$ In Step 3) of the SP algorithm, $K$ indices with the largest correlation magnitudes are used to form $T^{\ell}$. In CoSaMP [12], $2K$ such indices are used. This small difference results in different proofs associated with Step 3) and different RIP constants that guarantee successful signal reconstruction.

Figure 1: Description of reconstruction algorithms for $K$-sparse signals: though both approaches look similar, the basic ideas behind them are quite different. (a) Iterations in OMP, Stagewise OMP, and Regularized OMP: in each iteration, one decides on a reliable set of candidate indices to be added into the list $T^{\ell-1}$; once a candidate is added, it remains in the list until the algorithm terminates. (b) Iterations in the proposed Subspace Pursuit Algorithm: a list of $K$ candidates, which is allowed to be updated during the iterations, is maintained.
The steps of the testing strategy are listed below.
1) For given values of the parameters $m$ and $N$, choose a signal sparsity level $K$ such that $K \le m/2$;

2) Randomly generate an $m \times N$ sampling matrix $\Phi$ from the standard i.i.d. Gaussian ensemble;

3) Select a support set $T$ of size $|T| = K$ uniformly at random, and generate the sparse signal vector $x$ by either one of the following two methods:
a) Draw the elements of the vector $x$ restricted to $T$ from the standard Gaussian distribution; we refer to this type of signal as a Gaussian signal. Or,
b) set all entries of $x$ supported on $T$ to ones; we refer to this type of signal as a zero-one signal.
Note that zero-one sparse signals are of special interest for the comparative study, since they represent a particularly challenging case for OMP-type reconstruction strategies.

4) Compute the measurement $y = \Phi x$, apply a reconstruction algorithm to obtain $\hat{x}$, the estimate of $x$, and compare $\hat{x}$ to $x$;

5) Repeat the process 500 times for each $K$, and then simulate the same algorithm for different values of $m$ and $N$. A code sketch of this testing loop is given after this list.
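The sketch below mirrors the testing strategy above (it reuses the hypothetical subspace_pursuit() and gaussian_sampling_matrix() helpers from the earlier sketches, and uses 50 trials per sparsity level instead of 500 to keep it quick).

```python
# Sketch of the empirical exact-recovery test for the Gaussian ensemble.
import numpy as np

def exact_recovery_rate(K, m=128, N=256, trials=50, zero_one=False, seed=0):
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        Phi = gaussian_sampling_matrix(m, N, rng)
        T = rng.choice(N, size=K, replace=False)
        x = np.zeros(N)
        x[T] = 1.0 if zero_one else rng.normal(size=K)
        x_hat = subspace_pursuit(K, Phi, Phi @ x)
        hits += np.allclose(x, x_hat, atol=1e-6)
    return hits / trials
```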
The improved reconstruction capability of the SP method, compared with that of the OMP and ROMP algorithms, is illustrated by two examples shown in Fig. 2. Here, the signals are drawn both according to the Gaussian and the zero-one model, and the benchmark performance of the LP reconstruction technique is plotted as well.

Figure 2: Simulations of the exact recovery rate: compared with OMPs, the SP algorithm has significantly larger critical sparsity. (Both panels plot the frequency of exact reconstruction versus the signal sparsity $K$ for 500 realizations with $m = 128$, $N = 256$; curves: Linear Programming (LP), Subspace Pursuit (SP), Regularized OMP, Standard OMP.) (a) Simulations for Gaussian sparse signals: OMP and ROMP start to fail when $K \ge 19$ and when $K \ge 22$ respectively, $l_1$-LP begins to fail when $K \ge 35$, and the SP algorithm fails only when $K \ge 45$. (b) Simulations for zero-one sparse signals: both OMP and ROMP start to fail when $K \ge 10$, $l_1$-LP begins to fail when $K \ge 35$, and the SP algorithm fails when $K \ge 29$.
Figure 2 depicts the empirical frequency of exact reconstruction. The numerical values on the x-axis denote the sparsity level $K$, while the numerical values on the y-axis represent the fraction of exactly recovered test signals. Of particular interest is the sparsity level at which the recovery rate drops below 100%, i.e., the critical sparsity, which, when exceeded, leads to errors in the reconstruction algorithm applied to some of the signals from the given class.

The simulation results reveal that the critical sparsity of the SP algorithm by far exceeds that of the OMP and ROMP techniques, for both Gaussian and zero-one inputs. The reconstruction capability of the SP algorithm is comparable to that of the LP based approach: the SP algorithm has a slightly higher critical sparsity for Gaussian signals, but also a slightly lower critical sparsity for zero-one signals. However, the SP algorithm significantly outperforms the LP method when it comes to reconstruction complexity. As we analytically demonstrate in the exposition to follow, the reconstruction complexity of the SP algorithm for both Gaussian and zero-one sparse signals is $O(mN\log K)$ whenever $K \le O\!\left(\sqrt{N}\right)$, while the complexity of LP algorithms based on interior point methods is $O\!\left(m^2 N^{3/2}\right)$ [14] in the same asymptotic regime.
IV. RECOVERY OF SPARSE SIGNALS
For simplicity, we start by analyzing the reconstruction performance of SP algorithms applied to sparse signals in the noiseless setting. The techniques used in this context, and the insights obtained, are also applicable to the analysis of SP reconstruction schemes with signal and/or measurement perturbations. Note that throughout the remainder of the paper, we use the notation $Si$ ($S \in \{D, L\}$, $i \in \mathbb{Z}^{+}$) stacked over an inequality sign to indicate that the inequality follows from Definition ($D$) or Lemma ($L$) $i$ in the paper.

A sufficient condition for exact reconstruction of arbitrary sparse signals is stated in the following theorem.
Theorem 1: Let $x \in \mathbb{R}^N$ be a $K$-sparse signal, and let its corresponding measurement be $y = \Phi x \in \mathbb{R}^m$. If the sampling matrix $\Phi$ satisfies the RIP with constant
$$\delta_{3K} < 0.165, \qquad (5)$$
then the SP algorithm is guaranteed to exactly recover $x$ from $y$ via a finite number of iterations.

Remark 3: The requirement on the RIP constant can be relaxed to
$$\delta_{3K} < 0.205,$$
if we replace the stopping criterion $\left\|y_r^{\ell}\right\|_2 \ge \left\|y_r^{\ell-1}\right\|_2$ with $\left\|y_r^{\ell}\right\|_2 = 0$. This claim is supported by substituting $\delta_{3K} < 0.205$ into Equation (6). However, for simplicity of analysis, we adopt $\left\|y_r^{\ell}\right\|_2 \ge \left\|y_r^{\ell-1}\right\|_2$ for the iteration stopping criterion.

Remark 4: In the original version of this manuscript, we proved the weaker result $\delta_{3K} \le 0.06$. At the time of revision of the paper, we were given access to the manuscript [19] by Needell and Tropp. Using some of the proof techniques in their work, we managed to improve the results in Theorem 3 and therefore the RIP constant of the original submission. The interested reader is referred to http://arxiv.org/abs/0803.0811v2 for the first version of the theorem. This paper contains only the proof of the stronger result.
This sufficient condition is proved by applying Theorems 2 and 6. The computational complexity is related to the number of iterations required for exact reconstruction, and is discussed at the end of Section IV-C. Before providing a detailed analysis of the results, let us sketch the main ideas behind the proof.

We denote by $x_{T-T^{\ell-1}}$ and $x_{T-T^{\ell}}$ the residual signals based upon the estimates of $\mathrm{supp}(x)$ before and after the $\ell$-th iteration of the SP algorithm. Provided that the sampling matrix satisfies the RIP with constant (5), it holds that
$$\|x_{T-T^{\ell}}\|_2 < \|x_{T-T^{\ell-1}}\|_2,$$
which implies that at each iteration, the SP algorithm identifies a $K$-dimensional space that reduces the reconstruction error of the vector $x$. See Fig. 3 for an illustration.

Figure 3: After each iteration, a $K$-dimensional hyper-plane closer to $y$ is obtained.

This observation is formally stated as follows.
Theorem 2: Assume that the conditions of Theorem 1 hold. For each iteration of the SP algorithm, one has
$$\|x_{T-T^{\ell}}\|_2 \le c_K\,\|x_{T-T^{\ell-1}}\|_2, \qquad (6)$$
and
$$\left\|y_r^{\ell}\right\|_2 \le \frac{c_K}{1-2\delta_{3K}}\left\|y_r^{\ell-1}\right\|_2 < \left\|y_r^{\ell-1}\right\|_2, \qquad (7)$$
where
$$c_K = \frac{2\delta_{3K}(1+\delta_{3K})}{(1-\delta_{3K})^3}. \qquad (8)$$
To prove Theorem 2, we need to take a closer look at the operations executed during each iteration of the SP algorithm. During one iteration, two basic sets of computations and comparisons are performed: first, given $T^{\ell-1}$, $K$ additional candidate indices for inclusion into the estimate of the support set are identified; and second, given $\tilde{T}^{\ell}$, $K$ reliable indices out of the total $2K$ indices are selected to form $T^{\ell}$. In Subsections IV-A and IV-B, we provide the intuition for choosing the selection rules. Now, let $x_{T-\tilde{T}^{\ell}}$ be the residual signal coefficient vector corresponding to the support set estimate $\tilde{T}^{\ell}$. We have the following two theorems.

Theorem 3: It holds that
$$\left\|x_{T-\tilde{T}^{\ell}}\right\|_2 \le \frac{2\delta_{3K}}{(1-\delta_{3K})^2}\,\|x_{T-T^{\ell-1}}\|_2.$$
The proof of the theorem is postponed to Appendix D.

Theorem 4: The following inequality is valid:
$$\|x_{T-T^{\ell}}\|_2 \le \frac{1+\delta_{3K}}{1-\delta_{3K}}\left\|x_{T-\tilde{T}^{\ell}}\right\|_2.$$
The proof of the result is deferred to Appendix E.

Based on Theorems 3 and 4, one arrives at the result claimed in Equation (6).
Furthermore, according to Lemmas 1 and 2, one has
$$\left\|y_r^{\ell}\right\|_2 = \|\mathrm{resid}(y, \Phi_{T^{\ell}})\|_2 = \|\mathrm{resid}(\Phi_{T-T^{\ell}} x_{T-T^{\ell}}, \Phi_{T^{\ell}}) + \mathrm{resid}(\Phi_{T^{\ell}} x_{T^{\ell}}, \Phi_{T^{\ell}})\|_2$$
$$\overset{D3}{=} \|\mathrm{resid}(\Phi_{T-T^{\ell}} x_{T-T^{\ell}}, \Phi_{T^{\ell}}) + \mathbf{0}\|_2 \overset{(4)}{\le} \|\Phi_{T-T^{\ell}} x_{T-T^{\ell}}\|_2 \overset{(6)}{\le} \sqrt{1+\delta_K}\; c_K\,\|x_{T-T^{\ell-1}}\|_2, \qquad (9)$$
where the second equality holds by the definition of the residue, while (4) and (6) refer to the labels of the inequalities used in the bounds. In addition,
$$\left\|y_r^{\ell-1}\right\|_2 = \|\mathrm{resid}(y, \Phi_{T^{\ell-1}})\|_2 = \|\mathrm{resid}(\Phi_{T-T^{\ell-1}} x_{T-T^{\ell-1}}, \Phi_{T^{\ell-1}})\|_2$$
$$\overset{(4)}{\ge} \left(1 - \frac{\delta_{2K}}{1-\delta_K}\right)\|\Phi_{T-T^{\ell-1}} x_{T-T^{\ell-1}}\|_2 \ge \frac{1-2\delta_{2K}}{1-\delta_K}\sqrt{1-\delta_K}\,\|x_{T-T^{\ell-1}}\|_2 = \frac{1-2\delta_{2K}}{\sqrt{1-\delta_K}}\|x_{T-T^{\ell-1}}\|_2. \qquad (10)$$
Upon combining (9) and (10), one obtains the following upper bound:
$$\left\|y_r^{\ell}\right\|_2 \le \frac{\sqrt{1-\delta_K^2}}{1-2\delta_{2K}}\, c_K \left\|y_r^{\ell-1}\right\|_2 \overset{L1}{\le} \frac{1}{1-2\delta_{3K}}\, c_K \left\|y_r^{\ell-1}\right\|_2.$$
Finally, elementary calculations show that when $\delta_{3K} \le 0.165$,
$$\frac{c_K}{1-2\delta_{3K}} < 1,$$
which completes the proof of Theorem 2.
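The "elementary calculations" above are easy to reproduce numerically; the snippet below (a plain Python check, with names of our choosing) evaluates the contraction factor $c_K/(1-2\delta_{3K})$ at and slightly above the threshold:

```python
# Numeric check that c_K / (1 - 2*delta_3K) < 1 at delta_3K = 0.165.
def contraction_factor(d):
    c_K = 2 * d * (1 + d) / (1 - d) ** 3
    return c_K / (1 - 2 * d)

print(contraction_factor(0.165))   # approximately 0.986 < 1
print(contraction_factor(0.17))    # exceeds 1, so the bound no longer applies
```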
A. Why Does Correlation Maximization Work for the SP Algorithm?

Both in the initialization step and during each iteration of the SP algorithm, we select $K$ indices that maximize the correlations between the column vectors and the residual measurement. Henceforth, this step is referred to as correlation maximization (CM). Consider the ideal case where all columns of $\Phi$ are orthogonal.$^3$ In this scenario, the signal coefficients can be easily recovered by calculating the correlations $\langle v_i, y\rangle$; i.e., all indices with non-zero correlation magnitude are in the correct support of the sensed vector. Now assume that the sampling matrix $\Phi$ satisfies the RIP. Recall that the RIP (see Lemma 1) implies that the columns are locally near-orthogonal. Consequently, for any $j$ not in the correct support, the magnitude of the correlation $\langle v_j, y\rangle$ is expected to be small, and more precisely, upper bounded by $\delta_{K+1}\|x\|_2$. This seems to provide a very simple intuition why correlation maximization allows for exact reconstruction. However, this intuition is not easy to justify analytically, due to the following fact. Although it is clear that for all indices $j \notin T$, the values of $|\langle v_j, y\rangle|$ are upper bounded by $\delta_{K+1}\|x\|_2$, it may also happen that for all $i \in T$, the values of $|\langle v_i, y\rangle|$ are small as well. Dealing with maximum correlations in this scenario cannot be immediately proved to be a good reconstruction strategy. The following example illustrates this point.

$^3$ Of course, in this case no compression is possible.
Example 1: Without loss of generality, let $T = \{1,\dots,K\}$. Let the vectors $v_i$ ($i \in T$) be orthonormal, and let the remaining columns $v_j$, $j \notin T$, of $\Phi$ be constructed randomly, using i.i.d. Gaussian samples. Consider the following normalized zero-one sparse signal
$$y = \frac{1}{\sqrt{K}}\sum_{i\in T} v_i.$$
Then, for $K$ sufficiently large,
$$|\langle v_i, y\rangle| = \frac{1}{\sqrt{K}} \ll 1, \quad \text{for all } 1 \le i \le K.$$
It is straightforward to envision the existence of an index $j \notin T$ such that
$$|\langle v_j, y\rangle| \approx \delta_{K+1} > \frac{1}{\sqrt{K}}.$$
The latter inequality is critical, because achieving very small values for the RIP constant is a challenging task.
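A quick numerical illustration of Example 1 (an assumed construction in NumPy, with dimensions of our choosing) shows that every in-support correlation equals $1/\sqrt{K}$ while some off-support correlation can exceed it:

```python
# Numeric illustration of Example 1: orthonormal columns on T, Gaussian
# columns elsewhere, and a normalized zero-one signal supported on T.
import numpy as np

rng = np.random.default_rng(0)
m, N, K = 128, 256, 64
Phi = rng.normal(scale=1 / np.sqrt(m), size=(m, N))
Phi[:, :K] = np.linalg.qr(rng.normal(size=(m, K)))[0]   # orthonormal v_i, i in T
y = Phi[:, :K].sum(axis=1) / np.sqrt(K)                 # normalized zero-one signal
corr = np.abs(Phi.T @ y)
print(corr[:K].max())   # equals 1/sqrt(K) = 0.125 for every i in T
print(corr[K:].max())   # the largest off-support correlation, typically larger
```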
This example represents a particularly challenging case for the OMP algorithm. Therefore, one of the major constraints imposed on the OMP algorithm is the requirement that
$$\max_{i\in T}|\langle v_i, y\rangle| = \frac{1}{\sqrt{K}} > \max_{j\notin T}|\langle v_j, y\rangle| \approx \delta_{K+1}.$$
To meet this requirement, $\delta_{K+1}$ has to be less than $1/\sqrt{K}$, which decays fast as $K$ increases.

In contrast, the SP algorithm allows for the existence of some index $j \notin T$ with
$$\max_{i\in T}|\langle v_i, y\rangle| < |\langle v_j, y\rangle|.$$
As long as the RIP constant $\delta_{3K}$ is upper bounded by the constant given in (5), the indices in the correct support of $x$ that account for the most significant part of the energy of the signal are captured by the CM procedure. Detailed descriptions of how this can be achieved are provided in the proofs of the previously stated Theorems 3 and 5.
Let us first focus on the initialization step. By the definition of the set $T^0$ in the initialization stage of the algorithm, the set of the $K$ selected columns ensures that
$$\left\|\Phi_{T^0}^{*} y\right\|_2 \ge \left\|\Phi_{T}^{*} y\right\|_2 \overset{D2}{\ge} (1-\delta_K)\|x\|_2. \qquad (11)$$
Now, if we assume that the estimate $T^0$ is disjoint from the correct support, i.e., that $T^0 \cap T = \emptyset$, then by the near-orthogonality property of Lemma 1, one has
$$\left\|\Phi_{T^0}^{*} y\right\|_2 = \left\|\Phi_{T^0}^{*}\Phi_T x_T\right\|_2 \le \delta_{2K}\|x\|_2.$$
The last inequality clearly contradicts (11) whenever $\delta_K + \delta_{2K} < 1$. Consequently, if $\delta_{2K} < 1/2$, then
$$T^0 \cap T \ne \emptyset,$$
and at least one correct element of the support of $x$ is in $T^0$. This phenomenon is quantitatively described in Theorem 5.
Theorem 5: After the initialization step, one has
$$\left\|x_{T^0\cap T}\right\|_2 \ge \frac{1-\delta_K-2\delta_{2K}}{1+\delta_K}\|x\|_2,$$
and
$$\left\|x_{T-T^0}\right\|_2 \le \frac{\sqrt{8\delta_{2K}-8\delta_{2K}^2}}{1+\delta_{2K}}\|x\|_2.$$
The proof of the theorem is postponed to Appendix C.

To study the effect of correlation maximization during each iteration, one has to observe that correlation calculations are performed with respect to the vector
$$y_r^{\ell-1} = \mathrm{resid}(y, \Phi_{T^{\ell-1}})$$
instead of being performed with respect to the vector $y$. As a consequence, showing that the CM process captures a significant part of the residual signal energy requires an analysis including a number of technical details. These can be found in the Proof of Theorem 3.
B. Identifying Indices Outside of the Correct Support Set

Note that there are $2K$ indices in the set $\tilde{T}^{\ell}$, among which at least $K$ of them do not belong to the correct support set $T$. In order to expurgate those indices from $\tilde{T}^{\ell}$, or equivalently, in order to find a $K$-dimensional subspace of the space $\mathrm{span}(\Phi_{\tilde{T}^{\ell}})$ closest to $y$, we need to estimate these $K$ incorrect indices.

Define $\Delta T := \tilde{T}^{\ell} - T^{\ell}$. This set contains the $K$ indices which are deemed incorrect. If $\Delta T \cap T = \emptyset$, our estimate of incorrect indices is perfect. However, sometimes $\Delta T \cap T \ne \emptyset$. This means that among the estimated incorrect indices, there are some indices that actually belong to the correct support set $T$. The question of interest is how often these correct indices are erroneously removed from the support estimate, and how quickly the algorithm manages to restore them back.

We claim that the reduction in the $\|\cdot\|_2$ norm introduced by such erroneous expurgation is small. The intuitive explanation for this claim is as follows. Let us assume that all the indices in the support of $x$ have been successfully captured, or equivalently, that $T \subset \tilde{T}^{\ell}$. When we project $y$ onto the space $\mathrm{span}(\Phi_{\tilde{T}^{\ell}})$, it can be shown that its corresponding projection coefficient vector $x_p$ satisfies
$$x_p = x_{\tilde{T}^{\ell}},$$
and that it contains at least $K$ zeros. Consequently, the $K$ indices with smallest magnitude (equal to zero) are clearly not in the correct support set.
However, the situation changes when $T \not\subset \tilde{T}^{\ell}$, or equivalently, when $T - \tilde{T}^{\ell} \ne \emptyset$. After the projection, one has
$$x_p = x_{\tilde{T}^{\ell}} + \epsilon$$
for some nonzero $\epsilon \in \mathbb{R}^{|\tilde{T}^{\ell}|}$. View the projection coefficient vector $x_p$ as a smeared version of $x_{\tilde{T}^{\ell}}$ (see Fig. 4 for an illustration): the coefficients indexed by $i \notin T$ may become non-zero; the coefficients indexed by $i \in T$ may experience changes in their magnitudes.

Figure 4: The projection coefficient vector $x_p$ is a smeared version of the vector $x_{T\cap\tilde{T}^{\ell}}$.

Fortunately, the energy of this smear, i.e., $\|\epsilon\|_2$, is proportional to the norm of the residual signal $x_{T-\tilde{T}^{\ell}}$, which can be proved to be small according to the analysis accompanying Theorem 3. As long as the smear is not severe, $x_p \approx x_{\tilde{T}^{\ell}}$, and one should be able to obtain a good estimate of $T \cap \tilde{T}^{\ell}$ via the largest projection coefficients. This intuitive explanation is formalized in the previously stated Theorem 4.
C. Convergence of the SP Algorithm

In this subsection, we upper bound the number of iterations needed to reconstruct an arbitrary $K$-sparse signal using the SP algorithm.

Given an arbitrary $K$-sparse signal $x$, we first arrange its elements in decreasing order of magnitude. Without loss of generality, assume that
$$|x_1| \ge |x_2| \ge \dots \ge |x_K| > 0,$$
and that $x_j = 0$ for all $j > K$. Define
$$\rho_{\min} := \frac{|x_K|}{\|x\|_2} = \frac{\min_{1\le i\le K}|x_i|}{\sqrt{\sum_{i=1}^{K} x_i^2}}. \qquad (12)$$
Let $n_{it}$ denote the number of iterations of the SP algorithm needed for exact reconstruction of $x$. Then the following theorem upper bounds $n_{it}$ in terms of $c_K$ and $\rho_{\min}$. It can be viewed as a bound on the complexity/performance trade-off for the SP algorithm.

Theorem 6: The number of iterations of the SP algorithm is upper bounded by
$$n_{it} \le \min\left( \frac{\log\rho_{\min}}{\log c_K} + 1,\; \frac{1.5\,K}{\log(1/c_K)} \right).$$
This result is a combination of Theorems 7 and 8,$^4$ described below.

$^4$ The upper bound in Theorem 7 is also obtained in [12], while the one in Theorem 8 is not.
Theorem 7: One has
$$n_{it} \le \frac{\log\rho_{\min}}{\log c_K} + 1.$$

Theorem 8: It can be shown that
$$n_{it} \le \frac{1.5\,K}{\log(1/c_K)}.$$

The proof of Theorem 7 is intuitively clear and presented below, while the proof of Theorem 8 is more technical and postponed to Appendix F.

Proof of Theorem 7: The theorem is proved by contradiction. Consider $T^{\ell}$, the estimate of $T$, with
$$\ell = \left\lfloor \frac{\log\rho_{\min}}{\log c_K} + 1 \right\rfloor.$$
Suppose that $T \not\subset T^{\ell}$, or equivalently, $T - T^{\ell} \ne \emptyset$. Then
$$\|x_{T-T^{\ell}}\|_2 = \sqrt{\sum_{i\in T-T^{\ell}} x_i^2} \ge \min_{i\in T}|x_i| \overset{(12)}{=} \rho_{\min}\|x\|_2.$$
However, according to Theorem 2,
$$\|x_{T-T^{\ell}}\|_2 \le (c_K)^{\ell}\|x\|_2 < \rho_{\min}\|x\|_2,$$
where the last inequality follows from our choice of $\ell$ such that $(c_K)^{\ell} < \rho_{\min}$. This contradicts the assumption $T \not\subset T^{\ell}$ and therefore proves Theorem 7.

A drawback of Theorem 7 is that it sometimes overestimates the number of iterations, especially when $\rho_{\min} \ll 1$. The example to follow illustrates this point.
Example 2: Let $K = 2$, $x_1 = 2^{10}$, $x_2 = 1$, $x_3 = \dots = x_N = 0$. Suppose that the sampling matrix $\Phi$ satisfies the RIP with $c_K = \frac{1}{2}$. Noting that $\rho_{\min} \approx 2^{-10}$, Theorem 6 implies that
$$n_{it} \le 11.$$
Indeed, if we take a close look at the steps of the SP algorithm, we can verify that
$$n_{it} \le 1.$$
After the initialization step, by Theorem 5, it can be shown that
$$\|x_{T-T^0}\|_2 \le \frac{\sqrt{8\delta_{2K}-8\delta_{2K}^2}}{1+\delta_{2K}}\|x\|_2 < 0.95\,\|x\|_2.$$
As a result, the estimate $T^0$ must contain the index one, and $\|x_{T-T^0}\|_2 \le 1$. After the first iteration, since
$$\|x_{T-T^1}\|_2 \le c_K\|x_{T-T^0}\|_2 < 0.95 < \min_{i\in T}|x_i|,$$
we have $T \subset T^1$.

This example suggests that the upper bound of Theorem 7 can be tightened when the signal components decay fast. Based on the idea behind this example, another upper bound on $n_{it}$ is described in Theorem 8 and proved in Appendix F.

It is clear that the number of iterations required for exact reconstruction depends on the values of the entries of the sparse signal. We therefore focus our attention on the following three particular classes of sparse signals.
1) Zero-one sparse signals. As explained before, zero-one signals represent the most challenging reconstruction category for OMP algorithms. However, this class of signals has the best upper bound on the convergence rate of the SP algorithm. Elementary calculations reveal that $\rho_{\min} = 1/\sqrt{K}$ and that
$$n_{it} \le \frac{\log K}{2\log(1/c_K)}.$$

2) Sparse signals with power-law decaying entries (also known as compressible sparse signals). Signals in this category are defined via the following constraint:
$$|x_i| \le c_x\, i^{-p},$$
for some constants $c_x > 0$ and $p > 1$. Compressible sparse signals have been widely considered in the CS literature, since most practical and naturally occurring signals belong to this class [13]. It follows from Theorem 7 that in this case
$$n_{it} \le \frac{p\log K}{\log(1/c_K)}\,(1+o(1)),$$
where $o(1) \to 0$ when $K \to \infty$.

3) Sparse signals with exponentially decaying entries. Signals in this class satisfy
$$|x_i| \le c_x\, e^{-pi}, \qquad (13)$$
for some constants $c_x > 0$ and $p > 0$. Theorem 6 implies that
$$n_{it} \le \begin{cases} \dfrac{pK}{\log(1/c_K)}\,(1+o(1)) & \text{if } 0 < p \le 1.5, \\[2mm] \dfrac{1.5K}{\log(1/c_K)} & \text{if } p > 1.5, \end{cases}$$
where again $o(1) \to 0$ as $K \to \infty$. A short code sketch of these signal classes and of the bound of Theorem 7 is given after this list.
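The following sketch (illustrative names, and an assumed value of $c_K$ chosen only for demonstration) generates the three signal classes above and evaluates the $\rho_{\min}$-based bound of Theorem 7:

```python
# Sketch: the three signal classes and the Theorem 7 iteration bound.
import numpy as np

def sparse_signal(K, kind, p=2.0, c_x=1.0):
    i = np.arange(1, K + 1)
    if kind == "zero-one":
        return np.ones(K)
    if kind == "power-law":
        return c_x * i ** (-float(p))
    if kind == "exponential":
        return c_x * np.exp(-p * i)
    raise ValueError(kind)

def theorem7_bound(x_nonzero, c_K=0.5):
    rho_min = np.min(np.abs(x_nonzero)) / np.linalg.norm(x_nonzero)
    return np.log(rho_min) / np.log(c_K) + 1

for kind in ["zero-one", "power-law", "exponential"]:
    x = sparse_signal(20, kind, p=2.0 if kind == "power-law" else np.log(2) / 2)
    print(kind, theorem7_bound(x))
```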
Simulation results, shown in Fig. 5, indicate that the above analysis gives the right order of growth in complexity with respect to the parameter $K$. To generate the plots of Fig. 5, we set $m = 128$, $N = 256$, and ran simulations for different classes of sparse signals. For each type of sparse signal, we selected different values for the parameter $K$, and for each $K$, we selected 200 different randomly generated Gaussian sampling matrices and as many different support sets $T$. The plots depict the average number of iterations versus the signal sparsity level $K$, and they clearly show that $n_{it} = O(\log K)$ for zero-one signals and sparse signals with coefficients decaying according to a power law, while $n_{it} = O(K)$ for sparse signals with exponentially decaying coefficients.

Figure 5: Convergence of the subspace pursuit algorithm for different signals. (The plot shows the average number of iterations $n_{it}$ versus $K$ for $m = 128$, $N = 256$ and 200 realizations; curves: zero-one sparse signal, power-law decaying sparse signal with $p = 2$, and exponentially decaying sparse signal with $p = \log(2)/2$; reference growth rates $O(\log K)$ and $O(K)$ are indicated.)

With the bound on the number of iterations required for exact reconstruction at hand, the computational complexity of the SP algorithm can be easily estimated: it equals the complexity of one iteration multiplied by the number of iterations.
In each iteration, CM requires $mN$ computations in general. For some measurement matrices with special structures, for example sparse matrices, the computational cost can be reduced significantly. The cost of computing the projections is of the order of $O\!\left(K^2 m\right)$, if one uses the Modified Gram-Schmidt (MGS) algorithm [20, pg. 61]. This cost can be reduced further by reusing the computational results of past iterations within future iterations. This is possible because most practical sparse signals are compressible, and the signal support set estimates in different iterations usually intersect in a large number of indices. Though there are many ways to reduce the complexity of both the CM and projection computation steps, we only focus on the most general framework of the SP algorithm, and assume that the complexity of each iteration equals $O\!\left(mN + mK^2\right)$. As a result, the total complexity of the SP algorithm is given by $O\!\left(m(N + K^2)\log K\right)$ for compressible sparse signals, and it is upper bounded by $O\!\left(m(N + K^2)K\right)$ for arbitrary sparse signals. When the signal is very sparse, in particular when $K^2 \le O(N)$, the total complexity of SP reconstruction is upper bounded by $O(mNK)$ for arbitrary sparse signals and by $O(mN\log K)$ for compressible sparse signals (we once again point out that most practical sparse signals belong to this signal category [13]).

The complexity of the SP algorithm is comparable to that of OMP-type algorithms for very sparse signals, where $K^2 \le O(N)$. For the standard OMP algorithm, exact reconstruction always requires $K$ iterations. In each iteration, the CM operation costs $O(mN)$ computations, and the complexity of the projection is marginal compared with that of the CM. The corresponding total complexity is therefore always $O(mNK)$. For the ROMP and StOMP algorithms, the challenging signals in terms of convergence rate are also the sparse signals with exponentially decaying entries. When the $p$ in (13) is sufficiently large, it can be shown that both ROMP and StOMP also need $O(K)$ iterations for reconstruction. Note that the CM operation is required in both algorithms. The total computational complexity is then $O(mNK)$.

The case that requires special attention during analysis is $K^2 > O(N)$. Again, if compressible sparse signals are considered, the complexity of the projections can be significantly reduced if one reuses the results from previous iterations at the current iteration. If exponentially decaying sparse signals are considered, one may want to only recover the energetically most significant part of the signal and treat the residual of the signal as noise, i.e., reduce the effective signal sparsity to $K' \ll K$. In both cases, the complexity depends on the specific implementation of the CM and projection operations and is beyond the scope of the analysis of this paper.

One advantage of the SP algorithm is that the number of iterations required for recovery is significantly smaller than that of the standard OMP algorithm for compressible sparse signals. To the best of the authors' knowledge, there are no known results on the number of iterations of the ROMP and StOMP algorithms needed for recovery of compressible sparse signals.
V. RECOVERY OF APPROXIMATELY SPARSE SIGNALS FROM INACCURATE MEASUREMENTS

We first consider a sampling scenario in which the signal $x$ is $K$-sparse, but the measurement vector $y$ is subjected to an additive noise component, $e$. The following theorem gives a sufficient condition for convergence of the SP algorithm in terms of the RIP constant $\delta_{3K}$, as well as an upper bound on the recovery distortion that depends on the energy ($l_2$-norm) of the error vector $e$.

Theorem 9 (Stability under measurement perturbations): Let $x \in \mathbb{R}^N$ be such that $|\mathrm{supp}(x)| \le K$, and let its corresponding measurement be $y = \Phi x + e$, where $e$ denotes the noise vector. Suppose that the sampling matrix satisfies the RIP with parameter
$$\delta_{3K} < 0.083. \qquad (14)$$
Then the reconstruction distortion of the SP algorithm satisfies
$$\|x - \hat{x}\|_2 \le c_K'\,\|e\|_2,$$
where
$$c_K' = \frac{1 + \delta_{3K} + \delta_{3K}^2}{\delta_{3K}(1-\delta_{3K})}.$$
The proof of the theorem is given in Section V-A.

We also study the case where the signal $x$ is only approximately $K$-sparse, and the measurement $y$ is contaminated by a noise vector $e$. To simplify the notation, we henceforth use $x_K$ to denote the vector obtained from $x$ by maintaining the $K$ entries with largest magnitude and setting all other entries in the vector to zero. In this setting, a signal $x$ is said to be approximately $K$-sparse if $x - x_K \ne 0$. Based on Theorem 9, we can upper bound the recovery distortion in terms of the $l_1$ and $l_2$ norms of $x - x_K$ and $e$, respectively, as follows.

Corollary 1 (Stability under signal and measurement perturbations): Let $x \in \mathbb{R}^N$ be approximately $K$-sparse, and let $y = \Phi x + e$. Suppose that the sampling matrix satisfies the RIP with parameter
$$\delta_{6K} < 0.083.$$
Then
$$\|x - \hat{x}\|_2 \le c_{2K}'\left( \|e\|_2 + \sqrt{\frac{1+\delta_{6K}}{K}}\,\|x - x_K\|_1 \right).$$
The proof of this corollary is given in Section V-B. As opposed to the standard case where the input sparsity level of the SP algorithm equals the signal sparsity level $K$, one needs to set the input sparsity level of the SP algorithm to $2K$ in order to obtain the claim stated in the above corollary.

Theorem 9 and Corollary 1 provide analytical upper bounds on the reconstruction distortion of the noisy version of the SP algorithm. In addition to these theoretical bounds, we performed numerical simulations to empirically estimate the reconstruction distortion. In the simulations, we first select the dimension $N$ of the signal $x$, and the number of measurements $m$. We then choose a sparsity level $K$ such that $K \le m/2$. Once the parameters are chosen, an $m \times N$ sampling matrix with standard i.i.d. Gaussian entries is generated. For a given $K$, the support set $T$ of size $|T| = K$ is selected uniformly at random. A zero-one sparse signal is constructed as in the previous section. Finally, either signal or measurement perturbations are added as follows:

1) Signal perturbations: the signal entries in $T$ are kept unchanged, but the signal entries outside of $T$ are perturbed by i.i.d. Gaussian $\mathcal{N}\!\left(0, \sigma_s^2\right)$ samples.

2) Measurement perturbations: the perturbation vector $e$ is generated using a Gaussian distribution with zero mean and covariance matrix $\sigma_e^2 I_m$, where $I_m$ denotes the $m \times m$ identity matrix.

We ran the SP reconstruction process on $y$, 500 times for each $K$, $\sigma_s^2$ and $\sigma_e^2$. The reconstruction distortion $\|x - \hat{x}\|_2$ is obtained via averaging over all these instances, and the results are plotted in Fig. 6. Consistent with the findings of Theorem 9 and Corollary 1, we observe that the recovery distortion increases linearly with the $l_2$-norm of the measurement error. Even more encouraging is the fact that the empirical reconstruction distortion is typically much smaller than the corresponding upper bounds. This is likely due to the fact that, in order to simplify the expressions involved, many constants and parameters used in the proof were upper bounded. A small code sketch of the measurement-perturbation experiment is given below.
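The sketch below reuses the hypothetical subspace_pursuit() and gaussian_sampling_matrix() helpers introduced earlier, with 50 trials per noise level instead of 500 for brevity; it is an assumed setup, not the authors' simulation code.

```python
# Sketch of the measurement-perturbation experiment with a zero-one signal.
import numpy as np

def noisy_recovery_distortion(K=10, m=128, N=256, sigma_e=0.01, trials=50, seed=0):
    rng = np.random.default_rng(seed)
    distortions = []
    for _ in range(trials):
        Phi = gaussian_sampling_matrix(m, N, rng)
        T = rng.choice(N, size=K, replace=False)
        x = np.zeros(N)
        x[T] = 1.0                                  # zero-one sparse signal
        e = rng.normal(scale=sigma_e, size=m)       # measurement perturbation
        x_hat = subspace_pursuit(K, Phi, Phi @ x + e)
        distortions.append(np.linalg.norm(x - x_hat))
    return np.mean(distortions)
```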
A. Recovery Distortion under Measurement Perturbations

The first step towards proving Theorem 9 is to upper bound the reconstruction error for a given estimated support set $\hat{T}$, as succinctly described in the lemma to follow.

Lemma 3: Let $x \in \mathbb{R}^N$ be a $K$-sparse vector, $\|x\|_0 \le K$, and let $y = \Phi x + e$ be a measurement for which $\Phi \in \mathbb{R}^{m\times N}$ satisfies the RIP with parameter $\delta_K$. For an arbitrary $\hat{T} \subset \{1,\dots,N\}$ such that $|\hat{T}| \le K$, define $\hat{x}$ as
$$\hat{x}_{\hat{T}} = \Phi_{\hat{T}}^{\dagger} y, \quad \text{and} \quad \hat{x}_{\{1,\dots,N\}-\hat{T}} = 0.$$
Then
$$\|x - \hat{x}\|_2 \le \frac{1}{1-\delta_{3K}}\left\|x_{T-\hat{T}}\right\|_2 + \frac{1+\delta_{3K}}{1-\delta_{3K}}\|e\|_2.$$
Figure 6: Reconstruction distortion under signal or measurement perturbations: both the perturbation level and the reconstruction distortion are described via the $l_2$ norm. (The log-log plot shows recovery distortion versus perturbation level for 500 realizations with $m = 128$, $N = 256$; curves: approximately sparse signals with $K = 20$ and $K = 5$, and noisy measurements with $K = 20$, $K = 10$, and $K = 5$.)
The proof of the lemma is given in Appendix G.

Next, we need to upper bound the norm $\|x_{T-T^{\ell}}\|_2$ in the $\ell$-th iteration of the SP algorithm. To achieve this task, we describe in the theorem to follow how $\|x_{T-T^{\ell}}\|_2$ depends on the RIP constant and the noise energy $\|e\|_2$.
Theorem 10: It holds that
$$\left\|x_{T-\tilde{T}^{\ell}}\right\|_2 \le \frac{2\delta_{3K}}{(1-\delta_{3K})^2}\|x_{T-T^{\ell-1}}\|_2 + \frac{2(1+\delta_{3K})}{1-\delta_{3K}}\|e\|_2, \qquad (15)$$
$$\|x_{T-T^{\ell}}\|_2 \le \frac{1+\delta_{3K}}{1-\delta_{3K}}\left\|x_{T-\tilde{T}^{\ell}}\right\|_2 + \frac{2}{1-\delta_{3K}}\|e\|_2, \qquad (16)$$
and therefore,
$$\|x_{T-T^{\ell}}\|_2 \le \frac{2\delta_{3K}(1+\delta_{3K})}{(1-\delta_{3K})^3}\|x_{T-T^{\ell-1}}\|_2 + \frac{4(1+\delta_{3K})}{(1-\delta_{3K})^2}\|e\|_2. \qquad (17)$$
Furthermore, suppose that
$$\|e\|_2 \le \delta_{3K}\|x_{T-T^{\ell-1}}\|_2. \qquad (18)$$
Then one has $\left\|y_r^{\ell}\right\|_2 < \left\|y_r^{\ell-1}\right\|_2$ whenever $\delta_{3K} < 0.083$.
Proof: The upper bounds in Inequalities (15) and (16) are proved in Appendices H and I, respectively. The inequality (17) is obtained by substituting (15) into (16) as shown below:
$$\|x_{T-T^{\ell}}\|_2 \le \frac{2\delta_{3K}(1+\delta_{3K})}{(1-\delta_{3K})^3}\|x_{T-T^{\ell-1}}\|_2 + \frac{2(1+\delta_{3K})^2 + 2(1-\delta_{3K})}{(1-\delta_{3K})^2}\|e\|_2$$
$$\le \frac{2\delta_{3K}(1+\delta_{3K})}{(1-\delta_{3K})^3}\|x_{T-T^{\ell-1}}\|_2 + \frac{4(1+\delta_{3K})}{(1-\delta_{3K})^2}\|e\|_2.$$
To complete the proof, we make use of Lemma 2 stated in Section II. According to this lemma, we have
$$\left\|y_r^{\ell}\right\|_2 = \|\mathrm{resid}(y, \Phi_{T^{\ell}})\|_2 \le \|\mathrm{resid}(\Phi_{T-T^{\ell}}x_{T-T^{\ell}}, \Phi_{T^{\ell}})\|_2 + \|\mathrm{resid}(e, \Phi_{T^{\ell}})\|_2$$
$$\overset{L2}{\le} \|\Phi_{T-T^{\ell}}x_{T-T^{\ell}}\|_2 + \|e\|_2 \le \sqrt{1+\delta_{3K}}\,\|x_{T-T^{\ell}}\|_2 + \|e\|_2, \qquad (19)$$
and
$$\left\|y_r^{\ell-1}\right\|_2 = \|\mathrm{resid}(y, \Phi_{T^{\ell-1}})\|_2 \ge \|\mathrm{resid}(\Phi_{T-T^{\ell-1}}x_{T-T^{\ell-1}}, \Phi_{T^{\ell-1}})\|_2 - \|\mathrm{resid}(e, \Phi_{T^{\ell-1}})\|_2$$
$$\overset{L2}{\ge} \frac{1-2\delta_{3K}}{1-\delta_{3K}}\|\Phi_{T-T^{\ell-1}}x_{T-T^{\ell-1}}\|_2 - \|e\|_2 \ge \frac{1-2\delta_{3K}}{1-\delta_{3K}}\sqrt{1-\delta_{3K}}\,\|x_{T-T^{\ell-1}}\|_2 - \|e\|_2 = \frac{1-2\delta_{3K}}{\sqrt{1-\delta_{3K}}}\|x_{T-T^{\ell-1}}\|_2 - \|e\|_2. \qquad (20)$$
Apply the inequalities (17) and (18) to (19) and (20). Numerical analysis shows that as long as $\delta_{3K} < 0.085$, the right hand side of (19) is less than that of (20), and therefore $\left\|y_r^{\ell}\right\|_2 < \left\|y_r^{\ell-1}\right\|_2$. This completes the proof of the theorem.
Based on Theorem 10, we conclude that when the SP algorithm terminates, the inequality (18) is violated and we must have
$$\|e\|_2 > \delta_{3K}\|x_{T-T^{\ell-1}}\|_2.$$
Under this assumption, it follows from Lemma 3 that
$$\|x - \hat{x}\|_2 \le \left( \frac{1}{1-\delta_{3K}}\cdot\frac{1}{\delta_{3K}} + \frac{1+\delta_{3K}}{1-\delta_{3K}} \right)\|e\|_2 = \frac{1+\delta_{3K}+\delta_{3K}^2}{\delta_{3K}(1-\delta_{3K})}\|e\|_2,$$
which completes the proof of Theorem 9.
B. Recovery Distortion under Signal and Measurement Perturbations

The proof of Corollary 1 is based on the following two lemmas, which are proved in [21] and [22], respectively.

Lemma 4: Suppose that the sampling matrix $\Phi \in \mathbb{R}^{m\times N}$ satisfies the RIP with parameter $\delta_K$. Then, for every $x \in \mathbb{R}^N$, one has
$$\|\Phi x\|_2 \le \sqrt{1+\delta_K}\left( \|x\|_2 + \frac{1}{\sqrt{K}}\|x\|_1 \right).$$

Lemma 5: Let $x \in \mathbb{R}^d$, and let $x_K$ denote the vector obtained from $x$ by keeping its $K$ entries of largest magnitude, and by setting all its other components to zero. Then
$$\|x - x_K\|_2 \le \frac{\|x\|_1}{2\sqrt{K}}.$$
To prove the corollary, consider the measurement vector
$$y = \Phi x + e = \Phi x_{2K} + \Phi(x - x_{2K}) + e.$$
By Theorem 9, one has
$$\|\hat{x} - x_{2K}\|_2 \le c_{2K}'\left( \|\Phi(x - x_{2K})\|_2 + \|e\|_2 \right),$$
and invoking Lemma 4 shows that
$$\|\Phi(x - x_{2K})\|_2 \le \sqrt{1+\delta_{6K}}\left( \|x - x_{2K}\|_2 + \frac{\|x - x_{2K}\|_1}{\sqrt{6K}} \right).$$
Furthermore, Lemma 5 implies that
$$\|x - x_{2K}\|_2 = \|(x - x_K) - (x - x_K)_K\|_2 \le \frac{1}{2\sqrt{K}}\|x - x_K\|_1.$$
Therefore,
$$\|\Phi(x - x_{2K})\|_2 \le \sqrt{1+\delta_{6K}}\left( \frac{\|x - x_K\|_1}{2\sqrt{K}} + \frac{\|x - x_{2K}\|_1}{\sqrt{6K}} \right) \le \sqrt{1+\delta_{6K}}\,\frac{\|x - x_K\|_1}{\sqrt{K}},$$
and
$$\|\hat{x} - x_{2K}\|_2 \le c_{2K}'\left( \|e\|_2 + \sqrt{1+\delta_{6K}}\,\frac{\|x - x_K\|_1}{\sqrt{K}} \right),$$
which completes the proof.
VI. CONCLUSION
We introduced a new algorithm, termed subspace pursuit, for low-complexity recovery of sparse signals sampled by matrices satisfying the RIP with a constant parameter $\delta_{3K}$. Also presented were simulation results demonstrating that the recovery performance of the algorithm matches, and sometimes even exceeds, that of the LP technique, and simulations showing that the number of iterations executed by the algorithm for zero-one sparse signals and compressible signals is of the order $O(\log K)$.
VII. ACKNOWLEDGMENT
The authors are grateful to Prof. Helmut Bölcskei for handling the manuscript, and to the reviewers for their thorough and insightful comments and suggestions.
APPENDIX
We provide next detailed proofs for the lemmas and theorems stated in the paper.

A. Proof of Lemma 1

1) The first part of the lemma follows directly from the definition of $\delta_K$. Every vector $q \in \mathbb{R}^K$ can be extended to a vector $q' \in \mathbb{R}^{K'}$ by attaching $K' - K$ zeros to it. From the fact that for all $J \subset \{1,\dots,N\}$ such that $|J| \le K'$, and all $q' \in \mathbb{R}^{K'}$, one has
$$(1-\delta_{K'})\|q'\|_2^2 \le \|\Phi_J q'\|_2^2 \le (1+\delta_{K'})\|q'\|_2^2,$$
it follows that
$$(1-\delta_{K'})\|q\|_2^2 \le \|\Phi_I q\|_2^2 \le (1+\delta_{K'})\|q\|_2^2$$
for all $|I| \le K$ and $q \in \mathbb{R}^K$. Since $\delta_K$ is defined as the infimum of all parameters $\delta$ that satisfy the above inequalities, $\delta_K \le \delta_{K'}$.

2) The inequality
$$|\langle \Phi_I a, \Phi_J b\rangle| \le \delta_{|I|+|J|}\|a\|_2\|b\|_2$$
obviously holds if either one of the norms $\|a\|_2$ and $\|b\|_2$ is zero. Assume therefore that neither one of them is zero, and define
$$a' = \frac{a}{\|a\|_2}, \quad b' = \frac{b}{\|b\|_2}, \quad x' = \Phi_I a', \quad y' = \Phi_J b'.$$
Note that the RIP implies that
$$2\left(1-\delta_{|I|+|J|}\right) \le \|x'+y'\|_2^2 = \left\| [\Phi_I\ \Phi_J]\begin{pmatrix} a' \\ b' \end{pmatrix} \right\|_2^2 \le 2\left(1+\delta_{|I|+|J|}\right), \qquad (21)$$
and similarly,
$$2\left(1-\delta_{|I|+|J|}\right) \le \|x'-y'\|_2^2 = \left\| [\Phi_I\ \Phi_J]\begin{pmatrix} a' \\ -b' \end{pmatrix} \right\|_2^2 \le 2\left(1+\delta_{|I|+|J|}\right).$$
We thus have
$$\langle x', y'\rangle = \frac{\|x'+y'\|_2^2 - \|x'-y'\|_2^2}{4} \le \delta_{|I|+|J|}, \qquad -\langle x', y'\rangle = \frac{\|x'-y'\|_2^2 - \|x'+y'\|_2^2}{4} \le \delta_{|I|+|J|},$$
and therefore
$$\frac{|\langle \Phi_I a, \Phi_J b\rangle|}{\|a\|_2\|b\|_2} = |\langle x', y'\rangle| \le \delta_{|I|+|J|}.$$
Now,
$$\|\Phi_I^{*}\Phi_J b\|_2 = \max_{q:\,\|q\|_2=1}|\langle \Phi_I q, \Phi_J b\rangle| \le \max_{q:\,\|q\|_2=1}\delta_{|I|+|J|}\|q\|_2\|b\|_2 = \delta_{|I|+|J|}\|b\|_2,$$
which completes the proof.
B. Proof of Lemma 2
1) The first claim is proved by observing that
$$\Phi_I^{*} y_r = \Phi_I^{*}\left( y - \Phi_I(\Phi_I^{*}\Phi_I)^{-1}\Phi_I^{*} y \right) = \Phi_I^{*} y - \Phi_I^{*} y = 0.$$

2) To prove the second part of the lemma, let
$$y_p = \Phi_J x_p, \quad \text{and} \quad y = \Phi_I x.$$
By Lemma 1, we have
$$|\langle y_p, y\rangle| = |\langle \Phi_J x_p, \Phi_I x\rangle| \overset{L1}{\le} \delta_{|I|+|J|}\|x_p\|_2\|x\|_2 \le \delta_{|I|+|J|}\frac{\|y_p\|_2}{\sqrt{1-\delta_{|J|}}}\frac{\|y\|_2}{\sqrt{1-\delta_{|I|}}} \le \frac{\delta_{|I|+|J|}}{1-\delta_{\max(|I|,|J|)}}\|y_p\|_2\|y\|_2.$$
On the other hand, the left hand side of the above inequality reads as
$$\langle y_p, y\rangle = \langle y_p, y_p + y_r\rangle = \|y_p\|_2^2.$$
Thus, we have
$$\|y_p\|_2 \le \frac{\delta_{|I|+|J|}}{1-\delta_{\max(|I|,|J|)}}\|y\|_2.$$
By the triangle inequality,
$$\|y_r\|_2 = \|y - y_p\|_2 \ge \|y\|_2 - \|y_p\|_2,$$
and therefore,
$$\|y_r\|_2 \ge \left(1 - \frac{\delta_{|I|+|J|}}{1-\delta_{\max(|I|,|J|)}}\right)\|y\|_2.$$
Finally, observing that
$$\|y_r\|_2^2 + \|y_p\|_2^2 = \|y\|_2^2 \quad \text{and} \quad \|y_p\|_2^2 \ge 0,$$
we conclude that
$$\left(1 - \frac{\delta_{|I|+|J|}}{1-\delta_{\max(|I|,|J|)}}\right)\|y\|_2 \le \|y_r\|_2 \le \|y\|_2.$$
C. Proof of Theorem 5
The first step consists in proving Inequality (11), which reads as
$$\|\Phi_{T^0}^T\,y\|_2 \ge (1-\delta_K)\,\|x\|_2.$$
By assumption, $|T| \le K$, so that
$$\|\Phi_T^T\,y\|_2 = \|\Phi_T^T\,\Phi_T\,x\|_2 \ge (1-\delta_K)\,\|x\|_2,$$
since the eigenvalues of $\Phi_T^T\Phi_T$ lie in the interval $[1-\delta_K,\ 1+\delta_K]$. According to the definition of $T^0$,
$$\|\Phi_{T^0}^T\,y\|_2 = \max_{|I|\le K}\sqrt{\sum_{i\in I}|\langle v_i, y\rangle|^2} \ge \|\Phi_T^T\,y\|_2 \ge (1-\delta_K)\,\|x\|_2.$$
The second step is to partition the estimate of the support set $T^0$ into two subsets: the set $T^0\cap T$, containing the indices in the correct support set, and $T^0 - T$, the set of incorrectly selected indices. Then
$$\|\Phi_{T^0}^T\,y\|_2 \le \|\Phi_{T^0\cap T}^T\,y\|_2 + \|\Phi_{T^0-T}^T\,y\|_2 \le \|\Phi_{T^0\cap T}^T\,y\|_2 + \delta_{2K}\,\|x\|_2, \qquad (22)$$
where the last inequality follows from the near-orthogonality property of Lemma 1. Furthermore,
$$\|\Phi_{T^0\cap T}^T\,y\|_2 \le \|\Phi_{T^0\cap T}^T\,\Phi_{T^0\cap T}\,x_{T^0\cap T}\|_2 + \|\Phi_{T^0\cap T}^T\,\Phi_{T-T^0}\,x_{T-T^0}\|_2 \le (1+\delta_K)\,\|x_{T^0\cap T}\|_2 + \delta_{2K}\,\|x\|_2. \qquad (23)$$
Combining the two inequalities (22) and (23), one can show that
$$\|\Phi_{T^0}^T\,y\|_2 \le (1+\delta_K)\,\|x_{T^0\cap T}\|_2 + 2\,\delta_{2K}\,\|x\|_2.$$
By invoking Inequality (11) it follows that
$$(1-\delta_K)\,\|x\|_2 \le (1+\delta_K)\,\|x_{T^0\cap T}\|_2 + 2\,\delta_{2K}\,\|x\|_2.$$
Hence,
$$\|x_{T^0\cap T}\|_2 \ge \frac{1-\delta_K-2\delta_{2K}}{1+\delta_K}\,\|x\|_2,$$
which, using $\delta_K \le \delta_{2K}$ (Lemma 1), can be further relaxed to
$$\|x_{T^0\cap T}\|_2 \ge \frac{1-3\delta_{2K}}{1+\delta_{2K}}\,\|x\|_2.$$
To complete the proof, we observe that
$$\|x_{T-T^0}\|_2 = \sqrt{\|x\|_2^2 - \|x_{T^0\cap T}\|_2^2} \le \frac{\sqrt{(1+\delta_{2K})^2 - (1-3\delta_{2K})^2}}{1+\delta_{2K}}\,\|x\|_2 = \frac{\sqrt{8\delta_{2K}-8\delta_{2K}^2}}{1+\delta_{2K}}\,\|x\|_2.$$
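For concreteness, the initialization step analyzed in this proof, choosing $T^0$ as the $K$ indices whose columns are most strongly correlated with $y$, can be sketched as follows. The signal model, dimensions, and random seed are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
m, N, K = 128, 400, 10

Phi = rng.standard_normal((m, N)) / np.sqrt(m)

# K-sparse signal supported on a random set T.
T = np.sort(rng.choice(N, size=K, replace=False))
x = np.zeros(N)
x[T] = rng.standard_normal(K)
y = Phi @ x

# T^0: the K columns most correlated with y; this maximizes ||Phi_I^T y||_2
# over all |I| = K, so ||Phi_{T0}^T y||_2 >= ||Phi_T^T y||_2 by construction.
corr = Phi.T @ y
T0 = np.sort(np.argsort(np.abs(corr))[-K:])

print("||Phi_T0^T y|| =", np.linalg.norm(corr[T0]))
print("||Phi_T ^T y|| =", np.linalg.norm(corr[T]))
print("correct indices captured:", np.intersect1d(T0, T).size, "of", K)
```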
D. Proof of Theorem 3
In this section we show that the CM process allows for capturing a significant part of the residual signal power, that is,
$$\|x_{T-\tilde{T}^{\ell}}\|_2 \le c_1\,\|x_{T-T^{\ell-1}}\|_2$$
for some constant $c_1$. Note that in each iteration, the CM operation is performed on the vector $y_r^{\ell-1}$. The proof heavily relies on the inherent structure of $y_r^{\ell-1}$. Specifically, in the following two-step roadmap of the proof, we first show how the measurement residue $y_r^{\ell-1}$ is related to the signal residue $x_{T-T^{\ell-1}}$, and then employ this relationship to find the energy captured by the CM process.
1) One can write $y_r^{\ell-1}$ as
$$y_r^{\ell-1} = \Phi_{T\cup T^{\ell-1}}\,x_r^{\ell-1} = [\Phi_{T-T^{\ell-1}}\ \ \Phi_{T^{\ell-1}}]\begin{pmatrix} x_{T-T^{\ell-1}} \\ x_{p,T^{\ell-1}} \end{pmatrix} \qquad (24)$$
for some $x_r^{\ell-1}\in\mathbb{R}^{|T\cup T^{\ell-1}|}$ and $x_{p,T^{\ell-1}}\in\mathbb{R}^{|T^{\ell-1}|}$. Furthermore,
$$\|x_{p,T^{\ell-1}}\|_2 \le \frac{\delta_{2K}}{1-\delta_{2K}}\,\|x_{T-T^{\ell-1}}\|_2. \qquad (25)$$
2) It holds that
$$\|x_{T-\tilde{T}^{\ell}}\|_2 \le \frac{2\,\delta_{3K}}{(1-\delta_{3K})^2}\,\|x_{T-T^{\ell-1}}\|_2.$$
Proof: The claims can be established as below.
1) It is clear that
$$y_r^{\ell-1} = \mathrm{resid}\left(y, \Phi_{T^{\ell-1}}\right) \overset{(a)}{=} \mathrm{resid}\left(\Phi_{T-T^{\ell-1}}\,x_{T-T^{\ell-1}},\ \Phi_{T^{\ell-1}}\right) + \mathrm{resid}\left(\Phi_{T\cap T^{\ell-1}}\,x_{T\cap T^{\ell-1}},\ \Phi_{T^{\ell-1}}\right)$$
$$\overset{(b)}{=} \mathrm{resid}\left(\Phi_{T-T^{\ell-1}}\,x_{T-T^{\ell-1}},\ \Phi_{T^{\ell-1}}\right) + 0 = \Phi_{T-T^{\ell-1}}\,x_{T-T^{\ell-1}} - \mathrm{proj}\left(\Phi_{T-T^{\ell-1}}\,x_{T-T^{\ell-1}},\ \Phi_{T^{\ell-1}}\right)$$
$$\overset{(c)}{=} \Phi_{T-T^{\ell-1}}\,x_{T-T^{\ell-1}} + \Phi_{T^{\ell-1}}\,x_{p,T^{\ell-1}} = [\Phi_{T-T^{\ell-1}}\ \ \Phi_{T^{\ell-1}}]\begin{pmatrix} x_{T-T^{\ell-1}} \\ x_{p,T^{\ell-1}} \end{pmatrix},$$
where (a) holds because $y = \Phi_{T-T^{\ell-1}}\,x_{T-T^{\ell-1}} + \Phi_{T\cap T^{\ell-1}}\,x_{T\cap T^{\ell-1}}$ and $\mathrm{resid}(\cdot, \Phi_{T^{\ell-1}})$ is a linear function, (b) follows from the fact that $\Phi_{T\cap T^{\ell-1}}\,x_{T\cap T^{\ell-1}} \in \mathrm{span}(\Phi_{T^{\ell-1}})$, and (c) holds by defining
$$x_{p,T^{\ell-1}} = -\left(\Phi_{T^{\ell-1}}^T\Phi_{T^{\ell-1}}\right)^{-1}\Phi_{T^{\ell-1}}^T\left(\Phi_{T-T^{\ell-1}}\,x_{T-T^{\ell-1}}\right).$$
As a consequence of the RIP,
$$\|x_{p,T^{\ell-1}}\|_2 = \left\|\left(\Phi_{T^{\ell-1}}^T\Phi_{T^{\ell-1}}\right)^{-1}\Phi_{T^{\ell-1}}^T\left(\Phi_{T-T^{\ell-1}}\,x_{T-T^{\ell-1}}\right)\right\|_2 \le \frac{1}{1-\delta_K}\,\left\|\Phi_{T^{\ell-1}}^T\left(\Phi_{T-T^{\ell-1}}\,x_{T-T^{\ell-1}}\right)\right\|_2 \le \frac{\delta_{2K}}{1-\delta_K}\,\|x_{T-T^{\ell-1}}\|_2 \le \frac{\delta_{2K}}{1-\delta_{2K}}\,\|x_{T-T^{\ell-1}}\|_2.$$
This proves the stated claim.
2) For notational convenience, we first define
$$T_{\Delta} := \tilde{T}^{\ell} - T^{\ell-1},$$
which is the set of indices captured by the CM process. By the definition of $T_{\Delta}$, we have
$$\|\Phi_{T_{\Delta}}^T\,y_r^{\ell-1}\|_2 \ge \|\Phi_T^T\,y_r^{\ell-1}\|_2 \ge \|\Phi_{T-T^{\ell-1}}^T\,y_r^{\ell-1}\|_2. \qquad (26)$$
Removing the common columns between $\Phi_{T_{\Delta}}$ and $\Phi_{T-T^{\ell-1}}$ and noting that $T_{\Delta}\cap T^{\ell-1}=\emptyset$, we arrive at
$$\|\Phi_{T_{\Delta}-T}^T\,y_r^{\ell-1}\|_2 \ge \|\Phi_{(T-T^{\ell-1})-T_{\Delta}}^T\,y_r^{\ell-1}\|_2 = \|\Phi_{T-\tilde{T}^{\ell}}^T\,y_r^{\ell-1}\|_2. \qquad (27)$$
An upper bound on the left hand side of (27) is given by
$$\|\Phi_{T_{\Delta}-T}^T\,y_r^{\ell-1}\|_2 = \|\Phi_{T_{\Delta}-T}^T\,\Phi_{T\cup T^{\ell-1}}\,x_r^{\ell-1}\|_2 \le \delta_{|T\cup T^{\ell-1}\cup T_{\Delta}|}\,\|x_r^{\ell-1}\|_2 \le \delta_{3K}\,\|x_r^{\ell-1}\|_2, \qquad (28)$$
where the first inequality follows from the near-orthogonality property of Lemma 1. A lower bound on the right hand side of (27) can be derived as
$$\|\Phi_{T-\tilde{T}^{\ell}}^T\,y_r^{\ell-1}\|_2 \ge \left\|\Phi_{T-\tilde{T}^{\ell}}^T\,\Phi_{T-\tilde{T}^{\ell}}\left(x_r^{\ell-1}\right)_{T-\tilde{T}^{\ell}}\right\|_2 - \left\|\Phi_{T-\tilde{T}^{\ell}}^T\,\Phi_{(T\cup T^{\ell-1})-(T-\tilde{T}^{\ell})}\left(x_r^{\ell-1}\right)_{(T\cup T^{\ell-1})-(T-\tilde{T}^{\ell})}\right\|_2 \ge (1-\delta_K)\,\left\|\left(x_r^{\ell-1}\right)_{T-\tilde{T}^{\ell}}\right\|_2 - \delta_{3K}\,\|x_r^{\ell-1}\|_2. \qquad (29)$$
Substitute (29) and (28) into (27). We get
$$\left\|\left(x_r^{\ell-1}\right)_{T-\tilde{T}^{\ell}}\right\|_2 \le \frac{2\,\delta_{3K}}{1-\delta_K}\,\|x_r^{\ell-1}\|_2 \le \frac{2\,\delta_{3K}}{1-\delta_{3K}}\,\|x_r^{\ell-1}\|_2. \qquad (30)$$
Note the explicit form of $x_r^{\ell-1}$ in (24). One has
$$\left(x_r^{\ell-1}\right)_{T-T^{\ell-1}} = x_{T-T^{\ell-1}} \quad \text{and} \quad \left(x_r^{\ell-1}\right)_{T-\tilde{T}^{\ell}} = x_{T-\tilde{T}^{\ell}}, \qquad (31)$$
and
$$\|x_r^{\ell-1}\|_2 \le \|x_{T-T^{\ell-1}}\|_2 + \|x_{p,T^{\ell-1}}\|_2 \overset{(25)}{\le} \left(1+\frac{\delta_{2K}}{1-\delta_{2K}}\right)\|x_{T-T^{\ell-1}}\|_2 \le \frac{1}{1-\delta_{3K}}\,\|x_{T-T^{\ell-1}}\|_2. \qquad (32)$$
From (31) and (32), it is clear that
$$\|x_{T-\tilde{T}^{\ell}}\|_2 \le \frac{2\,\delta_{3K}}{(1-\delta_{3K})^2}\,\|x_{T-T^{\ell-1}}\|_2,$$
which completes the proof.
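The CM step whose captured energy is bounded above admits a short illustrative sketch. Here the previous estimate $T^{\ell-1}$ is simulated by deliberately corrupting half of the true support; this construction, together with the dimensions and the Gaussian matrix, is an assumption made only for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
m, N, K = 128, 400, 10

Phi = rng.standard_normal((m, N)) / np.sqrt(m)
T = np.sort(rng.choice(N, size=K, replace=False))
x = np.zeros(N)
x[T] = rng.standard_normal(K)
y = Phi @ x

# A previous estimate T^{l-1} that misses half of the true support.
wrong = rng.choice(np.setdiff1d(np.arange(N), T), size=K - K // 2, replace=False)
T_prev = np.sort(np.concatenate((T[: K // 2], wrong)))

# Residue of y with respect to span(Phi_{T^{l-1}}).
coef, *_ = np.linalg.lstsq(Phi[:, T_prev], y, rcond=None)
y_r = y - Phi[:, T_prev] @ coef

# CM step: add the K indices whose columns correlate most with y_r.
corr = np.abs(Phi.T @ y_r)
T_delta = np.argsort(corr)[-K:]
T_tilde = np.union1d(T_prev, T_delta)       # the expanded set, |T_tilde| <= 2K

print("true indices missed before CM:", np.setdiff1d(T, T_prev).size,
      " after CM:", np.setdiff1d(T, T_tilde).size)
```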
E. Proof of Theorem 4
As outlined in Section IV-B, let
$$x_p = \Phi_{\tilde{T}^{\ell}}^{\dagger}\,y$$
be the projection coefficient vector, and let
$$\epsilon = x_p - x_{\tilde{T}^{\ell}}$$
be the smear vector. We shall show that the smear magnitude $\|\epsilon\|_2$ is small, and then from this fact deduce that $\|x_{T-T^{\ell}}\|_2 \le c\,\|x_{T-\tilde{T}^{\ell}}\|_2$ for some positive constant $c$. We proceed with establishing the validity of the following three claims.
1) It can be shown that
$$\|\epsilon\|_2 \le \frac{\delta_{3K}}{1-\delta_{3K}}\,\|x_{T-\tilde{T}^{\ell}}\|_2.$$
2) Let $\Delta T := \tilde{T}^{\ell} - T^{\ell}$. One has
$$\|x_{T\cap\Delta T}\|_2 \le 2\,\|\epsilon\|_2.$$
This result implies that the energy concentrated in the erroneously removed signal components is small.
3) Finally,
$$\|x_{T-T^{\ell}}\|_2 \le \frac{1+\delta_{3K}}{1-\delta_{3K}}\,\|x_{T-\tilde{T}^{\ell}}\|_2.$$
Proof: The proofs can be summarized as follows.
1) To prove the first claim, note that
$$x_p = \Phi_{\tilde{T}^{\ell}}^{\dagger}\,y = \Phi_{\tilde{T}^{\ell}}^{\dagger}\,\Phi_T\,x_T = \Phi_{\tilde{T}^{\ell}}^{\dagger}\left(\Phi_{T\cap\tilde{T}^{\ell}}\,x_{T\cap\tilde{T}^{\ell}} + \Phi_{T-\tilde{T}^{\ell}}\,x_{T-\tilde{T}^{\ell}}\right) = \Phi_{\tilde{T}^{\ell}}^{\dagger}\,\Phi_{\tilde{T}^{\ell}}\begin{pmatrix} x_{T\cap\tilde{T}^{\ell}} \\ 0 \end{pmatrix} + \Phi_{\tilde{T}^{\ell}}^{\dagger}\,\Phi_{T-\tilde{T}^{\ell}}\,x_{T-\tilde{T}^{\ell}} = x_{\tilde{T}^{\ell}} + \Phi_{\tilde{T}^{\ell}}^{\dagger}\,\Phi_{T-\tilde{T}^{\ell}}\,x_{T-\tilde{T}^{\ell}}, \qquad (33)$$
where the last equality follows from the definition of $\Phi_{\tilde{T}^{\ell}}^{\dagger}$ and the fact that $x$ is supported on $T$. Recall the definition of $\epsilon$, based on which we have
$$\|\epsilon\|_2 = \|x_p - x_{\tilde{T}^{\ell}}\|_2 \overset{(33)}{=} \left\|\left(\Phi_{\tilde{T}^{\ell}}^T\Phi_{\tilde{T}^{\ell}}\right)^{-1}\Phi_{\tilde{T}^{\ell}}^T\,\Phi_{T-\tilde{T}^{\ell}}\,x_{T-\tilde{T}^{\ell}}\right\|_2 \le \frac{\delta_{3K}}{1-\delta_{3K}}\,\|x_{T-\tilde{T}^{\ell}}\|_2. \qquad (34)$$
2) Consider an arbitrary index set $T' \subset \tilde{T}^{\ell}$ of cardinality $K$ that is disjoint from $T$,
$$T' \cap T = \emptyset. \qquad (35)$$
Such a set $T'$ exists because $|\tilde{T}^{\ell} - T| \ge K$. Since
$$(x_p)_{T'} = (x_{\tilde{T}^{\ell}})_{T'} + \epsilon_{T'} = 0 + \epsilon_{T'},$$
we have
$$\|(x_p)_{T'}\|_2 \le \|\epsilon\|_2.$$
On the other hand, by Step 4) of the subspace pursuit algorithm, $\Delta T$ is chosen to contain the $K$ smallest projection coefficients (in magnitude). It therefore holds that
$$\|(x_p)_{\Delta T}\|_2 \le \|(x_p)_{T'}\|_2 \le \|\epsilon\|_2. \qquad (36)$$
Next, we decompose the vector $(x_p)_{\Delta T}$ into a signal part and a smear part. Then
$$\|(x_p)_{\Delta T}\|_2 = \|x_{\Delta T} + \epsilon_{\Delta T}\|_2 \ge \|x_{\Delta T}\|_2 - \|\epsilon_{\Delta T}\|_2,$$
which is equivalent to
$$\|x_{\Delta T}\|_2 \le \|(x_p)_{\Delta T}\|_2 + \|\epsilon_{\Delta T}\|_2 \le \|(x_p)_{\Delta T}\|_2 + \|\epsilon\|_2. \qquad (37)$$
Combining (36) and (37) and noting that $x_{\Delta T} = x_{T\cap\Delta T}$ ($x$ is supported on $T$, i.e., $x_{T^c} = 0$), we have
$$\|x_{T\cap\Delta T}\|_2 \le 2\,\|\epsilon\|_2. \qquad (38)$$
This completes the proof of the claimed result.
3) This claim is proved by combining (34) and (38). Since $x_{T-T^{\ell}}$ consists of the two disjoint parts $x_{T\cap\Delta T}$ and $x_{T-\tilde{T}^{\ell}}$, one has
$$\|x_{T-T^{\ell}}\|_2 \le \|x_{T\cap\Delta T}\|_2 + \|x_{T-\tilde{T}^{\ell}}\|_2 \overset{(38)}{\le} 2\,\|\epsilon\|_2 + \|x_{T-\tilde{T}^{\ell}}\|_2 \overset{(34)}{\le} \left(\frac{2\,\delta_{3K}}{1-\delta_{3K}} + 1\right)\|x_{T-\tilde{T}^{\ell}}\|_2 = \frac{1+\delta_{3K}}{1-\delta_{3K}}\,\|x_{T-\tilde{T}^{\ell}}\|_2.$$
This proves Theorem 4.
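The projection and pruning operations whose smear $\epsilon$ is bounded in this proof can be illustrated as follows. The candidate set $\tilde{T}^{\ell}$ is constructed by hand so that it misses one true index; this construction and all dimensions are assumptions made for the example, not part of the analysis.

```python
import numpy as np

rng = np.random.default_rng(4)
m, N, K = 128, 400, 10

Phi = rng.standard_normal((m, N)) / np.sqrt(m)
T = np.sort(rng.choice(N, size=K, replace=False))
x = np.zeros(N)
x[T] = rng.standard_normal(K)
y = Phi @ x

# An expanded candidate set T_tilde of size 2K that misses one true index,
# so the projection coefficients are smeared versions of x_{T_tilde}.
off_support = np.setdiff1d(np.arange(N), T)
T_tilde = np.union1d(T[:-1], rng.choice(off_support, size=K + 1, replace=False))

# x_p = pinv(Phi_{T_tilde}) y and the smear eps = x_p - x_{T_tilde}.
x_p, *_ = np.linalg.lstsq(Phi[:, T_tilde], y, rcond=None)
eps = x_p - x[T_tilde]

# Step 4: keep the K largest projection coefficients in magnitude;
# Delta_T below is the discarded half of T_tilde.
order = np.argsort(np.abs(x_p))
T_new, Delta_T = np.sort(T_tilde[order[-K:]]), T_tilde[order[:-K]]

# Energy of correct indices that were erroneously discarded is at most
# 2 ||eps||_2 (claim 2 of the proof above).
wrongly_removed = np.intersect1d(Delta_T, T)
print("||x_{T & Delta_T}||_2 =", np.linalg.norm(x[wrongly_removed]),
      "<= 2||eps||_2 =", 2 * np.linalg.norm(eps))
```

The printed inequality is exactly claim 2) above and therefore holds deterministically; the sketch only makes the sets $\Delta T$ and $T\cap\Delta T$ tangible.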
F. Proof of Theorem 8
Without loss of generality, assume that
$$|x_1| \ge |x_2| \ge \cdots \ge |x_K| > 0.$$
The following iterative algorithm is employed to create a partition of the support set $T$ that will establish the correctness of the claimed result.

Algorithm 2: Partitioning of the support set $T$
Initialization:
Let $T_1 = \{1\}$, $i = 1$ and $j = 1$.
Iteration Steps:
If $i = K$, quit the iterations; otherwise, continue.
If
$$\frac{1}{2}\,|x_i| \le \left\|x_{\{i+1,\ldots,K\}}\right\|_2, \qquad (39)$$
set $T_j = T_j \cup \{i+1\}$; otherwise, it must hold that
$$\frac{1}{2}\,|x_i| > \left\|x_{\{i+1,\ldots,K\}}\right\|_2, \qquad (40)$$
and we therefore set $j = j+1$ and $T_j = \{i+1\}$.
Increment the index $i$, $i = i+1$. Continue with a new iteration.
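A direct transcription of the partitioning procedure into code may help clarify how the blocks $T_j$ are formed. The function below is an illustrative sketch, and the example magnitude profile is an arbitrary assumption; neither is part of the proof.

```python
import numpy as np

def partition_support(x_sorted):
    """Partition {1, ..., K} into blocks T_1, ..., T_J following Algorithm 2.

    x_sorted holds the magnitudes |x_1| >= |x_2| >= ... >= |x_K| > 0.
    A new block starts at i+1 whenever |x_i| / 2 > ||x_{i+1..K}||_2.
    """
    K = len(x_sorted)
    blocks, current = [], [0]                   # 0-based indices
    for i in range(K - 1):
        tail = np.linalg.norm(x_sorted[i + 1:])
        if 0.5 * abs(x_sorted[i]) <= tail:      # condition (39)
            current.append(i + 1)
        else:                                   # condition (40)
            blocks.append(current)
            current = [i + 1]
    blocks.append(current)
    return blocks

# Example: a compressible-looking magnitude profile.
mags = np.sort(np.abs(np.random.default_rng(5).standard_normal(10) *
                      2.0 ** -np.arange(10)))[::-1]
print([len(b) for b in partition_support(mags)])
```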
Suppose that after the iterative partition, we have
$$T = T_1 \cup T_2 \cup \cdots \cup T_J,$$
where $J \le K$ is the number of the subsets in the partition. Let $s_j = |T_j|$, $j = 1,\ldots,J$. It is clear that
$$\sum_{j=1}^{J} s_j = K.$$
Then Theorem 8 is proved by invoking the following lemma.
Lemma 6:
1) For a given index $j$, let $|T_j| = s$, and let
$$T_j = \{i, i+1, \ldots, i+s-1\}.$$
Then,
$$|x_{i+s-1-k}| \le 3^{k}\,|x_{i+s-1}|, \quad \text{for all } 0 \le k \le s-1, \qquad (41)$$
and therefore
$$|x_{i+s-1}| \ge \frac{2}{3^{s}}\,\left\|x_{\{i,\ldots,K\}}\right\|_2. \qquad (42)$$
2) Let
$$n_j = \left\lfloor \frac{s_j\,\log 3 - \log 2 + 1}{-\log c_K} \right\rfloor, \qquad (43)$$
where $\lfloor\cdot\rfloor$ denotes the floor function. Then, for any $1 \le j_0 \le J$, after
$$\ell = \sum_{j=1}^{j_0} n_j$$
iterations, the SP algorithm has the property that
$$\bigcup_{j=1}^{j_0} T_j \subset T^{\ell}. \qquad (44)$$
More specifically, after
$$n = \sum_{j=1}^{J} n_j \le \frac{1.5\,K}{-\log c_K} \qquad (45)$$
iterations, the SP algorithm guarantees that $T \subset T^{n}$.
Proof: Both parts of this lemma are proved by mathematical induction as follows.
1) By the construction of $T_j$,
$$\frac{1}{2}\,|x_{i+s-1}| \overset{(40)}{>} \left\|x_{\{i+s,\ldots,K\}}\right\|_2. \qquad (46)$$
On the other hand,
$$\frac{1}{2}\,|x_{i+s-2}| \overset{(39)}{\le} \left\|x_{\{i+s-1,\ldots,K\}}\right\|_2 \le \left\|x_{\{i+s,\ldots,K\}}\right\|_2 + |x_{i+s-1}| \overset{(46)}{<} \frac{3}{2}\,|x_{i+s-1}|.$$
It follows that
$$|x_{i+s-2}| \le 3\,|x_{i+s-1}|,$$
or equivalently, the desired inequality (41) holds for $k = 1$. To use mathematical induction, suppose that for an index $1 < k \le s-1$,
$$|x_{i+s-1-\kappa}| \le 3^{\kappa}\,|x_{i+s-1}| \quad \text{for all } \kappa \le k-1. \qquad (47)$$
Then,
$$\frac{1}{2}\,|x_{i+s-1-k}| \overset{(39)}{\le} \left\|x_{\{i+s-k,\ldots,K\}}\right\|_2 \le |x_{i+s-k}| + \cdots + |x_{i+s-1}| + \left\|x_{\{i+s,\ldots,K\}}\right\|_2 \overset{(47)}{\le} \left(3^{k-1} + \cdots + 1 + \frac{1}{2}\right)|x_{i+s-1}| \le \frac{3^{k}}{2}\,|x_{i+s-1}|.$$
This proves Equation (41) of the lemma. Inequality (42) then follows from the observation that
$$\left\|x_{\{i,\ldots,K\}}\right\|_2 \le |x_i| + \cdots + |x_{i+s-1}| + \left\|x_{\{i+s,\ldots,K\}}\right\|_2 \overset{(41)}{\le} \left(3^{s-1} + \cdots + 1 + \frac{1}{2}\right)|x_{i+s-1}| \le \frac{3^{s}}{2}\,|x_{i+s-1}|.$$
2) From (43), it is clear that for $1 \le j \le J$,
$$c_K^{n_j} < \frac{2}{3^{s_j}}.$$
According to Theorem 2, after $n_1$ iterations,
$$\|x_{T-T^{n_1}}\|_2 < \frac{2}{3^{s_1}}\,\|x\|_2.$$
On the other hand, for any $i \in T_1$, it follows from the first part of this lemma that
$$|x_i| \ge |x_{s_1}| \ge \frac{2}{3^{s_1}}\,\|x\|_2.$$
Therefore,
$$T_1 \subset T^{n_1}.$$
Now, suppose that for a given $j_0 \le J$, after
$$\ell_1 = \sum_{j=1}^{j_0-1} n_j$$
iterations, we have
$$\bigcup_{j=1}^{j_0-1} T_j \subset T^{\ell_1}.$$
Let $T_0 = \bigcup_{j=1}^{j_0-1} T_j$. Then
$$\|x_{T-T^{\ell_1}}\|_2 \le \|x_{T-T_0}\|_2.$$
Denote the smallest coordinate in $T_{j_0}$ by $i$, and the largest coordinate in $T_{j_0}$ by $k$. Then
$$|x_k| \ge \frac{2}{3^{s_{j_0}}}\,\left\|x_{\{i,\ldots,K\}}\right\|_2 = \frac{2}{3^{s_{j_0}}}\,\|x_{T-T_0}\|_2.$$
After $n_{j_0}$ more iterations, i.e., after a total number of iterations equal to $\ell = \ell_1 + n_{j_0}$, Theorem 2 and (43) give
$$\|x_{T-T^{\ell}}\|_2 \le c_K^{n_{j_0}}\,\|x_{T-T^{\ell_1}}\|_2 < \frac{2}{3^{s_{j_0}}}\,\|x_{T-T^{\ell_1}}\|_2 \le \frac{2}{3^{s_{j_0}}}\,\|x_{T-T_0}\|_2 \le |x_k|.$$
As a result, we conclude that
$$T_{j_0} \subset T^{\ell}$$
is valid after $\ell = \sum_{j=1}^{j_0} n_j$ iterations, which proves inequality (44). Now let the subspace pursuit algorithm run for $n = \sum_{j=1}^{J} n_j$ iterations. Then $T \subset T^{n}$. Finally, note that
$$n = \sum_{j=1}^{J} n_j \le \sum_{j=1}^{J} \frac{s_j\,\log 3 - \log 2 + 1}{-\log c_K} = \frac{K\,\log 3 + J\,(1-\log 2)}{-\log c_K} \le \frac{K\,(\log 3 + 1 - \log 2)}{-\log c_K} \le \frac{1.5\,K}{-\log c_K}.$$
This completes the proof of the last claim (45).
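To make the iteration counts of Lemma 6 concrete, the following sketch evaluates (43) and compares the total with the bound (45) for an assumed partition and an assumed contraction constant $c_K$; both values are arbitrary illustrations, not quantities derived in the paper.

```python
import numpy as np

def predicted_iterations(block_sizes, c_K):
    """Evaluate n_j from (43) and return them with their sum."""
    s = np.asarray(block_sizes, dtype=float)
    n_j = np.floor((s * np.log(3.0) - np.log(2.0) + 1.0) / (-np.log(c_K)))
    return n_j.astype(int), int(np.sum(n_j))

block_sizes = [3, 1, 4, 2]          # an illustrative partition with K = 10
c_K = 0.5                           # assumed contraction constant, c_K < 1
n_j, total = predicted_iterations(block_sizes, c_K)
K = sum(block_sizes)
print("n_j =", n_j, " total =", total,
      " bound 1.5K/(-log c_K) =", 1.5 * K / (-np.log(c_K)))
```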
G. Proof of Lemma 3
The claim in the lemma is established through the following chain of inequalities:
$$\|x - \hat{x}\|_2 \le \left\|x_{\hat{T}} - \Phi_{\hat{T}}^{\dagger}\,y\right\|_2 + \|x_{T-\hat{T}}\|_2 = \left\|x_{\hat{T}} - \Phi_{\hat{T}}^{\dagger}\left(\Phi_T\,x_T + e\right)\right\|_2 + \|x_{T-\hat{T}}\|_2$$
$$\le \left\|x_{\hat{T}} - \Phi_{\hat{T}}^{\dagger}\,\Phi_T\,x_T\right\|_2 + \left\|\Phi_{\hat{T}}^{\dagger}\,e\right\|_2 + \|x_{T-\hat{T}}\|_2$$
$$\le \left\|x_{\hat{T}} - \Phi_{\hat{T}}^{\dagger}\,\Phi_{T\cap\hat{T}}\,x_{T\cap\hat{T}}\right\|_2 + \left\|\Phi_{\hat{T}}^{\dagger}\,\Phi_{T-\hat{T}}\,x_{T-\hat{T}}\right\|_2 + \frac{\sqrt{1+\delta_K}}{1-\delta_K}\,\|e\|_2 + \|x_{T-\hat{T}}\|_2$$
$$\overset{(a)}{\le} 0 + \left(\frac{\delta_{2K}}{1-\delta_K} + 1\right)\|x_{T-\hat{T}}\|_2 + \frac{\sqrt{1+\delta_K}}{1-\delta_K}\,\|e\|_2 \le \frac{1}{1-\delta_{2K}}\,\|x_{T-\hat{T}}\|_2 + \frac{\sqrt{1+\delta_K}}{1-\delta_K}\,\|e\|_2,$$
where (a) is a consequence of the fact that
$$x_{\hat{T}} - \Phi_{\hat{T}}^{\dagger}\,\Phi_{T\cap\hat{T}}\,x_{T\cap\hat{T}} = 0.$$
By relaxing the upper bound in terms of replacing $\delta_K$ and $\delta_{2K}$ by $\delta_{3K}$, we obtain
$$\|x - \hat{x}\|_2 \le \frac{1}{1-\delta_{3K}}\,\|x_{T-\hat{T}}\|_2 + \frac{\sqrt{1+\delta_{3K}}}{1-\delta_{3K}}\,\|e\|_2.$$
This completes the proof of the lemma.
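The estimation step analyzed by this chain of inequalities, a least-squares fit on an estimated support followed by the on/off-support error split of the first line, can be sketched as follows. The noise level, the way $\hat{T}$ is constructed, and all dimensions are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(6)
m, N, K = 128, 400, 10

Phi = rng.standard_normal((m, N)) / np.sqrt(m)
T = np.sort(rng.choice(N, size=K, replace=False))
x = np.zeros(N)
x[T] = rng.standard_normal(K)
e = 0.05 * rng.standard_normal(m)            # measurement perturbation
y = Phi @ x + e

# An estimated support T_hat that misses one true index.
off_support = np.setdiff1d(np.arange(N), T)
T_hat = np.union1d(T[:-1], off_support[:1])

# Final estimate: x_hat restricted to T_hat equals pinv(Phi_{T_hat}) y.
x_hat = np.zeros(N)
x_hat[T_hat], *_ = np.linalg.lstsq(Phi[:, T_hat], y, rcond=None)

# Error split used in the first line of the chain of inequalities.
on_support_err = np.linalg.norm(x[T_hat] - x_hat[T_hat])
off_support_err = np.linalg.norm(x[np.setdiff1d(T, T_hat)])
print("||x - x_hat||_2 =", np.linalg.norm(x - x_hat),
      "<=", on_support_err + off_support_err)
```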
H. Proof of Inequality (15)
The proof is similar to the proof given in Appendix D. We start with observing that
$$y_r^{\ell-1} = \mathrm{resid}\left(y, \Phi_{T^{\ell-1}}\right) = \Phi_{T\cup T^{\ell-1}}\,x_r^{\ell-1} + \mathrm{resid}\left(e, \Phi_{T^{\ell-1}}\right), \qquad (48)$$
and
$$\left\|\mathrm{resid}\left(e, \Phi_{T^{\ell-1}}\right)\right\|_2 \le \|e\|_2. \qquad (49)$$
Again, let $T_{\Delta} = \tilde{T}^{\ell} - T^{\ell-1}$. Then by the definition of $T_{\Delta}$,
$$\|\Phi_{T_{\Delta}}^T\,y_r^{\ell-1}\|_2 \ge \|\Phi_T^T\,y_r^{\ell-1}\|_2 \ge \|\Phi_T^T\,\Phi_{T\cup T^{\ell-1}}\,x_r^{\ell-1}\|_2 - \left\|\Phi_T^T\,\mathrm{resid}\left(e, \Phi_{T^{\ell-1}}\right)\right\|_2 \overset{(49)}{\ge} \|\Phi_T^T\,\Phi_{T\cup T^{\ell-1}}\,x_r^{\ell-1}\|_2 - \sqrt{1+\delta_K}\,\|e\|_2. \qquad (50)$$
The left hand side of (50) is upper bounded by
$$\|\Phi_{T_{\Delta}}^T\,y_r^{\ell-1}\|_2 \le \|\Phi_{T_{\Delta}}^T\,\Phi_{T\cup T^{\ell-1}}\,x_r^{\ell-1}\|_2 + \left\|\Phi_{T_{\Delta}}^T\,\mathrm{resid}\left(e, \Phi_{T^{\ell-1}}\right)\right\|_2 \le \|\Phi_{T_{\Delta}}^T\,\Phi_{T\cup T^{\ell-1}}\,x_r^{\ell-1}\|_2 + \sqrt{1+\delta_K}\,\|e\|_2. \qquad (51)$$
Combine (50) and (51). Then
$$\|\Phi_{T_{\Delta}}^T\,\Phi_{T\cup T^{\ell-1}}\,x_r^{\ell-1}\|_2 + 2\sqrt{1+\delta_K}\,\|e\|_2 \ge \|\Phi_T^T\,\Phi_{T\cup T^{\ell-1}}\,x_r^{\ell-1}\|_2. \qquad (52)$$
Comparing the above inequality (52) with its analogue for the noiseless case, (26), one can see that the only difference is the $2\sqrt{1+\delta_K}\,\|e\|_2$ term on the left hand side of (52). Following the same steps as used in the derivations leading from (26) to (29), one can show that
$$2\,\delta_{3K}\,\|x_r^{\ell-1}\|_2 + 2\sqrt{1+\delta_K}\,\|e\|_2 \ge (1-\delta_K)\,\|x_{T-\tilde{T}^{\ell}}\|_2.$$
Applying (32), we get
$$\|x_{T-\tilde{T}^{\ell}}\|_2 \le \frac{2\,\delta_{3K}}{(1-\delta_{3K})^2}\,\|x_{T-T^{\ell-1}}\|_2 + \frac{2\sqrt{1+\delta_K}}{1-\delta_K}\,\|e\|_2,$$
which proves the inequality (15).
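Inequality (49), stating that the residue operator cannot increase the energy of the perturbation $e$, is a property of orthogonal projection and is easy to check numerically; the sketch below uses arbitrary dimensions and an arbitrary index set.

```python
import numpy as np

rng = np.random.default_rng(7)
m, N, K = 64, 256, 8

Phi = rng.standard_normal((m, N)) / np.sqrt(m)
I = rng.choice(N, size=K, replace=False)
e = rng.standard_normal(m)

# resid(e, Phi_I) projects e onto the orthogonal complement of span(Phi_I),
# so its norm can never exceed ||e||_2, which is what (49) states.
coef, *_ = np.linalg.lstsq(Phi[:, I], e, rcond=None)
e_resid = e - Phi[:, I] @ coef
print(np.linalg.norm(e_resid), "<=", np.linalg.norm(e))
```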
I. Proof of Inequality (16)
This proof is similar to that of Theorem 4. When there are measurement perturbations, one has
$$x_p = \Phi_{\tilde{T}^{\ell}}^{\dagger}\,y = \Phi_{\tilde{T}^{\ell}}^{\dagger}\left(\Phi_T\,x_T + e\right).$$
Then the smear energy is upper bounded by
$$\|\epsilon\|_2 \le \left\|\Phi_{\tilde{T}^{\ell}}^{\dagger}\,\Phi_T\,x_T - x_{\tilde{T}^{\ell}}\right\|_2 + \left\|\Phi_{\tilde{T}^{\ell}}^{\dagger}\,e\right\|_2 \le \left\|\Phi_{\tilde{T}^{\ell}}^{\dagger}\,\Phi_T\,x_T - x_{\tilde{T}^{\ell}}\right\|_2 + \frac{1}{\sqrt{1-\delta_{2K}}}\,\|e\|_2,$$
where the last inequality holds because the largest singular value of $\Phi_{\tilde{T}^{\ell}}^{\dagger}$ satisfies
$$\sigma_{\max}\!\left(\Phi_{\tilde{T}^{\ell}}^{\dagger}\right) = \frac{1}{\sigma_{\min}\!\left(\Phi_{\tilde{T}^{\ell}}\right)} \le \frac{1}{\sqrt{1-\delta_{2K}}}.$$
Invoking the same technique as used for deriving (34), we have
$$\|\epsilon\|_2 \le \frac{\delta_{3K}}{1-\delta_{3K}}\,\|x_{T-\tilde{T}^{\ell}}\|_2 + \frac{1}{\sqrt{1-\delta_{2K}}}\,\|e\|_2. \qquad (53)$$
It is straightforward to verify that (38) still holds, which now reads as
$$\|x_{T\cap\Delta T}\|_2 \le 2\,\|\epsilon\|_2. \qquad (54)$$
Combining (53) and (54), one has
$$\|x_{T-T^{\ell}}\|_2 \le \|x_{T\cap\Delta T}\|_2 + \|x_{T-\tilde{T}^{\ell}}\|_2 \le \frac{1+\delta_{3K}}{1-\delta_{3K}}\,\|x_{T-\tilde{T}^{\ell}}\|_2 + \frac{2}{\sqrt{1-\delta_{2K}}}\,\|e\|_2,$$
which proves the claimed result.
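The spectral bound on $\Phi_{\tilde{T}^{\ell}}^{\dagger}$ used above can also be checked numerically. In the sketch below, $\delta$ is estimated from the same submatrix whose pseudoinverse is examined, so the printed inequality holds by construction; the matrix size and the set cardinality $2K$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
m, N = 128, 400
size_2K = 20

Phi = rng.standard_normal((m, N)) / np.sqrt(m)
I = rng.choice(N, size=size_2K, replace=False)
sub = Phi[:, I]

s = np.linalg.svd(sub, compute_uv=False)
delta_hat = max(1.0 - s.min() ** 2, s.max() ** 2 - 1.0)

# Largest singular value of the pseudoinverse vs. the bound 1/sqrt(1 - delta).
pinv_sigma_max = 1.0 / s.min()
print(pinv_sigma_max, "<=", 1.0 / np.sqrt(1.0 - delta_hat))
```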
Wei Dai (S'01-M'08) received his M.S. and Ph.D. degrees in Electrical and Computer Engineering from the University of Colorado at Boulder in 2004 and 2007, respectively. He is currently a Postdoctoral Researcher at the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. His research interests include compressive sensing, bioinformatics, communications theory, information theory and random matrix theory.
Olgica Milenkovic (S'01-M'03) received the M.S. degree in mathematics and the Ph.D. degree in electrical engineering from the University of Michigan, Ann Arbor, in 2002. She is currently with the University of Illinois, Urbana-Champaign. Her research interests include the theory of algorithms, bioinformatics, constrained coding, discrete mathematics, error-control coding, and probability theory.