Sparse Principal Component Analysis
Hui Zou, Trevor Hastie, and Robert Tibshirani
Key Words: Arrays; Gene expression; Lasso/elastic net; Multivariate analysis; Singular
value decomposition; Thresholding.
1. INTRODUCTION
Principal component analysis (PCA) (Jolliffe 1986) is a popular data-processing and
dimension-reduction technique, with numerous applications in engineering, biology, and so-
cial science. Some interesting examples include handwritten zip code classification (Hastie,
Tibshirani, and Friedman 2001) and human face recognition (Hancock, Burton, and Bruce
1996). Recently PCA has been used in gene expression data analysis (Alter, Brown, and
Botstein 2000). Hastie et al. (2000) proposed the so-called gene shaving techniques using
PCA to cluster highly variable and coherent genes in microarray datasets.
PCA seeks the linear combinations of the original variables such that the derived vari-
ables capture maximal variance. PCA can be computed via the singular value decomposition
(SVD) of the data matrix. In detail, let the data X be an n × p matrix, where n and p are the
number of observations and the number of variables, respectively. Without loss of generality,
assume the column means of X are all 0. Let the SVD of X be
$$X = UDV^T. \qquad (1.1)$$
Z = UD are the principal components (PCs), and the columns of V are the corresponding loadings of the principal components. The sample variance of the ith PC is $D_{ii}^2/n$. In gene expression data the standardized PCs U are called the eigen-arrays and V are the eigen-genes (Alter, Brown, and Botstein 2000). Usually the first q ($q \ll \min(n, p)$) PCs are chosen to represent the data, thus a great dimensionality reduction is achieved.
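To make the notation concrete, here is a minimal sketch in Python/NumPy (our own illustration; the variable names and simulated data are not from the article) that computes the PCs and their loadings from the SVD of a centered data matrix, as in (1.1).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))        # n = 100 observations, p = 6 variables
X = X - X.mean(axis=0)               # center the columns, as assumed in (1.1)

# Thin SVD: X = U D V^T
U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T                             # columns of V are the loadings
Z = U * d                            # principal components Z = U D

# Sample variance of the ith PC is D_ii^2 / n
pc_var = d**2 / X.shape[0]
print(pc_var)
print(np.allclose(Z, X @ V))         # the PCs are the data projected onto the loadings
```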
The success of PCA is due to the following two important optimal properties:
1. principal components sequentially capture the maximum variability among the
columns of X, thus guaranteeing minimal information loss;
2. principal components are uncorrelated, so we can talk about one principal compo-
nent without referring to others.
However, PCA also has an obvious drawback: each PC is a linear combination of all
p variables and the loadings are typically nonzero. This often makes it difficult to interpret
the derived PCs. Rotation techniques are commonly used to help practitioners interpret
principal components (Jolliffe 1995). Vines (2000) considered simple principal components
by restricting the loadings to take values from a small set of allowable integers such as 0,
1, and −1.
We feel it is desirable not only to achieve the dimensionality reduction but also to reduce
the number of explicitly used variables. An ad hoc way to achieve this is to artificially set the
loadings with absolute values smaller than a threshold to zero. This informal thresholding
approach is frequently used in practice, but can be potentially misleading in various respects
(Cadima and Jolliffe 1995). McCabe (1984) presented an alternative to PCA which found a
subset of principal variables. Jolliffe, Trendafilov, and Uddin (2003) introduced SCoTLASS
to get modified principal components with possible zero loadings.
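For illustration only, here is a minimal sketch of the informal thresholding rule mentioned above (our own code, not a procedure recommended by the article): loadings with absolute value below a cutoff are set to zero and the vector is renormalized.

```python
import numpy as np

def threshold_loadings(v, cutoff):
    """Ad hoc 'simple thresholding': zero out small loadings, then renormalize."""
    v_sparse = np.where(np.abs(v) < cutoff, 0.0, v)
    norm = np.linalg.norm(v_sparse)
    return v_sparse / norm if norm > 0 else v_sparse

# Example: a dense loading vector with many small entries
v = np.array([0.70, 0.68, 0.15, -0.12, 0.05, -0.03])
print(threshold_loadings(v, cutoff=0.2))   # only the two large loadings survive
```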
The same interpretation issues arise in multiple linear regression, where the response
is predicted by a linear combination of the predictors. Interpretable models are obtained via
variable selection. The lasso (Tibshirani 1996) is a promising variable selection technique,
simultaneously producing accurate and sparse models. Zou and Hastie (2005) proposed
the elastic net, a generalization of the lasso, which has some advantages. In this article we
introduce a new approach for estimating PCs with sparse loadings, which we call sparse
principal component analysis (SPCA). SPCA is built on the fact that PCA can be written as
a regression-type optimization problem, with a quadratic penalty; the lasso penalty (via the
elastic net) can then be directly integrated into the regression criterion, leading to a modified
PCA with sparse loadings.
In the next section we briefly review the lasso and the elastic net. The methodological
details of SPCA are presented in Section 3. We present an efficient algorithm for fitting
the SPCA model. We also derive an appropriate expression for representing the variance
explained by modified principal components. In Section 4 we consider a special case of the
SPCA algorithm for handling gene expression arrays efficiently. The proposed methodology
is illustrated by using real data and simulation examples in Section 5. Discussions are in
Section 6. The article ends with an Appendix summarizing technical details.
2. THE LASSO AND THE ELASTIC NET
Consider the linear regression model with n observations, predictors $X = [X_1, \ldots, X_p]$, and response Y. The lasso estimates are defined by the penalized least squares criterion
$$\hat\beta_{\text{lasso}} = \arg\min_\beta \Big\|Y - \sum_{j=1}^p X_j\beta_j\Big\|^2 + \lambda\sum_{j=1}^p |\beta_j|, \qquad (2.1)$$
where λ is non-negative. The lasso was originally solved by quadratic programming (Tibshirani 1996). Efron, Hastie, Johnstone, and Tibshirani (2004) showed that the lasso estimates
β̂ are piecewise linear as a function of λ, and proposed an algorithm called LARS to effi-
ciently solve the entire lasso solution path in the same order of computations as a single least
squares fit. The piecewise linearity of the lasso solution path was first proved by Osborne,
Presnell, Turlach (2000) where a different algorithm was proposed to solve the entire lasso
solution path.
The lasso continuously shrinks the coefficients toward zero, and achieves its prediction
accuracy via the bias-variance trade-off. Due to the nature of the $L_1$ penalty, some coefficients
will be shrunk to exactly zero if λ is large enough. Therefore the lasso simultaneously
produces both an accurate and sparse model, which makes it a favorable variable selection
method. However, the lasso has several limitations as pointed out by Zou and Hastie (2005).
The most relevant one to this work is that the number of variables selected by the lasso is
limited by the number of observations. For example, if applied to microarray data where
there are thousands of predictors (genes) (p > 1000) but fewer than 100 samples (n < 100),
the lasso can select at most n genes, which is clearly unsatisfactory.
The elastic net (Zou and Hastie 2005) generalizes the lasso to overcome these draw-
backs, while enjoying its other favorable properties. For any non-negative λ1 and λ2 , the
elastic net estimates β̂en are given as follows
$$\hat\beta_{\text{en}} = (1+\lambda_2)\left\{\arg\min_\beta \Big\|Y - \sum_{j=1}^p X_j\beta_j\Big\|^2 + \lambda_2\sum_{j=1}^p \beta_j^2 + \lambda_1\sum_{j=1}^p |\beta_j|\right\}. \qquad (2.2)$$
The elastic net penalty is a convex combination of the ridge and lasso penalties. Obviously,
the lasso is a special case of the elastic net when λ2 = 0. Given a fixed λ2 , the LARS-
EN algorithm (Zou and Hastie 2005) efficiently solves the elastic net problem for all λ1
with the computational cost of a single least squares fit. When p > n, we choose some
λ2 > 0. Then the elastic net can potentially include all variables in the fitted model, so this
particular limitation of the lasso is removed. Zou and Hastie (2005) compared the elastic
net with the lasso and discussed the application of the elastic net as a gene selection method
in microarray analysis.
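To make criterion (2.2) concrete, the following sketch solves the inner (naive elastic net) minimization by coordinate descent. This is our own illustration, not the LARS-EN algorithm used by the authors; the function names and the simulated data are assumptions.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) = sign(z)(|z| - gamma)_+."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def naive_elastic_net(X, y, lam1, lam2, n_iter=200):
    """Coordinate descent for ||y - X b||^2 + lam2 ||b||^2 + lam1 ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X**2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]     # partial residual excluding j
            beta[j] = soft_threshold(X[:, j] @ r_j, lam1 / 2.0) / (col_sq[j] + lam2)
    return beta

# The elastic net estimate in (2.2) rescales the naive solution by (1 + lam2)
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=50)
beta_en = (1 + 0.1) * naive_elastic_net(X, y, lam1=5.0, lam2=0.1)
print(np.round(beta_en, 3))
```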
3. MOTIVATION AND DETAILS OF SPCA
Jolliffe, Trendafilov, and Uddin (2003) proposed SCoTLASS, which imposes the lasso constraint directly on PCA: the kth modified loading vector $a_k$ maximizes $a_k^T(X^TX)a_k$ subject to
$$a_k^Ta_k = 1, \quad a_k^Ta_h = 0 \ (h < k), \quad \text{and} \quad \sum_{j=1}^p |a_{kj}| \le t$$
for some tuning parameter t. Although sufficiently small t yields some exact zero loadings,
there is not much guidance with SCoTLASS in choosing an appropriate value for t. One
could try several t values, but the high computational cost of SCoTLASS makes this an im-
practical solution. This high computational cost is probably due to the fact that SCoTLASS
is not a convex optimization problem. Moreover, the examples in Jolliffe, Trendafilov, and
Uddin (2003) showed that the loadings obtained by SCoTLASS are not sparse enough when
one requires a high percentage of explained variance.
We consider a different approach to modifying PCA. We first show how PCA can be
recast exactly in terms of a (ridge) regression problem. We then introduce the lasso penalty
by changing this ridge regression to an elastic-net regression.
Theorem 1. For each i, denote by $Z_i = U_iD_{ii}$ the ith principal component. Consider a positive λ and the ridge estimates $\hat\beta_{\text{ridge}}$ given by
$$\hat\beta_{\text{ridge}} = \arg\min_\beta \|Z_i - X\beta\|^2 + \lambda\|\beta\|^2. \qquad (3.4)$$
Let $\hat v = \hat\beta_{\text{ridge}}/\|\hat\beta_{\text{ridge}}\|$; then $\hat v = V_i$.
The theme of this simple theorem is to show the connection between PCA and a
regression method. Regressing PCs on variables was discussed in Cadima and Jolliffe
(1995), where they focused on approximating PCs by a subset of k variables. We extend it
to a more general case of ridge regression in order to handle all kinds of data, especially
gene expression data. Obviously, when n > p and X is a full rank matrix, the theorem does
not require a positive λ. Note that if p > n and λ = 0, ordinary multiple regression has
no unique solution that is exactly Vi . The same happens when n > p and X is not a full
rank matrix. However, PCA always gives a unique solution in all situations. As shown in
Theorem 1, this indeterminacy is eliminated by the positive ridge penalty ($\lambda\|\beta\|^2$). Note
that after normalization the coefficients are independent of λ, therefore the ridge penalty is
not used to penalize the regression coefficients but to ensure the reconstruction of principal
components.
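A quick numerical check of Theorem 1 (a sketch on simulated data of our own choosing): regress the ith principal component $Z_i = XV_i$ on X with a ridge penalty, normalize the coefficients, and compare with the loading vector $V_i$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam = 40, 8, 10.0
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T
i = 0
Z_i = X @ V[:, i]                      # ith principal component

# Ridge regression of Z_i on X: beta = (X^T X + lam I)^{-1} X^T Z_i
beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Z_i)
v_hat = beta / np.linalg.norm(beta)

# Up to sign, the normalized ridge coefficients recover the loading vector V_i
print(np.allclose(np.abs(v_hat), np.abs(V[:, i]), atol=1e-8))
```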
Now let us add the L1 penalty to (3.4) and consider the following optimization problem:
$$\hat\beta = \arg\min_\beta \|Z_i - X\beta\|^2 + \lambda\|\beta\|^2 + \lambda_1\|\beta\|_1, \qquad (3.5)$$
where $\|\beta\|_1 = \sum_{j=1}^p |\beta_j|$ is the 1-norm of β. We call $\hat V_i = \hat\beta/\|\hat\beta\|$ an approximation to $V_i$, and $X\hat V_i$ the ith approximated principal component. Zou and Hastie (2005) called (3.5) the naive elastic net, which differs from the elastic net by a scaling factor $(1 + \lambda)$. Since we are using the normalized fitted coefficients, the scaling factor does not affect $\hat V_i$. Clearly, a large enough $\lambda_1$ gives a sparse $\hat\beta$, hence a sparse $\hat V_i$. Given a fixed λ, (3.5) is efficiently solved for all $\lambda_1$ by using the LARS-EN algorithm (Zou and Hastie 2005). Thus, we can flexibly choose a sparse approximation to the ith principal component.
Theorem 2. For any λ > 0, let
$$(\hat\alpha, \hat\beta) = \arg\min_{\alpha,\beta} \sum_{i=1}^n \|x_i - \alpha\beta^Tx_i\|^2 + \lambda\|\beta\|^2 \qquad (3.6)$$
subject to $\|\alpha\|^2 = 1$.
Then $\hat\beta \propto V_1$.
The next theorem extends Theorem 2 to derive the whole sequence of PCs.
Theorem 3. Suppose we are considering the first k principal components. Let $A_{p\times k} = [\alpha_1, \ldots, \alpha_k]$ and $B_{p\times k} = [\beta_1, \ldots, \beta_k]$. For any λ > 0, let
$$(\hat A, \hat B) = \arg\min_{A,B} \sum_{i=1}^n \|x_i - AB^Tx_i\|^2 + \lambda\sum_{j=1}^k \|\beta_j\|^2$$
subject to $A^TA = I_{k\times k}$.
Then $\hat\beta_j \propto V_j$ for $j = 1, 2, \ldots, k$.
Theorems 2 and 3 effectively transform the PCA problem to a regression-type problem.
The critical element is the objective function $\sum_{i=1}^n \|x_i - AB^Tx_i\|^2$. If we restrict B = A, then
$$\sum_{i=1}^n \|x_i - AB^Tx_i\|^2 = \sum_{i=1}^n \|x_i - AA^Tx_i\|^2,$$
whose minimizer under the orthonormal constraint on A is exactly the first k loading vectors
of ordinary PCA. This formulation arises in the “closest approximating linear manifold”
derivation of PCA (e.g., Hastie, Tibshirani, and Friedman 2001). Theorem 3 shows that we
can still have exact PCA while relaxing the restriction B = A and adding the ridge penalty
term. As can be seen later, these generalizations enable us to flexibly modify PCA.
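As a small numerical illustration of this "closest approximating linear manifold" view (our own check on simulated data): with A equal to the first k ordinary loading vectors, the projection $XAA^T$ coincides with the best rank-k approximation of X, and the reconstruction error equals the sum of the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 7))
X = X - X.mean(axis=0)

U, d, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
A = Vt[:k].T                                  # first k ordinary PCA loading vectors

X_proj = X @ A @ A.T                          # projection onto the k-dimensional manifold
X_rank_k = (U[:, :k] * d[:k]) @ Vt[:k]        # best rank-k approximation of X

print(np.allclose(X_proj, X_rank_k))                               # True
print(np.isclose(np.sum((X - X_proj) ** 2), np.sum(d[k:] ** 2)))   # True
```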
The proofs of Theorems 2 and 3 are given in the Appendix; here we give an intuitive
explanation. Note that
$$\sum_{i=1}^n \|x_i - AB^Tx_i\|^2 = \|X - XBA^T\|^2. \qquad (3.8)$$
Since $\|X - XBA^T\|^2 = \|XA_\perp\|^2 + \sum_{j=1}^k \|X\alpha_j - X\beta_j\|^2$ for any orthonormal complement $A_\perp$ of A, minimizing over B with A fixed amounts to k independent ridge regressions of the components $X\alpha_j$ on X, which is the source of the connection.
To obtain sparse loadings, we add the lasso penalty to the criterion of Theorem 3 and consider the following optimization problem:
$$(\hat A, \hat B) = \arg\min_{A,B}\ \sum_{i=1}^n \|x_i - AB^Tx_i\|^2 + \lambda\sum_{j=1}^k \|\beta_j\|^2 + \sum_{j=1}^k \lambda_{1,j}\|\beta_j\|_1 \qquad (3.12)$$
subject to $A^TA = I_{k\times k}$.
Whereas the same λ is used for all k components, different $\lambda_{1,j}$'s are allowed for penalizing the loadings of different principal components. Again, if p > n, a positive λ is required in order to get exact PCA when the sparsity constraint (the lasso penalty) vanishes ($\lambda_{1,j} = 0$). We call (3.12) the SPCA criterion hereafter.
B given A: For a fixed A, the $\beta_j$ in (3.12) decouple, and each $\beta_j$ solves an elastic net problem that can be handled efficiently by the LARS-EN algorithm.
A given B: On the other hand, if B is fixed, then we can ignore the penalty part in (3.12) and only try to minimize $\sum_{i=1}^n \|x_i - AB^Tx_i\|^2 = \|X - XBA^T\|^2$, subject to $A^TA = I_{k\times k}$. The solution is obtained by a reduced rank form of the Procrustes rotation, given in Theorem 4 below. We compute the SVD
$$(X^TX)B = UDV^T$$
and set $\hat A = UV^T$.
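A minimal sketch of this A-step (our own code; the matrix B below is an arbitrary stand-in for the current sparse loadings): compute the SVD of $(X^TX)B$ and set $A = UV^T$.

```python
import numpy as np

def procrustes_a_step(X, B):
    """Given fixed B, update A = U V^T from the SVD (X^T X) B = U D V^T,
    the reduced-rank Procrustes rotation minimizing ||X - X B A^T||^2 over A^T A = I."""
    U, _, Vt = np.linalg.svd(X.T @ X @ B, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 5))
B = rng.normal(size=(5, 2))              # candidate sparse loadings (k = 2 columns)
A = procrustes_a_step(X, B)
print(np.allclose(A.T @ A, np.eye(2)))   # A has orthonormal columns
```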
Although (3.16) (with Σ instead of $X^TX$) is not quite an elastic net problem, we can easily turn it into one. Create the artificial response $Y^{**}$ and predictors $X^{**}$ as follows:
$$Y^{**} = \Sigma^{\frac{1}{2}}\alpha_j, \qquad X^{**} = \Sigma^{\frac{1}{2}}. \qquad (3.17)$$
Then (3.16) becomes an elastic net problem in $Y^{**}$ and $X^{**}$.
Let $\hat Z$ denote the modified PCs. Usually the total variance explained by $\hat Z$ is calculated by $\mathrm{tr}(\hat Z^T\hat Z)$. This is reasonable when the modified PCs are uncorrelated. However, if they are correlated, $\mathrm{tr}(\hat Z^T\hat Z)$ is too optimistic for representing the total variance. Suppose $(\hat Z_i,\ i = 1, 2, \ldots, k)$
are the first k modified PCs by any method, and the (k + 1)th modified PC Ẑk+1 is obtained.
We want to compute the total variance explained by the first k + 1 modified PCs, which
should be the sum of the explained variance by the first k modified PCs and the additional
variance from Ẑk+1 . If Ẑk+1 is correlated with (Ẑi , i = 1, 2, . . . , k), then its variance
contains contributions from $(\hat Z_i,\ i = 1, 2, \ldots, k)$, which should not be included in the
total variance given the presence of (Ẑi , i = 1, 2, . . . , k).
Here we propose a new formula to compute the total variance explained by Z, which
takes into account the correlations among Z. We use regression projection to remove the
linear dependence between correlated components. Denote by $\hat Z_{j\cdot 1,\ldots,j-1}$ the residual after adjusting $\hat Z_j$ for $\hat Z_1, \ldots, \hat Z_{j-1}$, that is,
$$\hat Z_{j\cdot 1,\ldots,j-1} = \hat Z_j - H_{1,\ldots,j-1}\hat Z_j,$$
where $H_{1,\ldots,j-1}$ is the projection matrix on $\{\hat Z_i\}_1^{j-1}$. Then the adjusted variance of $\hat Z_j$ is $\|\hat Z_{j\cdot 1,\ldots,j-1}\|^2$, and the total explained variance is defined as $\sum_{j=1}^k \|\hat Z_{j\cdot 1,\ldots,j-1}\|^2$. When the modified PCs $\hat Z$ are uncorrelated, the new formula agrees with $\mathrm{tr}(\hat Z^T\hat Z)$.
Note that the above computations depend on the order of Ẑi . However, since we have
a natural order in PCA, ordering is not an issue here. Using the QR decomposition, we can easily compute the adjusted variance. Suppose $\hat Z = QR$, where Q is orthonormal and R is upper triangular. Then it is straightforward to see that
$$\|\hat Z_{j\cdot 1,\ldots,j-1}\|^2 = R_{jj}^2,$$
so the adjusted total variance equals $\sum_{j=1}^k R_{jj}^2$.
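A short sketch (our own illustration on simulated, deliberately correlated components) of computing the adjusted total variance from the QR decomposition:

```python
import numpy as np

def adjusted_total_variance(Z):
    """Adjusted total variance of (possibly correlated) modified PCs Z via QR:
    sum of squared diagonal entries of R, where Z = QR."""
    R = np.linalg.qr(Z, mode="r")
    return np.sum(np.diag(R) ** 2)

rng = np.random.default_rng(5)
Z = rng.normal(size=(100, 3))
Z[:, 2] = 0.6 * Z[:, 0] + 0.1 * rng.normal(size=100)   # make the components correlated

print(adjusted_total_variance(Z))   # adjusted for the correlations
print(np.trace(Z.T @ Z))            # naive tr(Z^T Z) is larger when components overlap
```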
4. SPCA FOR p ≫ n AND GENE EXPRESSION ARRAYS
Gene expression arrays typically have p ≫ n, and for such data we take λ → ∞ in the SPCA criterion. Let $\hat V_j(\lambda) = \hat\beta_j/\|\hat\beta_j\|$ be the loadings derived from (3.12), and let $(\hat A, \hat B)$ solve
$$(\hat A, \hat B) = \arg\min_{A,B}\ -2\,\mathrm{tr}(A^TX^TXB) + \sum_{j=1}^k \|\beta_j\|^2 + \sum_{j=1}^k \lambda_{1,j}\|\beta_j\|_1 \qquad (4.1)$$
subject to $A^TA = I_{k\times k}$.
When $\lambda \to \infty$, $\hat V_j(\lambda) \to \hat\beta_j/\|\hat\beta_j\|$.
We can use the same alternating algorithm in Section 3.3 to solve (4.1), where we only
need to replace the general elastic net problem with its special case (λ = ∞). Note that
given A,
$$\hat\beta_j = \arg\min_{\beta_j}\ -2\alpha_j^T(X^TX)\beta_j + \|\beta_j\|^2 + \lambda_{1,j}\|\beta_j\|_1, \qquad (4.2)$$
which has an explicit solution given by the soft-thresholding rule (4.3) below.
Gene Expression Arrays SPCA Algorithm. Replace Step 2 in the general SPCA algorithm with
Step 2∗: for j = 1, 2, . . . , k,
$$\beta_j = \left(\left|\alpha_j^TX^TX\right| - \frac{\lambda_{1,j}}{2}\right)_+ \operatorname{Sign}\!\left(\alpha_j^TX^TX\right). \qquad (4.3)$$
The operation in (4.3) is called soft-thresholding. Figure 1 gives an illustration of how the
soft-thresholding rule operates. Recently soft-thresholding has become increasingly popular
in the literature. For example, nearest shrunken centroids (Tibshirani, Hastie, Narasimhan,
and Chu 2002) adopts the soft-thresholding rule to simultaneously classify samples and
select important genes in microarrays.
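The following sketch (our own code, assuming the λ = ∞ special case described above) alternates the soft-thresholding update (4.3) with the Procrustes step for A. The function name and the stopping rule (a fixed number of iterations) are our own choices.

```python
import numpy as np

def spca_soft_threshold(X, k, lam1, n_iter=100):
    """SPCA for p >> n (lambda = infinity): alternate
    (i)  beta_j = (|alpha_j^T X^T X| - lam1_j/2)_+ * sign(alpha_j^T X^T X)   [step (4.3)]
    (ii) A = U V^T from the SVD of (X^T X) B                                 [Procrustes step].
    lam1 is a length-k array of penalties; returns normalized sparse loadings."""
    XtX = X.T @ X
    # initialize A with the ordinary PCA loadings
    A = np.linalg.svd(X, full_matrices=False)[2][:k].T
    for _ in range(n_iter):
        C = XtX @ A                                   # column j is X^T X alpha_j
        B = np.sign(C) * np.maximum(np.abs(C) - lam1 / 2.0, 0.0)
        U, _, Vt = np.linalg.svd(XtX @ B, full_matrices=False)
        A = U @ Vt
    norms = np.linalg.norm(B, axis=0)
    norms[norms == 0] = 1.0
    return B / norms                                  # sparse loading vectors

rng = np.random.default_rng(6)
X = rng.normal(size=(20, 200))                        # n = 20 samples, p = 200 "genes"
X = X - X.mean(axis=0)
V_sparse = spca_soft_threshold(X, k=2, lam1=np.array([30.0, 30.0]))
print((np.abs(V_sparse) > 0).sum(axis=0))             # number of nonzero loadings per PC
```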
5. EXAMPLES
Table 2. Pitprops Data: Loadings of the First Six Modified PCs by SCoTLASS. Empty cells have zero
loadings.
t = 1.75
Variable PC1 PC2 PC3 PC4 PC5 PC6
topdiam 0.664 −0.025 0.002 −0.035
length 0.683 −0.001 −0.040 0.001 −0.018
moist 0.641 0.195 0.180 −0.030
testsg 0.701 0.001 −0.001
ovensg −0.887 −0.056
ringtop 0.293 −0.186 −0.373 0.044
ringbut 0.001 0.107 −0.658 −0.051 0.064
bowmax 0.001 0.735 0.021 −0.168
bowdist 0.283 −0.001
whorls 0.113 −0.001 0.388 −0.017 0.320
clear −0.923
knots 0.001 −0.554 0.016 0.004
diaknot 0.703 0.001 −0.197 0.080
Figure 2. Pitprops data: The sequences of sparse approximations to the first six principal components. The curves
show the percentage of explained variance (PEV) as a function of λ1 . The vertical broken lines indicate the choice
of λ1 used in our SPCA analysis.
Table 3. Pitprops Data: Loadings of the First Six Sparse PCs by SPCA. Empty cells have zero loadings.
As a reference, we also considered the simple thresholding approach. Although simple thresholding has various drawbacks, it may serve as a benchmark for testing sparse PC
methods. A variant of simple thresholding is soft-thresholding. We found that when used in
PCA, soft-thresholding performs very similarly to simple thresholding. Thus, we omitted the
results of soft-thresholding in this article. Both SCoTLASS and SPCA were compared with
simple thresholding. Table 4 presents the loadings and the corresponding variance explained
by simple thresholding. To make the comparisons fair, we let the numbers of nonzero
loadings obtained by simple thresholding match the results of SCoTLASS and SPCA, as
shown in the top and bottom parts of Table 4, respectively. In terms of variance, it seems
that simple thresholding is better than SCoTLASS and worse than SPCA. Moreover, the
variables with nonzero loadings selected by SPCA differ from those chosen by simple thresholding
for the first three PCs; while SCoTLASS seems to create a similar sparseness pattern as
simple thresholding does, especially in the leading PC.
We consider three hidden factors: $V_1 \sim N(0, 290)$, $V_2 \sim N(0, 300)$, and $V_3 = -0.3V_1 + 0.925V_2 + \epsilon$ with $\epsilon \sim N(0, 1)$, where $V_1$, $V_2$, and $\epsilon$ are independent. Ten observable variables are then generated as $X_i = V_1 + \epsilon_i^1$ for $i = 1, \ldots, 4$; $X_i = V_2 + \epsilon_i^2$ for $i = 5, \ldots, 8$; and $X_i = V_3 + \epsilon_i^3$ for $i = 9, 10$; with all $\epsilon_i^j \sim N(0, 1)$ independent.
We used the exact covariance matrix of (X1 , . . . , X10 ) to perform PCA, SPCA and simple
thresholding (in the population setting).
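Assuming the factor model sketched above (the specific coefficients are our reconstruction of the simulation setup, not given verbatim in this extract), the following code builds the exact covariance matrix and reproduces the population PCA and the simple-thresholding comparison.

```python
import numpy as np

# Covariance of the three factors under the assumed model; var(V3) is rounded to 283.8
cov_v = np.array([[290.0,   0.0,  -87.0],
                  [  0.0, 300.0,  277.5],
                  [-87.0, 277.5, 283.8]])

groups = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2])    # which factor drives each X_i
Sigma = cov_v[np.ix_(groups, groups)] + np.eye(10)   # add unit noise variance

# Population PCA: eigen-decomposition of Sigma
eigval, eigvec = np.linalg.eigh(Sigma)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]
print(round(100 * eigval[:2].sum() / eigval.sum(), 1))   # close to the 99.6% quoted above

# Simple thresholding keeps the four largest loadings of the leading eigenvector
v1 = eigvec[:, 0]
keep = np.argsort(-np.abs(v1))[:4]
print(np.sort(keep))   # expected to include indices 8 and 9 (X9, X10), as the text notes
```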
Table 4. Pitprops Data: Loadings of the First Six Modified PCs by Simple Thresholding. Empty cells
have zero loadings.
The variance of the three underlying factors is 290, 300, and 283.8, respectively. The
numbers of variables associated with the three factors are 4, 4, and 2. Therefore V2 and
V1 are almost equally important, and they are much more important than V3 . The first two
PCs together explain 99.6% of the total variance. These facts suggest that we only need
to consider two derived variables with “correct” sparse representations. Ideally, the first
derived variable should recover the factor V2 only using (X5 , X6 , X7 , X8 ), and the second
derived variable should recover the factor V1 only using (X1 , X2 , X3 , X4 ). In fact, if we
sequentially maximize the variance of the first two derived variables under the orthonormal
constraint, while restricting the numbers of nonzero loadings to four, then the first derived
variable uniformly assigns nonzero loadings on (X5 , X6 , X7 , X8 ); and the second derived
variable uniformly assigns nonzero loadings on (X1 , X2 , X3 , X4 ).
Both SPCA (λ = 0) and simple thresholding were carried out by using the oracle
information that the ideal sparse representations use only four variables. Table 5 summarizes
the comparison results. Clearly, SPCA correctly identifies the sets of important variables. In
fact, SPCA delivers the ideal sparse representations of the first two principal components.
Mathematically, it is easy to show that if t = 2 is used, SCoTLASS is also able to find the
same sparse solution. In this example, both SPCA and SCoTLASS produce the ideal sparse
PCs, which may be explained by the fact that both methods explicitly use the lasso penalty.
In contrast, simple thresholding incorrectly includes X9 , X10 in the most important
variables. The variance explained by simple thresholding is also lower than that by SPCA,
although the relative difference is small (less than 5%). Due to the high correlation between
V2 and V3 , variables X9 , X10 achieve loadings which are even higher than those of the true
important variables (X5 , X6 , X7 , X8 ). Thus the truth is disguised by the high correlation.
On the other hand, simple thresholding correctly discovers the second factor, because V1
has a low correlation with V3 .
Figure 3. The sparse leading principal component: percentage of explained variance versus sparsity. Simple
thresholding and SPCA have similar performance. However, there is still a consistent difference in the selected
genes (the ones with nonzero loadings).
An important task in microarray analysis is to identify subsets of genes that are biologically relevant to the outcome (e.g., tumor type or survival time). PCA (or SVD) has been a popular
tool for this purpose. Many gene-clustering methods in the literature use PCA (or SVD) as
a building block. For example, gene shaving (Hastie et al. 2000) uses an iterative principal
component shaving algorithm to identify subsets of coherent genes. Here we consider
another approach to gene selection through SPCA. The idea is intuitive: if the (sparse)
principal component can explain a large part of the total variance of gene expression levels,
then the subset of genes representing the principal component is considered important.
We illustrate the sparse PC selection method on Ramaswamy's data (Ramaswamy et al. 2001), which has p = 16,063 genes and n = 144 samples. Its first principal
component explains 46% of the total variance. For microarray data like this, it appears
that SCoTLASS cannot be practically useful for finding sparse PCs. We applied SPCA
(λ = ∞) to find the leading sparse PC. A sequence of values for λ1 were used such that
the number of nonzero loadings varied over a wide range. As displayed in Figure 3, the
percentage of explained variance decreases at a slow rate as the sparsity increases. As few
as 2.5% of these 16,063 genes suffice to construct the leading principal component
with an affordable loss of explained variance (from 46% to 40%). Simple thresholding
was also applied to this data. It seems that when using the same number of genes, simple
thresholding always explains slightly higher variance than SPCA does. When SPCA and simple thresholding select the same number of genes, about 2% of the selected genes differ, and this difference rate is quite consistent.
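To illustrate the computation behind Figure 3, here is a compact sketch (our own code, run on simulated data rather than the Ramaswamy data) that sweeps λ1 in the λ = ∞ algorithm for the leading sparse PC and records sparsity against the percentage of explained variance.

```python
import numpy as np

def leading_sparse_pc(X, lam1, n_iter=200):
    """Leading sparse PC via the lambda = infinity iterations:
    beta = soft-threshold(X^T X alpha, lam1/2), then alpha = X^T X beta, normalized."""
    XtX = X.T @ X
    alpha = np.linalg.svd(X, full_matrices=False)[2][0]     # start from the PCA loading
    for _ in range(n_iter):
        beta = np.sign(XtX @ alpha) * np.maximum(np.abs(XtX @ alpha) - lam1 / 2.0, 0.0)
        if np.all(beta == 0):
            return beta
        alpha = XtX @ beta
        alpha /= np.linalg.norm(alpha)
    return beta / np.linalg.norm(beta)

rng = np.random.default_rng(7)
X = rng.normal(size=(50, 500))
X = X - X.mean(axis=0)
total_var = np.sum(X ** 2)

for lam1 in [0.0, 50.0, 100.0, 150.0]:
    v = leading_sparse_pc(X, lam1)
    pev = np.sum((X @ v) ** 2) / total_var if v.any() else 0.0
    print(lam1, int(np.count_nonzero(v)), round(100 * pev, 1))
```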
6. DISCUSSION
It has been an interesting research topic for years to derive principal components with
sparse loadings. From a practical point of view, a good method to achieve the sparseness
goal should (at least) possess the following properties:
1. without any sparsity constraint, the method should reduce to exact PCA;
2. it should be computationally efficient for both small p and big p data;
3. it should avoid misidentifying the important variables.
The often-used simple thresholding approach is not criterion based. However, this
informal method seems to possess the first two of the desirable properties listed above. If
the explained variance and sparsity are the only concerns, simple thresholding is a reasonable
approach, and it is extremely convenient. We have shown that simple thresholding can work
well with gene expression arrays. The serious problem with simple thresholding is that it
can misidentify the real important variables. Nevertheless, simple thresholding is regarded
as a benchmark for any potentially better method.
Using the lasso constraint in PCA, SCoTLASS successfully derives sparse loadings.
However, SCoTLASS is not computationally efficient, and it lacks a good rule to pick its
tuning parameter. In addition, it is not feasible to apply SCoTLASS to gene expression
arrays, where PCA is quite a popular tool.
In this work we have developed SPCA using our SPCA criterion (3.12). This new
criterion gives exact PCA results when its sparsity (lasso) penalty term vanishes. SPCA
allows flexible control on the sparse structure of the resulting loadings. Unified efficient
algorithms have been proposed to compute SPCA solutions for both regular multivariate
data and gene expression arrays. As a principled procedure, SPCA enjoys advantages in
several aspects, including computational efficiency, high explained variance, and the ability to identify important variables.
Software in R for fitting the SPCA model (and elastic net models) is available in the
CRAN contributed package elasticnet.
APPENDIX: PROOFS
Proof of Theorem 1: Using $X^TX = VD^2V^T$ and $V^TV = I$, we have
$$\hat\beta_{\text{ridge}} = \left(X^TX + \lambda I\right)^{-1}X^T(XV_i) = \frac{D_{ii}^2}{D_{ii}^2 + \lambda}\,V_i. \qquad (\text{A.1})$$
Hence $\hat v = V_i$. ✷
Note that since Theorem 2 is a special case of Theorem 3, we will not prove it separately.
We first provide a lemma.
Lemma 1. Consider the ridge regression criterion
$$C_\lambda(\beta) = \|y - X\beta\|^2 + \lambda\|\beta\|^2.$$
Then if $\hat\beta = \arg\min_\beta C_\lambda(\beta)$,
$$C_\lambda(\hat\beta) = y^T(I - S_\lambda)y,$$
where $S_\lambda$ is the ridge operator
$$S_\lambda = X(X^TX + \lambda I)^{-1}X^T.$$
Proof of Lemma 1: Differentiating $C_\lambda$ with respect to β gives $-X^T(y - X\hat\beta) + \lambda\hat\beta = 0$. Premultiplication by $\hat\beta^T$ and rearrangement give $\lambda\|\hat\beta\|^2 = (y - X\hat\beta)^TX\hat\beta$. Since
$$\|y - X\hat\beta\|^2 + \lambda\|\hat\beta\|^2 = (y - X\hat\beta)^Ty - (y - X\hat\beta)^TX\hat\beta + \lambda\|\hat\beta\|^2,$$
we obtain $C_\lambda(\hat\beta) = (y - X\hat\beta)^Ty$. The result follows since the "fitted values" $X\hat\beta = S_\lambda y$. ✷
Proof of Theorem 3. We use the notation introduced in Section 3: A = [α1 , . . . , αk ]
and B = [β1 , . . . , βk ]. Let
$$C_\lambda(A, B) = \sum_{i=1}^n \|x_i - AB^Tx_i\|^2 + \lambda\sum_{j=1}^k \|\beta_j\|^2.$$
As in (3.9) we have
$$\sum_{i=1}^n \|x_i - AB^Tx_i\|^2 = \|X - XBA^T\|^2 \qquad (\text{A.2})$$
$$= \|XA_\perp\|^2 + \|XA - XB\|^2, \qquad (\text{A.3})$$
where $A_\perp$ is any orthonormal complement of A. Hence, with A fixed, minimizing $C_\lambda(A, B)$ over B amounts to k independent ridge regressions of the components $X\alpha_j$ on X. Using Lemma 1 and (A.2), the partially optimized penalized criterion is given by
$$C_\lambda(A, \hat B) = \|XA_\perp\|^2 + \mathrm{tr}\big((XA)^T(I - S_\lambda)(XA)\big). \qquad (\text{A.5})$$
Since $A^TA = I$, the last term is equal to $\mathrm{tr}(N^TN)$, and hence we need to maximize (minus half) the middle term. With the SVD $M^TN = UDV^T$, this middle term becomes
$$\mathrm{tr}(M^TNA^T) = \mathrm{tr}(UDV^TA^T) = \mathrm{tr}(V^TA^TUD),$$
which is maximized by taking $\hat A = UV^T$.
For the λ → ∞ result of Section 4, consider the SPCA criterion (3.12) evaluated at $B/(1+\lambda)$:
$$C_{\lambda,\lambda_1}(A, B) = \sum_{i=1}^n \Big\|x_i - A\frac{B^T}{1+\lambda}x_i\Big\|^2 + \lambda\sum_{j=1}^k \Big\|\frac{\beta_j}{1+\lambda}\Big\|^2 + \sum_{j=1}^k \lambda_{1,j}\Big\|\frac{\beta_j}{1+\lambda}\Big\|_1. \qquad (\text{A.12})$$
Since
$$\sum_{j=1}^k \Big\|\frac{\beta_j}{1+\lambda}\Big\|^2 = \frac{1}{(1+\lambda)^2}\,\mathrm{tr}(B^TB),$$
and
$$\sum_{i=1}^n \Big\|x_i - A\frac{B^T}{1+\lambda}x_i\Big\|^2 = \mathrm{tr}\Big(\big(X - X\tfrac{B}{1+\lambda}A^T\big)^T\big(X - X\tfrac{B}{1+\lambda}A^T\big)\Big)$$
$$= \mathrm{tr}(X^TX) + \frac{1}{(1+\lambda)^2}\,\mathrm{tr}(B^TX^TXB) - \frac{2}{1+\lambda}\,\mathrm{tr}(A^TX^TXB),$$
Thus we have
$$C_{\lambda,\lambda_1}(A, B) = \mathrm{tr}(X^TX) + \frac{1}{1+\lambda}\left[\mathrm{tr}\Big(B^T\frac{X^TX + \lambda I}{1+\lambda}B\Big) - 2\,\mathrm{tr}(A^TX^TXB) + \sum_{j=1}^k \lambda_{1,j}\|\beta_j\|_1\right].$$
As $\lambda \to \infty$, $(X^TX + \lambda I)/(1+\lambda) \to I$, and minimizing the bracketed term reduces to the criterion (4.1) of Section 4, which proves the claim there.
ACKNOWLEDGMENTS
We thank the editor, an associate editor, and referees for helpful comments and suggestions which greatly
improved the manuscript.
REFERENCES
Alter, O., Brown, P., and Botstein, D. (2000), "Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling," Proceedings of the National Academy of Sciences, 97, 10101–10106.
Cadima, J., and Jolliffe, I. (1995), “Loadings and Correlations in the Interpretation of Principal Components,”
Journal of Applied Statistics, 22, 203–214.
Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004), “Least Angle Regression,” The Annals of Statistics,
32, 407–499.
Hancock, P., Burton, A., and Bruce, V. (1996), “Face Processing: Human Perception and Principal Components
Analysis,” Memory and Cognition, 24, 26–40.
Hastie, T., Tibshirani, R., and Friedman, J. (2001), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, New York: Springer-Verlag.
Hastie, T., Tibshirani, R., Eisen, M., Brown, P., Ross, D., Scherf, U., Weinstein, J., Alizadeh, A., Staudt, L.,
and Botstein, D. (2000), " 'Gene Shaving' as a Method for Identifying Distinct Sets of Genes With Similar
Expression Patterns,” Genome Biology, 1, 1–21.
Jeffers, J. (1967), "Two Case Studies in the Application of Principal Component Analysis," Applied Statistics, 16, 225–236.
Jolliffe, I. (1986), Principal Component Analysis, New York: Springer Verlag.
——— (1995), "Rotation of Principal Components: Choice of Normalization Constraints," Journal of Applied
Statistics, 22, 29–35.
Jolliffe, I. T., Trendafilov, N. T., and Uddin, M. (2003), “A Modified Principal Component Technique Based on
the Lasso,” Journal of Computational and Graphical Statistics, 12, 531–547.
Mardia, K., Kent, J., and Bibby, J. (1979), Multivariate Analysis, New York: Academic Press.
McCabe, G. (1984), “Principal Variables,” Technometrics, 26, 137–144.
Osborne, M. R., Presnell, B., and Turlach, B. A. (2000), “A New Approach to Variable Selection in Least Squares
Problems,” IMA Journal of Numerical Analysis, 20, 389–403.
Ramaswamy, S., Tamayo, P., Rifkin, R., Mukherjee, S., Yeang, C., Angelo, M., Ladd, C., Reich, M., Latulippe, E., Mesirov, J., Poggio, T., Gerald, W., Loda, M., Lander, E., and Golub, T. (2001), "Multiclass Cancer Diagnosis Using Tumor Gene Expression Signatures," Proceedings of the National Academy of Sciences, 98, 15149–15154.
Tibshirani, R. (1996), “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal Statistical Society,
Series B, 58, 267–288.
Tibshirani, R., Hastie, T., Narasimhan, B., and Chu, G. (2002), "Diagnosis of Multiple Cancer Types by Shrunken Centroids of Gene Expression," Proceedings of the National Academy of Sciences, 99, 6567–6572.
Vines, S. (2000), “Simple Principal Components,” Applied Statistics, 49, 441–451.
Zou, H., and Hastie, T. (2005), “Regularization and Variable Selection via the Elastic Net,” Journal of the Royal
Statistical Society, Series B, 67, 301–320.