Abstract
Most existing procedures for sparse principal component analysis (PCA) use a penalty function to obtain a sparse matrix of weights by which a data matrix is post-multiplied to produce PC scores. In this paper, we propose a new sparse PCA procedure which differs from the existing ones in two ways. First, the new procedure does not sparsify the weight matrix. Instead, the so-called loadings matrix, by which the score matrix is post-multiplied to approximate the data matrix, is sparsified. Second, the cardinality of the loadings matrix, i.e., the total number of nonzero loadings, is pre-specified as an integer rather than controlled through a penalty function. The procedure is called unpenalized sparse loading PCA (USLPCA). A desirable property of USLPCA is that the indices for the percentages of explained variances can be defined in the same form as in the standard PCA. We develop an alternate least squares algorithm for USLPCA which exploits the fact that the PCA loss function can be decomposed into a sum of a term irrelevant to the loadings and another term that is easily minimized under cardinality constraints. A procedure is also presented for selecting the best cardinality using information criteria. The procedures are assessed in a simulation study and illustrated with real data examples.




References
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr 19:716–723
d’Aspremont A, Bach F, Ghaoui LE (2008) Optimal solutions for sparse principal component analysis. J Mach Learn Res 9:1269–1294
Eckart C, Young G (1936) The approximation of one matrix by another of lower rank. Psychometrika 1:211–218
Enki DG, Trendafilov NT (2012) Sparse principal components by semi-partition clustering. Comput Stat 27:605–626
Izenman AJ (2008) Modern multivariate statistical techniques: regression, classification, and manifold learning. Springer, New York
Jeffers JNR (1967) Two case studies in the application of principal component analysis. Appl Stat 16:225–236
Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, New York
Jolliffe IT, Trendafilov NT, Uddin M (2003) A modified principal component technique based on the LASSO. J Comput Graph Stat 12:531–547
Journée M, Nesterov Y, Richtárik P, Sepulchre R (2010) Generalized power method for sparse principal component analysis. J Mach Learn Res 11:517–553
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Seber GAF (2008) A matrix handbook for statisticians. Wiley, Hoboken
Shen H, Huang JZ (2008) Sparse principal component analysis via regularized low rank matrix approximation. J Multivar Anal 99:1015–1034
SPSS Inc (1997) SPSS 7.5 statistical algorithms. SPSS Inc, Chicago
Takane Y (2014) Constrained principal component analysis and related techniques. CRC Press, Boca Raton
Trendafilov NT (2014) From simple structure to sparse components: a review. Comput Stat 29:431–454
Trendafilov NT, Adachi K (2015) Sparse versus simple structure loadings. Psychometrika. doi:10.1007/s11336-014-9416-y
Van Deun K, Wilderjans TF, van den Berg RA, Antoniadis A, Van Mechelen I (2011) A flexible framework for sparse simultaneous component based data integration. BMC Bioinformatics 12:448–464
Witten DM, Tibshirani R, Hastie T (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10:515–534
Yeung KY, Ruzzo WL (2001) Principal component analysis for clustering gene expression data. Bioinformatics 17:763–774
Zou H, Hastie T, Tibshirani R (2006) Sparse principal component analysis. J Comput Graph Stat 15:265–286
Acknowledgments
This work was supported by Grant RPG-2013-211 from the Leverhulme Trust, UK, and by Grant (C)-26330039 from the Japan Society for the Promotion of Science.
Appendices
1.1 Multiple-runs procedure
To choose the number of runs of the USLPCA algorithm, we use the fact that the optimal value of the objective function (3.12) should decrease monotonically as \(\hbox{Card}(\mathbf{A}) = c\) in the constraint (1.3) increases:
\(LS_{\mathrm{N}}(\mathbf{A}_{c}) \le LS_{\mathrm{N}}(\mathbf{A}_{c-1}),\)    (9.1)
with \(\mathbf{A}_{c}\) and \(\mathbf{A}_{c-1}\) the optimal solutions for \(\hbox{Card}(\mathbf{A}) = c\) and \(\hbox{Card}(\mathbf{A}) = c - 1\), respectively. That is, we run the algorithm until an \(\mathbf{A}_{c}\) satisfying (9.1) is found. Let \(\mathbf{A}_{ck}\) denote the solution for A resulting from the \(k\)th run. Then, our multiple-runs procedure for \(\hbox{Card}(\mathbf{A}) = c\) is described as follows:
1. Run the algorithm \(K_{c} = 50\) times and set \(\mathbf{A}_{c} = \hbox{argmin}_{1\le k\le K_{c}} \, LS_{\mathrm{N}}(\mathbf{A}_{ck})\).
2. Finish if \(\mathbf{A}_{c}\) satisfies (9.1); otherwise go to 3.
3. Increase \(K_{c}\) by one, run the algorithm, and set \(\mathbf{A}_{c} = \hbox{argmin}_{1\le k\le K_{c}}\, LS_{\mathrm{N}}(\mathbf{A}_{ck})\).
4. Finish if \(\mathbf{A}_{c}\) satisfies (9.1) or \(K_{c} = 1000\); otherwise go back to 3.
Here, the number of runs is denoted by \(K_{c}\) with subscript c, as it need not be the same across different values of c.
For the first run, the initial loading matrix A is taken to be the matrix of the standard PCA loadings. For the subsequent runs, each element of the initial A is set to \(a_{\max}\times u_{[-1, 1]}\), with \(u_{[-1, 1]}\) a random variable following the uniform distribution over the range [−1, 1] and \(a_{\max}\) the maximum absolute value of the elements in the initial A of the first run.
The value of \(LS_{\mathrm{N}}(\mathbf{A}_{c-1})\) must be known before the above multiple-runs procedure is carried out with \(\hbox{Card}(\mathbf{A}) = c\). Thus, the procedure (2.5) should be applied to an increasing sequence of values \(c = c_{\min}, \ldots, c_{\max}\); that is, we evaluate I(c) for each c from \(c_{\min}\) to \(c_{\max}\) one by one. Only for \(c = c_{\min}\) does the multiple-runs procedure consist of Steps 1 and 2 alone.
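The whole subsection translates into a short driver loop. Below is a minimal sketch in Python; the function uslpca_fit and its signature are hypothetical stand-ins for the ALS iterations of Sect. 3, and the scaling used for the initial PCA loadings is one common convention rather than necessarily the one used in the paper.

import numpy as np

def multiple_runs(X, m, c, uslpca_fit, ls_prev=np.inf, k_init=50, k_max=1000, seed=0):
    """Multiple-runs procedure for a fixed cardinality Card(A) = c.

    uslpca_fit(X, m, c, A_init) is an assumed handle to the ALS iterations of
    Sect. 3; it is taken to return (A, ls_n), the converged loading matrix and
    its normalized loss LS_N.  ls_prev holds LS_N(A_{c-1}); pass np.inf when
    c = c_min so that only Steps 1 and 2 are performed.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # First run: start from standard PCA loadings (here computed from the SVD
    # of the column-centered X; the scaling convention is an assumption).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    A_init = Vt[:m].T * (s[:m] / np.sqrt(n))
    a_max = np.abs(A_init).max()
    best_A, best_ls = uslpca_fit(X, m, c, A_init)
    k = 1
    # Remaining runs of Step 1 use random starts; Steps 3-4 keep adding runs
    # until (9.1) is satisfied or K_c reaches 1000.
    while k < k_init or (best_ls > ls_prev and k < k_max):
        A_rand = a_max * rng.uniform(-1.0, 1.0, size=(p, m))
        A_k, ls_k = uslpca_fit(X, m, c, A_rand)
        if ls_k < best_ls:
            best_A, best_ls = A_k, ls_k
        k += 1
    return best_A, best_ls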
1.2 No correlation between components and errors
Here, we prove that no correlation exists between the two terms on the right-hand side of (4.4), i.e., the columns of \(\mathbf{f}_{j}\mathbf{a}_{j}^{\prime}\) are uncorrelated with those of \(\mathbf{H}_{[j]}=\Sigma_{k \ne j}\mathbf{f}_{k}\mathbf{a}_{k}^{\prime} + \mathbf{E}\).
The proof is attained by showing that
[1] \(\mathbf{f}_{j}\mathbf{a}_{j}^{\prime}\) and \(\mathbf{H}_{[j]}\) are column-centered;
[2] \((\mathbf{f}_{j}\mathbf{a}_{j}^{\prime})^{\prime}\mathbf{H}_{[j]} = \mathbf{O}\).
First, [1] follows from the fact that X and F are column-centered, which implies that \(\mathbf{f}_{j}\mathbf{a}_{j}^{\prime}\), \(\Sigma_{k\ne j}\mathbf{f}_{k}\mathbf{a}_{k}^{\prime}\), and \(\mathbf{E} = \mathbf{X} - \mathbf{FA}^{\prime}\) are also column-centered.
Next, [2] can be proved by showing that \((\mathbf{f}_{j}\mathbf{a}_{j}^{\prime})^{\prime}\Sigma_{k\ne j}\mathbf{f}_{k}\mathbf{a}_{k}^{\prime} = \mathbf{O}\) and \((\mathbf{f}_{j}\mathbf{a}_{j}^{\prime})^{\prime}\mathbf{E} = \mathbf{O}\). The former equality follows from the fact that (1.2) implies \(\mathbf{f}_{j}^{\prime}\mathbf{f}_{k} = 0\) for \(k\ne j\). The left-hand side of the latter equality is expanded as
Here, \(\mathbf{a}_{j}\mathbf{f}_{j}^{\prime}\mathbf{X}\) can be rewritten as
with \(\mathbf{b}_{j} = \mathbf{X}^{\prime }\mathbf{f}_{j}\) the \(j\hbox {th}\) column of B defined in (2.3), while \(\mathbf{a}_{j}\mathbf{f}_{j}^{\prime }\mathbf{FA}^{\prime }\) can be expressed as
using (1.2). Equation (3.10) implies the equality of (9.2) and (9.3), which leads to (9.2) being equal to \(\mathbf{O}\). This completes the proof.
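For completeness, here is the step linking [1] and [2] to the claim of zero correlation (the column notation \((\cdot)_{\cdot u}\) for the \(u\)th column and the divisor \(n\) in the sample covariance are conventions adopted here, not taken from the paper):
\[
\mathrm{cov}\bigl((\mathbf{f}_{j}\mathbf{a}_{j}^{\prime})_{\cdot u},\, (\mathbf{H}_{[j]})_{\cdot v}\bigr)
= n^{-1}(\mathbf{f}_{j}\mathbf{a}_{j}^{\prime})_{\cdot u}^{\prime}\,(\mathbf{H}_{[j]})_{\cdot v}
= n^{-1}\bigl[(\mathbf{f}_{j}\mathbf{a}_{j}^{\prime})^{\prime}\mathbf{H}_{[j]}\bigr]_{uv} = 0,
\]
where the first equality uses [1] (both matrices are column-centered, so sample covariances reduce to cross-products) and the last uses [2]; hence every column of \(\mathbf{f}_{j}\mathbf{a}_{j}^{\prime}\) is uncorrelated with every column of \(\mathbf{H}_{[j]}\).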
1.3 Likelihood for PCA model
Normality assumption (5.1) implies that the log likelihood for X is expressed as
By solving the equation \(dl(\mathbf{F}, \mathbf{A}, \sigma^{2})/d\sigma^{2} = 0\), we find that the ML estimate of \(\sigma^{2}\) must satisfy \(\sigma^{2} = (np)^{-1}\Vert \mathbf{X} -\mathbf{FA}^{\prime}\Vert^{2}\). Substituting this into (9.5) leads to
whose part relevant to F and A is expressed as (5.2).
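Under the usual reading of (5.1), namely that the \(np\) elements of \(\mathbf{E} = \mathbf{X} - \mathbf{FA}^{\prime}\) are independently distributed as \(N(0, \sigma^{2})\) (an assumption made here for illustration), the steps above amount to the following sketch:
\[
l(\mathbf{F}, \mathbf{A}, \sigma^{2}) = -\frac{np}{2}\log (2\pi \sigma^{2}) - \frac{1}{2\sigma^{2}}\Vert \mathbf{X} - \mathbf{FA}^{\prime}\Vert^{2},
\]
\[
\frac{dl}{d\sigma^{2}} = 0 \;\Rightarrow\; \hat{\sigma}^{2} = (np)^{-1}\Vert \mathbf{X} - \mathbf{FA}^{\prime}\Vert^{2},
\]
\[
l(\mathbf{F}, \mathbf{A}, \hat{\sigma}^{2}) = -\frac{np}{2}\Bigl\{\log \bigl(2\pi\, (np)^{-1}\Vert \mathbf{X} - \mathbf{FA}^{\prime}\Vert^{2}\bigr) + 1\Bigr\},
\]
so the only part of the maximized log likelihood that depends on F and A is \(-\frac{np}{2}\log \Vert \mathbf{X} - \mathbf{FA}^{\prime}\Vert^{2}\); maximizing the likelihood over F and A is therefore equivalent to minimizing \(\Vert \mathbf{X} - \mathbf{FA}^{\prime}\Vert^{2}\).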
1.4 Component-wise constrained USLPCA
In the preliminary analysis of Yeung and Ruzzo's (2001) data, we used a version of USLPCA in which (1.3) is replaced by
\(\hbox{card}(\mathbf{a}_{j}) = c_{j}\)    (9.6)
for \(j = 1, \ldots, m\), with \(\hbox{card}(\mathbf{a}_{j})\) the cardinality of the \(j\)th column of A and \(c_{j}\) a pre-specified integer. The algorithm for this version is the same as in Sect. 3, except that (3.9) is replaced by setting
over \(j = 1, \ldots, m\). Here, \(b_{[q_j]j}^{2}\) denotes the \(q_{j}\)th smallest value among the squares of the elements in \(\mathbf{b}_{j}\).
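This replacement of (3.9) amounts to selecting, within each column of B, the entries with the largest squares. A minimal sketch of that selection step in Python (the function name is hypothetical; only the pattern of nonzeros is returned, since the scaling of the surviving loadings follows (3.9) in the main text and is omitted here):

import numpy as np

def columnwise_support(B, c):
    """For the j-th column of B, keep the c[j] entries with the largest
    squares (equivalently, zero those among the q_j = p - c[j] smallest
    squares) and return the resulting Boolean pattern of nonzero loadings."""
    p, m = B.shape
    support = np.zeros((p, m), dtype=bool)
    for j in range(m):
        keep = np.argsort(B[:, j] ** 2)[p - c[j]:]  # indices of the c_j largest squares
        support[keep, j] = True
    return support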
As described in Sect. 7.2, the 384 variables (genes) in the data are classified into five clusters. We performed the above version of USLPCA with \(m = 5\), setting the cardinality \(c_{j}\) in (9.6) to the number of variables belonging to each cluster. If the five clusters corresponded to the five components, the resulting nonzero loadings would indicate the cluster memberships of the variables. However, the solution did not show this feature and included a trivial component whose PEV was very low (3.2 %).
Cite this article
Adachi, K., Trendafilov, N.T. Sparse principal component analysis subject to prespecified cardinality of loadings. Comput Stat 31, 1403–1427 (2016). https://doi.org/10.1007/s00180-015-0608-4