Abstract
Most existing procedures for sparse principal component analysis (PCA) use a penalty function to obtain a sparse matrix of weights by which a data matrix is post-multiplied to produce PC scores. In this paper, we propose a new sparse PCA procedure which differs from the existing ones in two ways. First, the new procedure does not sparsify the weight matrix. Instead, the so-called loadings matrix, by which the score matrix is post-multiplied to approximate the data matrix, is sparsified. Second, the cardinality of the loadings matrix, i.e., the total number of nonzero loadings, is pre-specified as an integer rather than controlled through a penalty function. The procedure is called unpenalized sparse loading PCA (USLPCA). A desirable property of USLPCA is that the indices for the percentages of explained variances can be defined in the same form as in the standard PCA. We develop an alternate least squares algorithm for USLPCA which exploits the fact that the PCA loss function can be decomposed into a sum of a term irrelevant to the loadings and another term that is easily minimized under cardinality constraints. A procedure is also presented for selecting the best cardinality using information criteria. The procedures are assessed in a simulation study and illustrated with real data examples.




References
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr 19:716–723
d’Aspremont A, Bach F, Ghaoui LE (2008) Optimal solutions for sparse principal component analysis. J Mach Learn Res 9:1269–1294
Eckart C, Young G (1936) The approximation of one matrix by another of lower rank. Psychometrika 1:211–218
Enki DG, Trendafilov NT (2012) Sparse principal components by semi-partition clustering. Comput Stat 27:605–626
Izenman AJ (2008) Modern multivariate statistical techniques: regression, classification, and manifold learning. Springer, New York
Jeffers JNR (1967) Two case studies in the application of principal component analysis. Appl Stat 16:225–236
Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, New York
Jolliffe IT, Trendafilov NT, Uddin M (2003) A modified principal component technique based on the LASSO. J Comput Graph Stat 12:531–547
Journée M, Nesterov Y, Richtárik P, Sepulchre R (2010) Generalized power method for sparse principal component analysis. J Mach Learn Res 11:517–553
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Seber GAF (2008) A matrix handbook for statisticians. Wiley, Hoboken
Shen H, Huang JZ (2008) Sparse principal component analysis via regularized low rank matrix approximation. J Multivar Anal 99:1015–1034
SPSS Inc (1997) SPSS 7.5 statistical algorithms. SPSS Inc, Chicago
Takane Y (2014) Constrained principal component analysis and related techniques. CRC Press, Boca Raton
Trendafilov NT (2014) From simple structure to sparse components: a review. Comput Stat 29:431–454
Trendafilov NT, Adachi K (2015) Sparse versus simple structure loadings. Psychometrika. doi:10.1007/s11336-014-9416-y
Van Deun K, Wilderjans TF, van den Berg RA, Antoniadis A, Van Mechelen I (2011) A flexible framework for sparse simultaneous component based data integration. BMC Bioinformatics 12:448–464
Witten DM, Tibshirani R, Hastie T (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10:515–534
Yeung KY, Ruzzo WL (2001) Principal component analysis for clustering gene expression data. Bioinformatics 17:763–774
Zou H, Hastie T, Tibshirani R (2006) Sparse principal component analysis. J Comput Graph Stat 15:265–286
Acknowledgments
This work was supported by Grant RPG-2013-211 from the Leverhulme Trust, UK, and by Grant (C)-26330039 from the Japan Society for the Promotion of Science.
Appendices
1.1 Multiple-runs procedure
To choose the number of runs of the USLPCA algorithm, we use the fact that the optimal value of the objective function (3.12) should decrease monotonically as \(\hbox{Card}(\mathbf{A}) = c\) in the constraint (1.3) increases:
\(LS_{\mathrm{N}}(\mathbf{A}_{c}) \le LS_{\mathrm{N}}(\mathbf{A}_{c-1}),\)    (9.1)
with \(\mathbf{A}_{c}\) and \(\mathbf{A}_{c-1}\) the optimal solutions for \(\hbox{Card}(\mathbf{A}) = c\) and \(\hbox{Card}(\mathbf{A}) = c - 1\), respectively. That is, we run the algorithm until an \(\mathbf{A}_{c}\) satisfying (9.1) is found. Let \(\mathbf{A}_{ck}\) denote the solution for A resulting from the \(k\)th run. Then, our multiple-runs procedure for \(\hbox{Card}(\mathbf{A}) = c\) is described as follows:
1. Run the algorithm \(K_{c} = 50\) times and set \(\mathbf{A}_{c} = \hbox{argmin}_{1\le k\le K_{c}} \, LS_{\mathrm{N}}(\mathbf{A}_{ck})\).
2. Finish if \(\mathbf{A}_{c}\) satisfies (9.1); otherwise go to 3.
3. Increase \(K_{c}\) by one, run the algorithm, and set \(\mathbf{A}_{c} = \hbox{argmin}_{1\le k\le K_{c}}\, LS_{\mathrm{N}}(\mathbf{A}_{ck})\).
4. Finish if \(\mathbf{A}_{c}\) satisfies (9.1) or \(K_{c} = 1000\); otherwise go back to 3.
Here, the number of runs is denoted by \(K_{c}\) with subscript c, as it need not be the same across different values of c.
For the first run, the initial loading matrix A is taken to be the matrix of the standard PCA loadings. For the subsequent runs, each element of the initial A is set to \(a_{\max}\times u_{[-1, 1]}\), with \(u_{[-1, 1]}\) a random variable following the uniform distribution over the range [−1, 1] and \(a_{\max}\) the maximum absolute value of the elements in the initial A of the first run.
The value of \(LS_{\mathrm{N}}(\mathbf{A}_{c-1})\) must be known before the above multiple-runs procedure is carried out with \(\hbox{Card}(\mathbf{A}) = c\). Thus, the procedure (2.5) should be applied to an increasing sequence of values \(c = c_{\min}, \ldots, c_{\max}\); that is, we evaluate I(c) for each c from \(c_{\min}\) to \(c_{\max}\) one by one. Only for \(c = c_{\min}\) does the multiple-runs procedure consist of Steps 1 and 2 alone.
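The whole subsection translates into a short driver loop. Below is a minimal sketch in Python; the function uslpca_fit and its signature are hypothetical stand-ins for the ALS iterations of Sect. 3, and the scaling used for the initial PCA loadings is one common convention rather than necessarily the one used in the paper.

import numpy as np

def multiple_runs(X, m, c, uslpca_fit, ls_prev=np.inf, k_init=50, k_max=1000, seed=0):
    """Multiple-runs procedure for a fixed cardinality Card(A) = c.

    uslpca_fit(X, m, c, A_init) is an assumed handle to the ALS iterations of
    Sect. 3; it is taken to return (A, ls_n), the converged loading matrix and
    its normalized loss LS_N.  ls_prev holds LS_N(A_{c-1}); pass np.inf when
    c = c_min so that only Steps 1 and 2 are performed.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # First run: start from standard PCA loadings (here computed from the SVD
    # of the column-centered X; the scaling convention is an assumption).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    A_init = Vt[:m].T * (s[:m] / np.sqrt(n))
    a_max = np.abs(A_init).max()
    best_A, best_ls = uslpca_fit(X, m, c, A_init)
    k = 1
    # Remaining runs of Step 1 use random starts; Steps 3-4 keep adding runs
    # until (9.1) is satisfied or K_c reaches 1000.
    while k < k_init or (best_ls > ls_prev and k < k_max):
        A_rand = a_max * rng.uniform(-1.0, 1.0, size=(p, m))
        A_k, ls_k = uslpca_fit(X, m, c, A_rand)
        if ls_k < best_ls:
            best_A, best_ls = A_k, ls_k
        k += 1
    return best_A, best_ls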
1.2 No correlation between components and errors
Here, we prove that no correlation exists between the two terms on the right-hand side of (4.4), i.e., the columns of \(\mathbf{f}_{j}\mathbf{a}_{j}^{\prime}\) are uncorrelated with those of \(\mathbf{H}_{[j]}=\Sigma_{k \ne j}\mathbf{f}_{k}\mathbf{a}_{k}^{\prime} + \mathbf{E}\).
The proof is attained by showing that
[1] \(\mathbf{f}_{j}\mathbf{a}_{j}^{\prime}\) and \(\mathbf{H}_{[j]}\) are column-centered;
[2] \((\mathbf{f}_{j}\mathbf{a}_{j}^{\prime})^{\prime}\mathbf{H}_{[j]} = \mathbf{O}\).
First, [1] follows from the fact that X and F are column-centered, which implies that \(\mathbf{f}_{j}\mathbf{a}_{j}^{\prime}\), \(\Sigma_{k\ne j}\mathbf{f}_{k}\mathbf{a}_{k}^{\prime}\), and \(\mathbf{E} = \mathbf{X} - \mathbf{FA}^{\prime}\) are also column-centered.
Next, [2] can be proved by showing that \((\mathbf{f}_{j}\mathbf{a}_{j}^{\prime})^{\prime}\Sigma_{k\ne j}\mathbf{f}_{k}\mathbf{a}_{k}^{\prime} = \mathbf{O}\) and \((\mathbf{f}_{j}\mathbf{a}_{j}^{\prime})^{\prime}\mathbf{E} = \mathbf{O}\). The former equality follows from the fact that (1.2) implies \(\mathbf{f}_{j}^{\prime}\mathbf{f}_{k} = 0\) for \(k\ne j\). The left-hand side of the latter equality is expanded as
Here, \(\mathbf{a}_{j}\mathbf{f}_{j}^{\prime}\mathbf{X}\) can be rewritten as
with \(\mathbf{b}_{j} = \mathbf{X}^{\prime }\mathbf{f}_{j}\) the \(j\hbox {th}\) column of B defined in (2.3), while \(\mathbf{a}_{j}\mathbf{f}_{j}^{\prime }\mathbf{FA}^{\prime }\) can be expressed as
using (1.2). Equation (3.10) implies the equality of (9.2) and (9.3), which leads to (9.2) being equal to \(\mathbf{O}\). This completes the proof.
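For completeness, here is the step linking [1] and [2] to the claim of zero correlation (the column notation \((\cdot)_{\cdot u}\) for the \(u\)th column and the divisor \(n\) in the sample covariance are conventions adopted here, not taken from the paper):
\[
\mathrm{cov}\bigl((\mathbf{f}_{j}\mathbf{a}_{j}^{\prime})_{\cdot u},\, (\mathbf{H}_{[j]})_{\cdot v}\bigr)
= n^{-1}(\mathbf{f}_{j}\mathbf{a}_{j}^{\prime})_{\cdot u}^{\prime}\,(\mathbf{H}_{[j]})_{\cdot v}
= n^{-1}\bigl[(\mathbf{f}_{j}\mathbf{a}_{j}^{\prime})^{\prime}\mathbf{H}_{[j]}\bigr]_{uv} = 0,
\]
where the first equality uses [1] (both matrices are column-centered, so sample covariances reduce to cross-products) and the last uses [2]; hence every column of \(\mathbf{f}_{j}\mathbf{a}_{j}^{\prime}\) is uncorrelated with every column of \(\mathbf{H}_{[j]}\).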
1.3 Likelihood for PCA model
Normality assumption (5.1) implies that the log likelihood for X is expressed as
By solving the equation \(dl(\mathbf{F}, \mathbf{A}, \sigma^{2})/d\sigma^{2} = 0\), we find that the ML estimate of \(\sigma^{2}\) must satisfy \(\sigma^{2} = (np)^{-1}\Vert \mathbf{X} -\mathbf{FA}^{\prime}\Vert^{2}\). Substituting this into (9.5) leads to
whose part relevant to F and A is expressed as (5.2).
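Under the usual reading of (5.1), namely that the \(np\) elements of \(\mathbf{E} = \mathbf{X} - \mathbf{FA}^{\prime}\) are independently distributed as \(N(0, \sigma^{2})\) (an assumption made here for illustration), the steps above amount to the following sketch:
\[
l(\mathbf{F}, \mathbf{A}, \sigma^{2}) = -\frac{np}{2}\log (2\pi \sigma^{2}) - \frac{1}{2\sigma^{2}}\Vert \mathbf{X} - \mathbf{FA}^{\prime}\Vert^{2},
\]
\[
\frac{dl}{d\sigma^{2}} = 0 \;\Rightarrow\; \hat{\sigma}^{2} = (np)^{-1}\Vert \mathbf{X} - \mathbf{FA}^{\prime}\Vert^{2},
\]
\[
l(\mathbf{F}, \mathbf{A}, \hat{\sigma}^{2}) = -\frac{np}{2}\Bigl\{\log \bigl(2\pi\, (np)^{-1}\Vert \mathbf{X} - \mathbf{FA}^{\prime}\Vert^{2}\bigr) + 1\Bigr\},
\]
so the only part of the maximized log likelihood that depends on F and A is \(-\frac{np}{2}\log \Vert \mathbf{X} - \mathbf{FA}^{\prime}\Vert^{2}\); maximizing the likelihood over F and A is therefore equivalent to minimizing \(\Vert \mathbf{X} - \mathbf{FA}^{\prime}\Vert^{2}\).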
1.4 Component-wise constrained USLPCA
In the preliminary analysis of Yeung and Ruzzo's (2001) data, we used a version of USLPCA in which (1.3) is replaced by
\(\hbox{card}(\mathbf{a}_{j}) = c_{j}\)    (9.6)
for \(j = 1, \ldots, m\), with \(\hbox{card}(\mathbf{a}_{j})\) the cardinality of the \(j\)th column of A and \(c_{j}\) a pre-specified integer. The algorithm for this version is the same as in Sect. 3, except that (3.9) is replaced by setting
over \(j = 1, \ldots, m\). Here, \(b_{[q_j]j}^{2}\) denotes the \(q_{j}\)th smallest value among the squares of the elements in \(\mathbf{b}_{j}\).
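This replacement of (3.9) amounts to selecting, within each column of B, the entries with the largest squares. A minimal sketch of that selection step in Python (the function name is hypothetical; only the pattern of nonzeros is returned, since the scaling of the surviving loadings follows (3.9) in the main text and is omitted here):

import numpy as np

def columnwise_support(B, c):
    """For the j-th column of B, keep the c[j] entries with the largest
    squares (equivalently, zero those among the q_j = p - c[j] smallest
    squares) and return the resulting Boolean pattern of nonzero loadings."""
    p, m = B.shape
    support = np.zeros((p, m), dtype=bool)
    for j in range(m):
        keep = np.argsort(B[:, j] ** 2)[p - c[j]:]  # indices of the c_j largest squares
        support[keep, j] = True
    return support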
As described in Sect. 7.2, the 384 variables (genes) in the data are classified into five clusters. We performed the above version of USLPCA with \(m = 5\), setting the cardinality \(c_{j}\) in (9.6) to the number of variables belonging to each cluster. If the five clusters corresponded to the five components, the resulting nonzero loadings would indicate the cluster memberships of the variables. However, the solution did not show this feature and included a trivial component whose PEV was very low (3.2 %).
Cite this article
Adachi, K., Trendafilov, N.T. Sparse principal component analysis subject to prespecified cardinality of loadings. Comput Stat 31, 1403–1427 (2016). https://doi.org/10.1007/s00180-015-0608-4