
Sparse principal component analysis subject to prespecified cardinality of loadings

Original Paper · Computational Statistics

Abstract

Most of the existing procedures for sparse principal component analysis (PCA) use a penalty function to obtain a sparse matrix of weights by which a data matrix is post-multiplied to produce PC scores. In this paper, we propose a new sparse PCA procedure which differs from the existing ones in two ways. First, the new procedure does not sparsify the weight matrix. Instead, it sparsifies the so-called loadings matrix, by which the score matrix is post-multiplied to approximate the data matrix. Second, the cardinality of the loading matrix, i.e., the total number of nonzero loadings, is pre-specified as an integer, without using penalty functions. The procedure is called unpenalized sparse loading PCA (USLPCA). A desirable property of USLPCA is that the indices for the percentages of explained variances can be defined in the same form as in the standard PCA. We develop an alternating least squares algorithm for USLPCA which exploits the fact that the PCA loss function can be decomposed as the sum of a term irrelevant to the loadings and another that is easily minimized under cardinality constraints. A procedure is also presented for selecting the best cardinality using information criteria. The procedures are assessed in a simulation study and illustrated with real data examples.


References

  • Akaike H (1974) A new look at the statistical model identification. IEEE Trans Automat Contr 19:716–723
  • d’Aspremont A, Bach F, Ghaoui LE (2008) Optimal solutions for sparse principal component analysis. J Mach Learn Res 9:1269–1294
  • Eckart C, Young G (1936) The approximation of one matrix by another of lower rank. Psychometrika 1:211–218
  • Enki DG, Trendafilov NT (2012) Sparse principal components by semi-partition clustering. Comput Stat 27:605–626
  • Izenman AJ (2008) Modern multivariate statistical techniques: regression, classification, and manifold learning. Springer, New York
  • Jeffers JNR (1967) Two case studies in the application of principal component analysis. Appl Stat 16:225–236
  • Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, New York
  • Jolliffe IT, Trendafilov NT, Uddin M (2003) A modified principal component technique based on the LASSO. J Comput Graph Stat 12:531–547
  • Journée M, Nesterov Y, Richtárik P, Sepulchre R (2010) Generalized power method for sparse principal component analysis. J Mach Learn Res 11:517–553
  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
  • Seber GAF (2008) A matrix handbook for statisticians. Wiley, Hoboken
  • Shen H, Huang JZ (2008) Sparse principal component analysis via regularized low rank matrix approximation. J Multivar Anal 99:1015–1034
  • SPSS Inc (1997) SPSS 7.5 statistical algorithms. SPSS Inc, Chicago
  • Takane Y (2014) Constrained principal component analysis and related techniques. CRC Press, Boca Raton
  • Trendafilov NT (2014) From simple structure to sparse components: a review. Comput Stat 29:431–454
  • Trendafilov NT, Adachi K (2015) Sparse versus simple structure loadings. Psychometrika. doi:10.1007/s11336-014-9416-y
  • Van Deun K, Wilderjans TF, van den Berg RA, Antoniadis A, Van Mechelen I (2011) A flexible framework for sparse simultaneous component based data integration. BMC Bioinformatics 12:448–464
  • Witten DM, Tibshirani R, Hastie T (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10:515–534
  • Yeung KY, Ruzzo WL (2001) Principal component analysis for clustering gene expression data. Bioinformatics 17:763–774
  • Zou H, Hastie T, Tibshirani R (2006) Sparse principal component analysis. J Comput Graph Stat 15:265–286


Acknowledgments

This work was supported by Grant RPG-2013-211 from the Leverhulme Trust, UK, and Grant (C)-26330039 from the Japan Society for the Promotion of Science.

Author information

Correspondence to Kohei Adachi.

Appendices

1.1 Multiple-runs procedure

To choose the number of runs of the USLPCA algorithm, we require that the value of the objective function (3.12) decrease monotonically with increasing \(\hbox {Card}(\mathbf{A}) = c\) in the constraint (1.3):

$$\begin{aligned} LS_{\mathrm{N}} \left( {\mathbf{A}_c } \right) \le LS_{\mathrm{N}} \left( {\mathbf{A}_{c -1} } \right) \end{aligned}$$
(9.1)

with \(\mathbf{A}_{c}\) and \(\mathbf{A}_{c-1}\) the optimal solutions for \(\hbox {Card}(\mathbf{A}) = c\) and \(\hbox {Card}(\mathbf{A}) = c - 1\), respectively. That is, we run the algorithm until an \(\mathbf{A}_{c}\) satisfying (9.1) is found. Let \(\mathbf{A}_{ck}\) denote the solution for A resulting from the \(k\hbox {th}\) run. Then, our multiple-runs procedure for \(\hbox {Card}(\mathbf{A}) = c\) is as follows:

  1. Run the algorithm \(K_{c} = 50\) times and set \(\mathbf{A}_{c} = \hbox {argmin}_{1\le k\le K_{c}} \, LS_{\mathrm{N}}(\mathbf{A}_{ck})\).

  2. Finish if \(\mathbf{A}_{c}\) satisfies (9.1); otherwise go to 3.

  3. Increase \(K_{c}\) by one, run the algorithm, and set \(\mathbf{A}_{c} = \hbox {argmin}_{1\le k\le K_{c}}\, LS_{\mathrm{N}}(\mathbf{A}_{ck})\).

  4. Finish if \(\mathbf{A}_{c}\) satisfies (9.1) or \(K_{c} = 1000\); otherwise go back to 3.

Here, the number of runs is denoted by \(K_{c}\) with subscript c, since it need not be the same for different values of c.

When \(K_{c}= 1\), the initial loading matrix A is taken to be the matrix of the standard PCA loadings. For \(K_{c} > 1\), each element of A is set to \(a_{\max }\times u_{[-1, 1]}\), with \(u_{[-1, 1]}\) a random variable uniformly distributed over [\(-\)1, 1] and \(a_{\max }\) the maximum absolute value of the elements in the initial A used for the first run.

The value of \(LS_{\mathrm{N}}(\mathbf{A}_{c-1})\) must be known before the above multiple-runs procedure is carried out with \(\hbox {Card}(\mathbf{A}) = c\). Thus, the procedure (2.5) should be applied for an increasing sequence of values \(c=c_{\min }, {\ldots }, c_{\max }\): we evaluate I(c) with c increasing from \(c_{\min }\) to \(c_{\max }\) one by one. When \(c=c_{\min }\), the multiple-runs procedure consists only of steps 1 and 2. A sketch of the whole procedure is given below.
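
For concreteness, here is a minimal Python/NumPy sketch of the multiple-runs procedure. The single-run routine uslpca(X, m, c, A_init) is a hypothetical stand-in for one pass of the alternating least squares algorithm of Sect. 3 (not reproduced in this appendix); it is assumed to return the fitted loading matrix together with the attained value of \(LS_{\mathrm{N}}\).

```python
import numpy as np

def multiple_runs(X, m, c, ls_prev, uslpca, A_pca, rng,
                  K_first=50, K_max=1000):
    """Multiple-runs procedure for Card(A) = c.

    uslpca(X, m, c, A_init) is a hypothetical single-run USLPCA routine
    returning (A, LS_N(A)). ls_prev is LS_N(A_{c-1}); pass np.inf when
    c = c_min so that steps 3-4 are skipped. A_pca holds the standard
    PCA loadings used to start the first run.
    """
    a_max = np.abs(A_pca).max()  # scale for the random starts

    def random_start():
        # each element set to a_max * u with u ~ Uniform[-1, 1]
        return a_max * rng.uniform(-1.0, 1.0, size=A_pca.shape)

    # Step 1: K_c = 50 runs, the first started from the PCA loadings.
    best_A, best_ls = uslpca(X, m, c, A_pca)
    K_c = 1
    while K_c < K_first:
        A, ls = uslpca(X, m, c, random_start())
        if ls < best_ls:
            best_A, best_ls = A, ls
        K_c += 1

    # Steps 2-4: add runs one at a time until (9.1) holds or K_c = 1000.
    while best_ls > ls_prev and K_c < K_max:
        A, ls = uslpca(X, m, c, random_start())
        if ls < best_ls:
            best_A, best_ls = A, ls
        K_c += 1

    return best_A, best_ls
```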

1.2 No correlation for components and errors

Here, we prove that there is no correlation between the two terms on the right-hand side of (4.4), i.e., the columns of \(\mathbf{f}_{j}\mathbf{a}_{j}^{\prime }\) are uncorrelated with those of \(\mathbf{H}_{[j]}=\Sigma _{k \ne j}\mathbf{f}_{k}\mathbf{a}_{k}^{\prime } + \mathbf{E}\).

The proof is completed by showing that

  [1] \(\mathbf{f}_{j}\mathbf{a}_{j}^{\prime }\) and \(\mathbf{H}_{[j]}\) are column-centered;

  [2] \((\mathbf{f}_{j}\mathbf{a}_{j}^{\prime })^{\prime }\mathbf{H}_{[j]} = \mathbf{O}\).

First, [1] follows from the fact that X and F are column-centered, which implies that \(\mathbf{f}_{j}\mathbf{a}_{j}^{\prime }\), \(\Sigma _{k\ne j}\mathbf{f}_{k}\mathbf{a}_{k}^{\prime }\), and \(\mathbf{E} = \mathbf{X} - \mathbf{FA}^{\prime }\) are also column-centered.

Next, [2] can be proved by showing that \((\mathbf{f}_{j}\mathbf{a}_{j}^{\prime })^{\prime }\Sigma _{k\ne j}\mathbf{f}_{k}\mathbf{a}_{k}^{\prime } = \mathbf{O}\) and \((\mathbf{f}_{j}\mathbf{a}_{j}^{\prime })^{\prime }\mathbf{E} = \mathbf{O}\). The former equality follows from the fact that (1.2) implies \(\mathbf{f}_{j}^{\prime }\mathbf{f}_{k} = 0\) \((k\ne j)\). The left-hand side of the latter equality is expanded as

$$\begin{aligned} (\mathbf{f}_j \mathbf{a}_j^{\prime })^{\prime }\mathbf{E} = (\mathbf{f}_j \mathbf{a}_j^{\prime })^{\prime }(\mathbf{X} - \mathbf{FA}^{\prime }) = \mathbf{a}_j \mathbf{f}_j^{\prime }\mathbf{X} - \mathbf{a}_j \mathbf{f}_j^{\prime }\mathbf{FA}^{\prime }. \end{aligned}$$
(9.2)

Here, \(\mathbf{a}_{j}\mathbf{f}_{j}^{\prime }\mathbf{X}\) can be rewritten as

$$\begin{aligned} \mathbf{a}_j \mathbf{f}_j ^{\prime } \mathbf{X}=n\mathbf{a}_j \mathbf{b}_j ^{\prime } \end{aligned}$$
(9.3)

with \(\mathbf{b}_{j} = n^{-1}\mathbf{X}^{\prime }\mathbf{f}_{j}\) the \(j\hbox {th}\) column of B defined in (2.3), while \(\mathbf{a}_{j}\mathbf{f}_{j}^{\prime }\mathbf{FA}^{\prime }\) can be expressed as

$$\begin{aligned} \mathbf{a}_j \mathbf{f}_j^{\prime } \mathbf{FA}^{\prime }=n\mathbf{a}_j \mathbf{a}_j^{\prime } \end{aligned}$$
(9.4)

using (1.2). Equation (3.10) implies the equality of (9.3) and (9.4), so that (9.2) equals O, which completes the proof.
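
The no-correlation property is easy to illustrate numerically. The following NumPy sketch uses the standard (unconstrained) PCA solution, which satisfies (1.2) exactly, and checks conditions [1] and [2] for each j, with \(\mathbf{H}_{[j]}\) computed as \(\mathbf{X} - \mathbf{f}_{j}\mathbf{a}_{j}^{\prime }\); it is a sanity check of the algebra above, not a verification for sparse A.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 100, 8, 3
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                    # column-centre X

# Standard PCA via the SVD, scaled so that F'F = n * I_m as in (1.2)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
F = np.sqrt(n) * U[:, :m]              # component scores
A = Vt[:m].T * s[:m] / np.sqrt(n)      # loadings

for j in range(m):
    fa = np.outer(F[:, j], A[:, j])    # rank-one term f_j a_j'
    H = X - fa                         # H_[j] = sum_{k != j} f_k a_k' + E
    assert np.allclose(H.mean(axis=0), 0.0)  # [1] column-centred
    assert np.allclose(fa.T @ H, 0.0)        # [2] cross-products vanish
print("columns of f_j a_j' are uncorrelated with those of H_[j]")
```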

1.3 Likelihood for PCA model

The normality assumption (5.1) implies that the log-likelihood for X is expressed as

$$\begin{aligned} l(\mathbf{F},\mathbf{A},\sigma ^{2})= -\frac{np}{2}\log 2\pi -\frac{np}{2}\log \sigma ^{2}-\frac{1}{2\sigma ^{2}}\Vert \mathbf{X}-\mathbf{FA}^{\prime } \Vert ^{2} \end{aligned}$$
(9.5)

By solving the equation \({ dl}(\mathbf{F}, \mathbf{A}, \sigma ^{2})/d\sigma ^{2} = 0\), we find that the ML estimate of \(\sigma ^{2}\) must satisfy \(\sigma ^{2} = ( np)^{-1}\Vert \mathbf{X} -\mathbf{FA}^{\prime }\Vert ^{2 }\). Substituting this into (9.5) leads to

$$\begin{aligned} l\left( {\mathbf{F},\mathbf{A}} \right) = -\frac{np}{2}\log 2\pi +\frac{np}{2}\log np-\frac{np}{2}\log \Vert \mathbf{X}-\mathbf{FA}^{\prime } \Vert ^{2}-\frac{np}{2}, \end{aligned}$$
(9.6)

whose part relevant to F and A is expressed as (5.2).
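
The profile log-likelihood (9.6) is simple to evaluate in practice. The sketch below computes it for given X, F, and A; an information criterion of the form \(I(c) = -2l + \hbox {penalty}\) can then be compared across candidate cardinalities (the specific penalty used in Sect. 5 is not reproduced in this appendix).

```python
import numpy as np

def profile_loglik(X, F, A):
    """Evaluate the profile log-likelihood (9.6), in which sigma^2 has
    been replaced by its ML estimate (np)^{-1} * ||X - FA'||^2."""
    n, p = X.shape
    rss = np.linalg.norm(X - F @ A.T) ** 2  # ||X - FA'||^2
    half_np = 0.5 * n * p
    return (-half_np * np.log(2.0 * np.pi)
            + half_np * np.log(n * p)
            - half_np * np.log(rss)
            - half_np)
```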

1.4 Component-wise constrained USLPCA

In the preliminary analysis of Yeung and Ruzzo’s (2001) data, we used a version of USLPCA in which (1.3) is replaced by

$$\begin{aligned} \hbox {card}(\mathbf{a}_{j})=c_{j} \end{aligned}$$
(9.7)

for \(j = 1, {\ldots }, m\), with \(\hbox {card}(\mathbf{a}_{j})\) the cardinality of the \(j\hbox {th}\) column of A and \(c_{j}\) a prespecified integer. The algorithm for this version is the same as in Sect. 3, except that (3.9) is replaced by setting

$$\begin{aligned} a_{ij} =\left\{ {\begin{array}{l@{\quad }l} 0 &{} iff\;b_{ij}^2 \le b_{[q_j ]j}^2 \\ b_{ij} &{} \hbox {otherwise} \\ \end{array}}\right. \end{aligned}$$
(9.8)

over \(j = 1, {\ldots }, m\). Here, \(b_{[q_j ]j}^{2}\) denotes the \(q_{j}\hbox {th}\) smallest value among the squares of the elements in \(\mathbf{b}_{j}\), with \(q_{j} = p - c_{j}\) so that exactly \(c_{j}\) loadings remain nonzero.
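
A minimal NumPy sketch of the update (9.8) is given below, assuming B is the \(p \times m\) matrix defined in (2.3) and card holds \((c_{1}, {\ldots }, c_{m})\); ties among squared elements are broken arbitrarily by the sort.

```python
import numpy as np

def update_loadings(B, card):
    """Component-wise cardinality update (9.8): in column j of B, keep
    the c_j elements with the largest squares and zero the remaining
    q_j = p - c_j elements. Assumes 1 <= c_j <= p for every j."""
    A = np.zeros_like(B)
    for j, c_j in enumerate(card):
        keep = np.argsort(B[:, j] ** 2)[-c_j:]  # indices of the c_j largest squares
        A[keep, j] = B[keep, j]
    return A
```

For example, update_loadings(B, [4, 2, 3]) returns a loading matrix whose columns have cardinalities 4, 2, and 3.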

As described in Sect. 7.2, the 384 variables (genes) in the data are classified into five clusters. We performed the above version of USLPCA with \(m = 5\), setting the cardinality \(c_{j}\) in (9.7) to the number of variables belonging to each cluster. If the five clusters corresponded to the components, the nonzero loadings obtained would indicate the cluster memberships of the variables. However, the resulting solution did not show such a pattern, and it included a trivial component whose PEV was very low (3.2 %).

About this article

Cite this article

Adachi, K., Trendafilov, N.T. Sparse principal component analysis subject to prespecified cardinality of loadings. Comput Stat 31, 1403–1427 (2016). https://doi.org/10.1007/s00180-015-0608-4
