
A plug-in approach to sparse and robust principal component analysis

  • Original Paper
  • Published in TEST

Abstract

We propose a method for sparse and robust principal component analysis. The methodology has two steps: first, a robust estimate of the covariance matrix is obtained; then this estimate is plugged into an elastic-net regression that enforces sparseness. Our approach provides an intuitive, general and flexible extension of sparse principal component analysis to the robust setting. We also show how to implement the algorithm when the dimensionality exceeds the number of observations, by adapting the approach to use robust loadings from ROBPCA. The proposed technique compares well on simulated and real datasets.
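
The two-step structure described in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The trimming-based covariance is only a crude stand-in for a proper robust estimator, the sparse step handles a single component through an alternating elastic-net update that works directly from a covariance matrix, and every function name, tuning value and the toy dataset are hypothetical.

```python
import math
import random
from statistics import median

def robust_cov(X, keep=0.75):
    """Crude stand-in for a robust covariance estimator (assumption, not the
    paper's choice): drop the rows farthest from the coordinate-wise median
    (in MAD units), then take the classical covariance of the remaining rows."""
    n, p = len(X), len(X[0])
    cols = list(zip(*X))
    med = [median(c) for c in cols]
    mad = [median(abs(v - m) for v in c) or 1.0 for c, m in zip(cols, med)]
    dist = [sum(((x - m) / s) ** 2 for x, m, s in zip(row, med, mad)) for row in X]
    cutoff = sorted(dist)[int(keep * n) - 1]
    kept = [row for row, d in zip(X, dist) if d <= cutoff]
    mean = [sum(c) / len(kept) for c in zip(*kept)]
    return [[sum((r[j] - mean[j]) * (r[k] - mean[k]) for r in kept) / (len(kept) - 1)
             for k in range(p)] for j in range(p)]

def matvec(S, v):
    return [sum(s * x for s, x in zip(row, v)) for row in S]

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def soft_threshold(z, t):
    return math.copysign(max(abs(z) - t, 0.0), z)

def sparse_loading(S, lam1, lam2=1e-3, outer=50, inner=50):
    """First sparse loading vector from a covariance matrix S, alternating
    between an elastic-net update of b and a rescaling update of a."""
    p = len(S)
    a = normalize([1.0] * p)
    for _ in range(100):            # power iteration: ordinary first loading
        a = normalize(matvec(S, a))
    b = a[:]
    for _ in range(outer):
        Sa = matvec(S, a)
        for _ in range(inner):      # coordinate descent on
            for j in range(p):      # (a-b)'S(a-b) + lam2*|b|^2 + lam1*|b|_1
                r = Sa[j] - sum(S[j][k] * b[k] for k in range(p) if k != j)
                b[j] = soft_threshold(r, lam1 / 2) / (S[j][j] + lam2)
        Sb = matvec(S, b)
        if not any(Sb):             # the penalty removed every variable
            return b
        a = normalize(Sb)
    return normalize(b)

# Hypothetical toy data: variables 1-2 share a latent factor, 3-4 are noise.
random.seed(0)
X = []
for i in range(200):
    z = random.gauss(0, 1)
    row = [z + random.gauss(0, 0.5), z + random.gauss(0, 0.5),
           random.gauss(0, 0.3), random.gauss(0, 0.3)]
    if i < 20:                      # 10% gross outliers in variables 3 and 4
        row[2] += 20
        row[3] += 20
    X.append(row)

S = robust_cov(X)                         # step 1: robust covariance estimate
loadings = sparse_loading(S, lam1=0.2)    # step 2: plug it into the sparse step
print([round(v, 3) for v in loadings])
```

In this toy setting the outlying rows are discarded before the covariance is computed, so the sparse loading vector should concentrate on the two variables that share the latent factor and set the pure-noise variables to zero. A real analysis would replace `robust_cov` with an established robust estimator and choose the penalties by a data-driven criterion.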

Acknowledgments

The authors want to thank two anonymous reviewers whose stimulating comments were helpful in improving this work and the understanding of the problem. The authors are also grateful to Professor Mia Hubert who kindly shared the R code for ROSPCA.

Author information

Correspondence to Luca Greco.

Cite this article

Greco, L., Farcomeni, A. A plug-in approach to sparse and robust principal component analysis. TEST 25, 449–481 (2016). https://doi.org/10.1007/s11749-015-0464-0

  • DOI: https://doi.org/10.1007/s11749-015-0464-0
