
Deep Kernel Principal Component Analysis for multi-level feature learning

Published: 12 April 2024

Abstract

Principal Component Analysis (PCA) and its nonlinear extension, Kernel PCA (KPCA), are widely used across science and industry for data analysis and dimensionality reduction. Modern deep learning tools have achieved great empirical success, but a framework for deep principal component analysis is still lacking. Here we develop a Deep Kernel PCA methodology (DKPCA) to extract multiple levels of the most informative components of the data. Our scheme effectively identifies new hierarchical variables, called deep principal components, that capture the main characteristics of high-dimensional data through a simple and interpretable numerical optimization. We couple the principal components of multiple KPCA levels and show theoretically that DKPCA creates both forward and backward dependencies across levels; such coupling has not been explored in kernel methods, yet it is crucial for extracting more informative features. Experimental evaluations on multiple data types show that DKPCA finds more efficient and disentangled representations, with higher explained variance in fewer principal components than shallow KPCA. Our method enables effective hierarchical data exploration and can separate the key generative factors of the input data, both on large datasets and when few training samples are available. Overall, DKPCA facilitates the extraction of useful patterns from high-dimensional data by learning more informative features organized across levels, offering complementary views of the data's factors of variation while keeping a simple mathematical formulation.
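To make the multi-level construction concrete, here is a minimal sketch of naively *stacked* KPCA in Python, assuming scikit-learn's KernelPCA (the `eigenvalues_` attribute requires scikit-learn >= 1.0); the data, sizes, and kernel parameters are illustrative. Plain stacking only creates a forward dependency, each level being fitted on the previous level's scores, which is exactly the limitation that DKPCA's joint coupling of levels is designed to overcome. This sketch is therefore a baseline for intuition, not the paper's method.

```python
# Naively stacked KPCA: forward dependency only (level 1 -> level 2).
# DKPCA additionally couples the levels backward; plain stacking cannot.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))   # toy data: 200 samples, 10 features

# Level 1: nonlinear components of the raw inputs.
level1 = KernelPCA(n_components=5, kernel="rbf", gamma=0.1)
H1 = level1.fit_transform(X)         # (200, 5) level-1 scores

# Level 2: KPCA on the level-1 scores; level 1 is frozen once fitted.
level2 = KernelPCA(n_components=2, kernel="rbf", gamma=0.5)
H2 = level2.fit_transform(H1)        # (200, 2) level-2 scores

# Per-level explained-variance proxy from the kernel eigenvalues.
print(np.round(level1.eigenvalues_ / level1.eigenvalues_.sum(), 3))
```

In this baseline, level 1 can never be corrected by information discovered at level 2; DKPCA's backward dependency instead optimizes the levels jointly.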

Highlights

A deep kernel principal component analysis (DKPCA) framework is proposed.
Forward and backward couplings between the levels are identified.
Theoretical analysis establishes error bounds and shows higher explained variance than shallow KPCA.
Generative DKPCA is introduced for the pre-image problem in deep kernel methods (a shallow-KPCA illustration of the pre-image problem follows below).
DKPCA is competitive in learning informative multi-level disentangled features.
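On the pre-image problem mentioned above: generating data from a kernel model requires mapping a point in the latent component space back to the input space, i.e. finding an (approximate) pre-image. The sketch below illustrates the single-level version of this problem with scikit-learn's KernelPCA, whose fit_inverse_transform=True option fits a kernel-ridge approximation of the pre-image map. It is a shallow stand-in for intuition only, not the paper's generative DKPCA, and all data and parameters are illustrative.

```python
# Single-level pre-image illustration with scikit-learn's KernelPCA.
# fit_inverse_transform=True learns an approximate pre-image map via
# kernel ridge regression (alpha is its regularization strength).
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))   # toy data

kpca = KernelPCA(n_components=5, kernel="rbf", gamma=0.1,
                 fit_inverse_transform=True, alpha=0.1)
Z = kpca.fit_transform(X)            # latent kernel components
X_hat = kpca.inverse_transform(Z)    # approximate pre-images in input space

# Reconstruction error measures how well the pre-image map recovers inputs.
print("mean squared reconstruction error:", float(np.mean((X - X_hat) ** 2)))
```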

Published In

Neural Networks, Volume 170, Issue C, Feb 2024, 662 pages

Publisher

Elsevier Science Ltd.

United Kingdom

Author Tags

  1. Kernel Principal Component Analysis
  2. Deep learning
  3. Generative models
  4. Manifold optimization

Qualifiers

  • Research-article
