DOI: 10.1145/2939672.2939796 · KDD Conference Proceedings
Research article · Public Access

Communication Efficient Distributed Kernel Principal Component Analysis

Published: 13 August 2016

Abstract

Kernel Principal Component Analysis (KPCA) is a key machine learning algorithm for extracting nonlinear features from data. In the presence of a large volume of high dimensional data collected in a distributed fashion, it becomes very costly to communicate all of this data to a single data center and then perform kernel PCA. Can we perform kernel PCA on the entire dataset in a distributed and communication efficient fashion while maintaining provable and strong guarantees in solution quality?
In this paper, we give an affirmative answer to this question by developing a communication efficient algorithm to perform kernel PCA in the distributed setting. The algorithm is a clever combination of subspace embedding and adaptive sampling techniques, and we show that it can take as input an arbitrary configuration of distributed datasets and compute a set of global kernel principal components with relative-error guarantees independent of the dimension of the feature space and the total number of data points. In particular, computing k principal components with relative error ε over s workers has communication cost Õ(spk/ε + sk²/ε³) words, where p is the average number of nonzero entries in each data point. Furthermore, we evaluated the algorithm on large-scale real-world datasets and showed that it produces a high-quality kernel PCA solution while using significantly less communication than alternative approaches.
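For context, the centralized kernel PCA that the distributed algorithm approximates can be sketched in a few lines of NumPy: form the Gram matrix, double-center it, and take the top-k eigenvectors. This is only an illustrative baseline, not the paper's distributed method; the RBF kernel and the `gamma` parameter are assumptions for the sketch.

```python
import numpy as np

def rbf_kernel(X, gamma=0.1):
    # Pairwise squared distances, then the RBF (Gaussian) kernel matrix.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(X, k, gamma=0.1):
    """Top-k kernel principal component projections via the centered Gram matrix."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Double-center: K_c = (I - 1/n) K (I - 1/n), i.e. center features in kernel space.
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    w, V = np.linalg.eigh(Kc)
    idx = np.argsort(w)[::-1][:k]
    w, V = w[idx], V[:, idx]
    # Scale eigenvectors by sqrt(eigenvalue) to get the data's k-dimensional embedding.
    return V * np.sqrt(np.maximum(w, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z = kernel_pca(X, k=2)
print(Z.shape)  # (100, 2)
```

The communication bottleneck the paper targets is visible here: the Gram matrix K is n × n over *all* points, so a naive distributed version would ship every worker's raw data to one site before this computation could even start.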

Supplementary Material

MP4 File (kdd2016_liang_component_analysis_01-acm.mp4)


Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN:9781450342322
DOI:10.1145/2939672

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. distributed computing
  2. kernel method
  3. principal component analysis


Acceptance Rates

KDD '16 paper acceptance rate: 66 of 1,115 submissions (6%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)

Article Metrics

  • Downloads (last 12 months): 74
  • Downloads (last 6 weeks): 12
Reflects downloads up to 13 Sep 2024

Cited By

  • (2024) Memory and Communication Efficient Federated Kernel k-Means. IEEE Transactions on Neural Networks and Learning Systems, 35(5):7114-7125. DOI: 10.1109/TNNLS.2022.3213777
  • (2023) Towards Optimal Moment Estimation in Streaming and Distributed Models. ACM Transactions on Algorithms, 19(3):1-35. DOI: 10.1145/3596494
  • (2023) CLIG: A classification method based on bidirectional layer information granularity. Information Sciences, 119662. DOI: 10.1016/j.ins.2023.119662
  • (2022) Communication-efficient distributed eigenspace estimation with arbitrary node failures. Proceedings of the 36th International Conference on Neural Information Processing Systems, 18197-18210. DOI: 10.5555/3600270.3601593
  • (2021) One-Shot Distributed Algorithm for PCA With RBF Kernels. IEEE Signal Processing Letters, 28:1465-1469. DOI: 10.1109/LSP.2021.3095017
  • (2019) Randomized sketches for sparse additive models. Neurocomputing. DOI: 10.1016/j.neucom.2019.12.012
  • (2018) Iterative Kernel Principal Component for Large-Scale Data Set. Journal of Testing and Evaluation, 46(5). DOI: 10.1520/JTE20160551
  • (2018) Distributed Statistical Estimation of Matrix Products with Applications. Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 383-394. DOI: 10.1145/3196959.3196964
  • (2017) Effective parallelisation for machine learning. Proceedings of the 31st International Conference on Neural Information Processing Systems, 6480-6491. DOI: 10.5555/3295222.3295394
