DOI: 10.1145/2939672.2939796 · KDD Conference Proceedings
Research article · Public Access

Communication Efficient Distributed Kernel Principal Component Analysis

Published: 13 August 2016

Abstract

Kernel Principal Component Analysis (KPCA) is a key machine learning algorithm for extracting nonlinear features from data. In the presence of a large volume of high dimensional data collected in a distributed fashion, it becomes very costly to communicate all of this data to a single data center and then perform kernel PCA. Can we perform kernel PCA on the entire dataset in a distributed and communication efficient fashion while maintaining provable and strong guarantees in solution quality?
In this paper, we give an affirmative answer to this question by developing a communication efficient algorithm to perform kernel PCA in the distributed setting. The algorithm is a clever combination of subspace embedding and adaptive sampling techniques, and we show that it can take as input an arbitrary configuration of distributed datasets and compute a set of global kernel principal components with relative-error guarantees independent of the dimension of the feature space and the total number of data points. In particular, computing k principal components with relative error ε over s workers has communication cost Õ(spk/ε + sk²/ε³) words, where p is the average number of nonzero entries in each data point. Furthermore, we evaluated the algorithm on large-scale real-world datasets and showed that it produces a high-quality kernel PCA solution while using significantly less communication than alternative approaches.
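For context, the centralized kernel PCA that the distributed algorithm approximates can be sketched in a few lines of NumPy: form the Gram matrix, double-center it, and take the top-k eigenvectors. This is only an illustrative baseline, not the paper's distributed method; the RBF kernel and the `gamma` parameter are assumptions for the sketch.

```python
import numpy as np

def rbf_kernel(X, gamma=0.1):
    # Pairwise squared distances, then the RBF (Gaussian) kernel matrix.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * d2)

def kernel_pca(X, k, gamma=0.1):
    """Top-k kernel principal component projections via the centered Gram matrix."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    # Double-center: K_c = (I - 1/n) K (I - 1/n), i.e. center features in kernel space.
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    w, V = np.linalg.eigh(Kc)
    idx = np.argsort(w)[::-1][:k]
    w, V = w[idx], V[:, idx]
    # Scale eigenvectors by sqrt(eigenvalue) to get the data's k-dimensional embedding.
    return V * np.sqrt(np.maximum(w, 0.0))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z = kernel_pca(X, k=2)
print(Z.shape)  # (100, 2)
```

The communication bottleneck the paper targets is visible here: the Gram matrix K is n × n over *all* points, so a naive distributed version would ship every worker's raw data to one site before this computation could even start.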

Supplementary Material

MP4 File (kdd2016_liang_component_analysis_01-acm.mp4)


Published In

KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2016
2176 pages
ISBN:9781450342322
DOI:10.1145/2939672

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. distributed computing
  2. kernel method
  3. principal component analysis


Acceptance Rates

KDD '16 paper acceptance rate: 66 of 1,115 submissions (6%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)

Article Metrics

  • Downloads (last 12 months): 74
  • Downloads (last 6 weeks): 12
Reflects downloads up to 13 Sep 2024

Cited By

  • (2024) Memory and Communication Efficient Federated Kernel k-Means. IEEE Transactions on Neural Networks and Learning Systems, 35(5):7114-7125. DOI: 10.1109/TNNLS.2022.3213777
  • (2023) Towards Optimal Moment Estimation in Streaming and Distributed Models. ACM Transactions on Algorithms, 19(3):1-35. DOI: 10.1145/3596494
  • (2023) CLIG: A classification method based on bidirectional layer information granularity. Information Sciences, 119662. DOI: 10.1016/j.ins.2023.119662
  • (2022) Communication-efficient distributed eigenspace estimation with arbitrary node failures. Proceedings of the 36th International Conference on Neural Information Processing Systems, 18197-18210. DOI: 10.5555/3600270.3601593
  • (2021) One-Shot Distributed Algorithm for PCA With RBF Kernels. IEEE Signal Processing Letters, 28:1465-1469. DOI: 10.1109/LSP.2021.3095017
  • (2019) Randomized sketches for sparse additive models. Neurocomputing. DOI: 10.1016/j.neucom.2019.12.012
  • (2018) Iterative Kernel Principal Component for Large-Scale Data Set. Journal of Testing and Evaluation, 46(5). DOI: 10.1520/JTE20160551
  • (2018) Distributed Statistical Estimation of Matrix Products with Applications. Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, 383-394. DOI: 10.1145/3196959.3196964
  • (2017) Effective parallelisation for machine learning. Proceedings of the 31st International Conference on Neural Information Processing Systems, 6480-6491. DOI: 10.5555/3295222.3295394
