
Coordinate-wise power method

Published: 05 December 2016

Abstract

In this paper, we propose a coordinate-wise version of the power method from an optimization viewpoint. The vanilla power method simultaneously updates all coordinates of the iterate, which is essential for its convergence analysis; however, different coordinates converge to their optimal values at different speeds. Our proposed algorithm, which we call the coordinate-wise power method, selects and updates the k most important coordinates in O(kn) time per iteration, where n is the dimension of the matrix and k ≤ n is the size of the active set. Inspired by the "greedy" nature of our method, we further propose a greedy coordinate descent algorithm applied to a non-convex objective function specialized for symmetric matrices. We provide convergence analyses for both methods. Experimental results on both synthetic and real data show that our methods achieve up to a 23-fold speedup over the basic power method. Moreover, due to their coordinate-wise nature, our methods are well suited to the important case in which the data cannot fit into memory. Finally, we show how the coordinate-wise mechanism can be applied to other iterative methods used in machine learning.
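
To make the O(kn) per-iteration cost concrete, the Python sketch below shows one plausible reading of a coordinate-wise power iteration: it maintains the product z = Ax incrementally, so after a single full initialization each step touches only k columns of the matrix. This is an illustrative sketch, not the authors' exact algorithm; in particular, the coordinate-selection rule (largest pending change under a full power step) and all names here are assumptions, and the paper's precise selection criterion and convergence analysis are in the full text.

```python
import numpy as np


def coordinate_wise_power_method(A, k, n_iters=2000, seed=0):
    """Illustrative sketch of a coordinate-wise power iteration.

    Each iteration refreshes only the k coordinates whose values would
    change the most under a full power-method step, and maintains the
    product z = A @ x incrementally, so an iteration touches k columns
    of A and costs O(kn) rather than the O(n^2) of a dense full update.
    The selection rule is one plausible choice, not necessarily the
    paper's exact criterion.
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    z = A @ x  # one full product up front establishes the invariant z = A @ x

    for _ in range(n_iters):
        y = z / np.linalg.norm(z)  # the iterate a full power step would produce
        # Active set: the k coordinates with the largest pending change.
        S = np.argpartition(np.abs(y - x), -k)[-k:]
        delta = y[S] - x[S]
        x[S] = y[S]                # update only the selected coordinates
        z += A[:, S] @ delta       # O(kn) incremental update keeps z = A @ x
        scale = np.linalg.norm(x)
        x /= scale                 # renormalize the iterate ...
        z /= scale                 # ... and rescale z to preserve the invariant

    return x, x @ z                # eigenvector estimate and its Rayleigh quotient


# Tiny usage example on a random symmetric positive semi-definite matrix.
rng = np.random.default_rng(1)
M = rng.standard_normal((500, 500))
A = M @ M.T / 500
v, lam = coordinate_wise_power_method(A, k=25)
print(lam, np.linalg.eigvalsh(A)[-1])  # estimate vs. true top eigenvalue
```

With k = n this degenerates into the ordinary power method; the interesting regime is k ≪ n, where each iteration reads only k columns of A, which is also what makes such a scheme friendly to data that cannot fit into memory.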


Cited By

  • Spectral Ranking Regression. ACM Transactions on Knowledge Discovery from Data, 16(6):1-38, 30 July 2022. https://doi.org/10.1145/3530693
  • Gradient descent meets shift-and-invert preconditioning for eigenvector computation. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 2830-2839, 3 December 2018. https://doi.org/10.5555/3327144.3327206
  • Doubly Greedy Primal-Dual Coordinate Descent for sparse empirical risk minimization. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 2034-2042, 6 August 2017. https://doi.org/10.5555/3305890.3305891


Published In

NIPS'16: Proceedings of the 30th International Conference on Neural Information Processing Systems
December 2016
5100 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States


Qualifiers

  • Article
