
Coordinate-wise power method

Published: 05 December 2016

Abstract

In this paper, we propose a coordinate-wise version of the power method from an optimization viewpoint. The vanilla power method simultaneously updates all coordinates of the iterate, which is essential for its convergence analysis; however, different coordinates converge to their optimal values at different speeds. Our proposed algorithm, which we call the coordinate-wise power method, selects and updates the k most important coordinates in O(kn) time per iteration, where n is the dimension of the matrix and k ≤ n is the size of the active set. Inspired by the "greedy" nature of our method, we further propose a greedy coordinate descent algorithm applied to a non-convex objective function specialized for symmetric matrices. We provide convergence analyses for both methods. Experimental results on both synthetic and real data show that our methods achieve up to a 23-fold speedup over the basic power method. Moreover, due to their coordinate-wise nature, our methods are well suited to the important case in which the data cannot fit into memory. Finally, we show how the coordinate-wise mechanism can be applied to other iterative methods used in machine learning.
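
To make the O(kn) per-iteration cost concrete, the Python sketch below shows one plausible reading of a coordinate-wise power iteration: it maintains the product z = Ax incrementally, so after a single full initialization each step touches only k columns of the matrix. This is an illustrative sketch, not the authors' exact algorithm; in particular, the coordinate-selection rule (largest pending change under a full power step) and all names here are assumptions, and the paper's precise selection criterion and convergence analysis are in the full text.

```python
import numpy as np


def coordinate_wise_power_method(A, k, n_iters=2000, seed=0):
    """Illustrative sketch of a coordinate-wise power iteration.

    Each iteration refreshes only the k coordinates whose values would
    change the most under a full power-method step, and maintains the
    product z = A @ x incrementally, so an iteration touches k columns
    of A and costs O(kn) rather than the O(n^2) of a dense full update.
    The selection rule is one plausible choice, not necessarily the
    paper's exact criterion.
    """
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    z = A @ x  # one full product up front establishes the invariant z = A @ x

    for _ in range(n_iters):
        y = z / np.linalg.norm(z)  # the iterate a full power step would produce
        # Active set: the k coordinates with the largest pending change.
        S = np.argpartition(np.abs(y - x), -k)[-k:]
        delta = y[S] - x[S]
        x[S] = y[S]                # update only the selected coordinates
        z += A[:, S] @ delta       # O(kn) incremental update keeps z = A @ x
        scale = np.linalg.norm(x)
        x /= scale                 # renormalize the iterate ...
        z /= scale                 # ... and rescale z to preserve the invariant

    return x, x @ z                # eigenvector estimate and its Rayleigh quotient


# Tiny usage example on a random symmetric positive semi-definite matrix.
rng = np.random.default_rng(1)
M = rng.standard_normal((500, 500))
A = M @ M.T / 500
v, lam = coordinate_wise_power_method(A, k=25)
print(lam, np.linalg.eigvalsh(A)[-1])  # estimate vs. true top eigenvalue
```

With k = n this degenerates into the ordinary power method; the interesting regime is k ≪ n, where each iteration reads only k columns of A, which is also what makes such a scheme friendly to data that cannot fit into memory.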


Cited By

  • Spectral Ranking Regression. ACM Transactions on Knowledge Discovery from Data, 16(6):1-38, 30 July 2022. https://doi.org/10.1145/3530693
  • Gradient descent meets shift-and-invert preconditioning for eigenvector computation. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 2830-2839, 3 December 2018. https://doi.org/10.5555/3327144.3327206
  • Doubly Greedy Primal-Dual Coordinate Descent for sparse empirical risk minimization. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 2034-2042, 6 August 2017. https://doi.org/10.5555/3305890.3305891


Published In

NIPS'16: Proceedings of the 30th International Conference on Neural Information Processing Systems
December 2016
5100 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States


Qualifiers

  • Article
