
Accelerating adaptive cubic regularization of Newton's method via random sampling

Published: 01 January 2022

Abstract

In this paper, we consider an unconstrained optimization model where the objective is a sum of a large number of possibly nonconvex functions, while the overall objective is assumed to be smooth and convex. Our approach to solving this model uses the framework of cubic regularization of Newton's method. As is well known, the crux of cubic regularization is its use of Hessian information, which may be computationally expensive for large-scale problems. To tackle this, we resort to approximating the Hessian matrix via subsampling. In particular, we propose to compute an approximate Hessian matrix by either uniformly or non-uniformly subsampling the components of the objective. Based on this sampling strategy, we develop accelerated adaptive cubic regularization approaches and provide theoretical guarantees of a global iteration complexity of O(ε^{-1/3}) with high probability, which matches that of the original accelerated cubic regularization methods of Jiang et al. (2020) using the full Hessian information. Interestingly, we also show that in the worst case our algorithm still achieves an O(ε^{-5/6} log(1/ε)) iteration complexity bound. The proof techniques are new to our knowledge and may be of independent interest. Experimental results on regularized logistic regression problems demonstrate a clear acceleration effect on several real data sets.
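
To make the two ingredients of the method concrete, the following Python sketch shows (i) a subsampled Hessian estimator with either uniform or importance-weighted non-uniform sampling over the components of the finite sum, and (ii) an approximate solver for the cubic-regularized model. This is an illustrative sketch, not the authors' code: the helper names (sample_hessian, cubic_step), the gradient-descent subproblem solver, and the default parameters are assumptions made for readability.

```python
# A minimal sketch (not the paper's implementation) of one cubic-regularized
# Newton step with a subsampled Hessian for a finite-sum objective
#   f(x) = (1/n) * sum_i f_i(x).
# The helper names and parameter choices below are illustrative assumptions.
import numpy as np

def sample_hessian(hess_fns, x, sample_size, probs=None, rng=None):
    """Estimate the Hessian from a random subsample of component Hessians.

    hess_fns : list of callables; hess_fns[i](x) returns the (d, d) Hessian of f_i at x
    probs    : optional non-uniform sampling probabilities (length n, sums to 1);
               uniform sampling is used when omitted.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(hess_fns)
    p = np.full(n, 1.0 / n) if probs is None else np.asarray(probs, dtype=float)
    idx = rng.choice(n, size=sample_size, replace=True, p=p)
    H = np.zeros((x.shape[0], x.shape[0]))
    for i in idx:
        # Importance weight 1/(n * p_i) keeps the estimator unbiased for the
        # average Hessian (1/n) * sum_i Hess f_i(x).
        H += hess_fns[i](x) / (n * p[i] * sample_size)
    return H

def cubic_step(g, H, M, n_iters=500, lr=0.01):
    """Approximately minimize the cubic model
        m(s) = g^T s + 0.5 * s^T H s + (M / 3) * ||s||^3
    by plain gradient descent on m -- a simple stand-in for an exact subproblem solver."""
    s = np.zeros_like(g)
    for _ in range(n_iters):
        grad_m = g + H @ s + M * np.linalg.norm(s) * s
        s -= lr * grad_m
    return s
```

A plain subsampled cubic step would then be x_next = x + cubic_step(grad_f(x), sample_hessian(hess_fns, x, m), M); the accelerated and adaptive variants studied in the paper additionally maintain Nesterov-style extrapolation sequences and adjust the cubic parameter M on the fly.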

References

[1]
N. Agarwal, B. Bullins, and E. Hazan. Second-order stochastic optimization for machine learning in linear time. The Journal of Machine Learning Research, 18(1):4148-4187, 2017.
[2]
Z. Allen-Zhu. Katyusha: the first direct acceleration of stochastic gradient methods. In STOC, pages 1200-1205. ACM, 2017.
[3]
Z. Allen-Zhu and Y. Yuan. Improved SVRG for non-strongly-convex or sum-of-nonconvex objectives. In ICML, pages 1080-1089, 2016.
[4]
Z. Allen-Zhu. Natasha 2: Faster non-convex optimization than SGD. In NIPS, pages 2675-2686, 2018.
[5]
Y. Arjevani, O. Shamir, and R. Shiff. Oracle complexity of second-order methods for smooth convex optimization. Mathematical Programming, 178(1-2):327-360, 2019.
[6]
P. Baldi, P. Sadowski, and D. Whiteson. Searching for exotic particles in high-energy physics with deep learning. Nature communications, 5:4308, 2014.
[7]
S. Bellavia, G. Gurioli, B. Morini, and P. L. Toint. Adaptive regularization algorithms with inexact evaluations for nonconvex optimization. SIAM Journal on Optimization, 29(4): 2881-2915, 2019.
[8]
S. Bellavia, G. Gurioli, and B. Morini. Adaptive cubic regularization methods with dynamic inexact Hessian information and applications to finite-sum minimization. IMA Journal of Numerical Analysis, 41(1):764-799, 2021.
[9]
S. Bellavia and G. Gurioli. Stochastic analysis of an adaptive cubic regularization method under inexact gradient evaluations and dynamic Hessian accuracy. Optimization, 71(1):227-261, 2022.
[10]
A. S. Berahas, R. Bollapragada, and J. Nocedal. An investigation of Newton-sketch and subsampled Newton methods. Optimization Methods and Software, 35(4):661-680, 2020.
[11]
E. G. Birgin, J. L. Gardenghi, J. M. Martínez, S. A. Santos, and P. L. Toint. Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models. Mathematical Programming, 163(1-2):359-368, 2017.
[12]
J. Blanchet, C. Cartis, M. Menickelly, and K. Scheinberg. Convergence rate analysis of a stochastic trust-region method via supermartingales. INFORMS Journal on Optimization, 1(2):92-119, 2019.
[13]
R. Bollapragada, R. H. Byrd, and J. Nocedal. Exact and inexact subsampled Newton methods for optimization. IMA Journal of Numerical Analysis, 39(2):545-578, 2019.
[14]
L. Bottou, F. E. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning. SIAM Review, 60(2):223-311, 2018.
[15]
S. Bubeck, Q. Jiang, Y. T. Lee, Y. Li, and A. Sidford. Near-optimal method for highly smooth convex optimization. In COLT, pages 492-507, 2019.
[16]
R. H. Byrd, G. M. Chin, W. Neveitt, and J. Nocedal. On the use of stochastic Hessian information in optimization methods for machine learning. SIAM Journal on Optimization, 21(3):977-995, 2011.
[17]
R. H. Byrd, S. L. Hansen, J. Nocedal, and Y. Singer. A stochastic quasi-Newton method for large-scale optimization. SIAM Journal on Optimization, 26(2):1008-1031, 2016.
[18]
C. Cartis and K. Scheinberg. Global convergence rate analysis of unconstrained optimization methods based on probabilistic models. Mathematical Programming, 169(2):337-375, 2018.
[19]
C. Cartis, N. I. M. Gould, and P. L. Toint. Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Mathematical Programming, 127(2):245-295, 2011a.
[20]
C. Cartis, N. I. M. Gould, and P. L. Toint. Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function-and derivative-evaluation complexity. Mathematical Programming, 130(2):295-319, 2011b.
[21]
R. Collobert, S. Bengio, and Y. Bengio. A parallel mixture of SVMs for very large scale problems. In NIPS, pages 633-640, 2002.
[22]
G. Cormode and C. Dickens. Iterative Hessian sketch in input sparsity time. ArXiv Preprint:1910.14166, 2019.
[23]
N. Doikov and P. Richtárik. Randomized block cubic Newton method. In ICML, pages 1290-1298, 2018.
[24]
P. Drineas and M. W. Mahoney. Lectures on randomized numerical linear algebra. The Mathematics of Data, 25:1, 2018.
[25]
J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121-2159, 2011.
[26]
M. A. Erdogdu and A. Montanari. Convergence rates of sub-sampled Newton methods. In NIPS, pages 3052-3060. MIT Press, 2015.
[27]
J. Friedman, T. Hastie, and R. Tibshirani. The elements of statistical learning, volume 1. Springer series in statistics New York, 2001.
[28]
R. Frostig, R. Ge, S. Kakade, and A. Sidford. Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization. In ICML, pages 2540-2548, 2015.
[29]
D. Garber and E. Hazan. Fast and simple PCA via convex optimization. ArXiv Preprint: 1509.05647, 2015.
[30]
A. Gasnikov, P. Dvurechensky, E. Gorbunov, E. Vorontsova, D. Selikhanovych, and C. A. Uribe. Optimal tensor methods in smooth convex and uniformly convex optimization. In COLT, pages 1374-1391, 2019.
[31]
S. Ghadimi and G. Lan. Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Mathematical Programming, 156(1-2):59-99, 2016.
[32]
S. Ghadimi, H. Liu, and T. Zhang. Second-order methods with cubic regularization under inexact information. ArXiv Preprint: 1710.05782, 2017.
[33]
I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
[34]
P. Jain, S. M. Kakade, R. Kidambi, P. Netrapalli, and A. Sidford. Accelerating stochastic gradient descent for least squares regression. In COLT, pages 545-604, 2018.
[35]
B. Jiang, T. Lin, and S. Zhang. A unified adaptive tensor approximation scheme to accelerate composite convex optimization. SIAM Journal on Optimization, 30(4):2897-2926, 2020.
B. Jiang, H. Wang, and S. Zhang. An optimal high-order tensor method for convex optimization. Mathematics of Operations Research, 46(4):1390-1412, 2021.
[36]
J. M. Kohler and A. Lucchi. Subsampled cubic regularization for non-convex optimization. In ICML, pages 1895-1904, 2017.
[37]
B. Kulis. Metric learning: A survey. Foundations and Trends in Machine Learning, 5(4):287-364, 2013.
[38]
S. B. Kylasa, F. Roosta-Khorasani, M. W. Mahoney, and A. Grama. GPU accelerated subsampled Newton's method. In Proceedings of the 2019 SIAM International Conference on Data Mining, pages 702-710. SIAM, 2019.
[39]
G. Lan. An optimal method for stochastic composite optimization. Mathematical Programming, 133(1):365-397, 2012.
[40]
X. Li, S. Wang, and Z. Zhang. Do subsampled Newton methods work for high-dimensional data? In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 4723-4730, 2020.
[41]
X. Liu, C-J. Hsieh, J. D. Lee, and Y. Sun. An inexact subsampled proximal Newton-type method for large-scale machine learning. ArXiv Preprint: 1708.08552, 2017.
[42]
D. G. Luenberger and Y. Ye. Linear and Nonlinear Programming, volume 2. Springer, 1984.
[43]
L. Mason, J. Baxter, P. L. Bartlett, and M. R. Frean. Boosting algorithms as gradient descent. In NIPS, pages 512-518, 2000.
[44]
R. D. C. Monteiro and B. F. Svaiter. Iteration-complexity of a Newton proximal extragradient method for monotone variational inequalities and inclusion problems. SIAM Journal on Optimization, 22(3):914-935, 2012.
[45]
R. D. C. Monteiro and B. F. Svaiter. An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM Journal on Optimization, 23(3):1092-1125, 2013.
[46]
Yu. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2). Dokl. Akad. Nauk SSSR, pages 543-547, 1983. (in Russian).
[47]
Yu. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer Science & Business Media, 2004.
[48]
Yu. Nesterov. Accelerating the cubic regularization of Newton's method on convex problems. Mathematical Programming, 112(1):159-181, 2008.
[49]
Yu. Nesterov. Implementable tensor methods in unconstrained convex optimization. Mathematical Programming, pages 1-27, 2019.
[50]
C. Paquette and K. Scheinberg. A stochastic line search method with expected complexity analysis. SIAM Journal on Optimization, 30(1):349-376, 2020.
[51]
M. Pilanci and M. J. Wainwright. Newton sketch: A near linear-time optimization algorithm with linear-quadratic convergence. SIAM Journal on Optimization, 27(1):205-245, 2017.
[52]
Z. Qu and P. Richtárik. Coordinate descent with arbitrary sampling I: algorithms and complexity. Optimization Methods and Software, 31(5):829-857, 2016a.
[53]
Z. Qu and P. Richtárik. Coordinate descent with arbitrary sampling II: Expected separable overapproximation. Optimization Methods and Software, 31(5):858-884, 2016b.
[54]
H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400-407, 1951.
[55]
F. Roosta-Khorasani and M. W. Mahoney. Sub-sampled Newton methods. Mathematical Programming, 174(1-2):293-326, 2019.
[56]
N. N. Schraudolph, J. Yu, and S. Günter. A stochastic quasi-Newton method for online convex optimization. In AISTATS, pages 436-443, 2007.
[57]
S. Shalev-Shwartz. SDCA without duality, regularization, and individual convexity. In ICML, pages 747-754, 2016.
[58]
S. Shalev-Shwartz and T. Zhang. Accelerated mini-batch stochastic dual coordinate ascent. In NIPS, pages 378-385, 2013.
[59]
C. Song and J. Liu. Inexact proximal cubic regularized Newton methods for convex optimization. ArXiv Preprint: 1902.02388, 2019.
[60]
S. Sra, S. Nowozin, and S. J. Wright. Optimization for Machine Learning. MIT Press, 2012.
[61]
N. Tripuraneni, M. Stern, C. Jin, J. Regier, and M. I. Jordan. Stochastic cubic regularization for fast nonconvex optimization. In NIPS, pages 2899-2908, 2018.
[62]
X. Wang, S. Ma, D. Goldfarb, and W. Liu. Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization, 27(2):927-956, 2017.
[63]
P. Xu, J. Yang, F. Roosta-Khorasani, C. Ré, and M. W. Mahoney. Sub-sampled Newton methods with non-uniform sampling. In NIPS, pages 3000-3008, 2016.
[64]
P. Xu, F. Roosta-Khorasani, and M. W. Mahoney. Second-order optimization for nonconvex machine learning: An empirical study. In SDM, pages 199-207. SIAM, 2020a.
[65]
P. Xu, F. Roosta-Khorasani, and M. W. Mahoney. Newton-type methods for nonconvex optimization under inexact Hessian information. Mathematical Programming, 184:35-70, 2020b.
[66]
Z. Yao, P. Xu, F. Roosta-Khorasani, and M. W. Mahoney. Inexact non-convex Newton-type methods. INFORMS Journal on Optimization, 3(2):119-226, 2021.
[67]
H. Ye, L. Luo, and Z. Zhang. Nesterov's acceleration for approximate Newton. Journal of Machine Learning Research, 21(142):1-37, 2020.
[68]
J. Zhang, L. Xiao, and S. Zhang. Adaptive stochastic variance reduction for subsampled Newton method with cubic regularization. INFORMS Journal on Optimization, published online.


          Published In

          The Journal of Machine Learning Research, Volume 23, Issue 1
          January 2022
          15939 pages
          ISSN: 1532-4435
          EISSN: 1533-7928
          CC-BY 4.0

          Publisher

          JMLR.org

          Publication History

          Accepted: 01 March 2022
          Revised: 01 March 2022
          Published: 01 January 2022
          Received: 01 August 2020
          Published in JMLR Volume 23, Issue 1

          Author Tags

          1. sum of nonconvex functions
          2. acceleration
          3. parameter-free adaptive algorithm
          4. cubic regularization
          5. Newton's method
          6. random sampling
          7. iteration complexity
