
Accelerating adaptive cubic regularization of Newton's method via random sampling

Published: 01 January 2022

Abstract

In this paper, we consider an unconstrained optimization model where the objective is a sum of a large number of possibly nonconvex functions, while the overall objective is assumed to be smooth and convex. Our approach to solving this model uses the framework of cubic regularization of Newton's method. As is well known, the crux of cubic regularization is its use of Hessian information, which may be computationally expensive for large-scale problems. To tackle this, we resort to approximating the Hessian matrix via subsampling. In particular, we propose to compute an approximate Hessian matrix by either uniformly or non-uniformly subsampling the components of the objective. Based on this sampling strategy, we develop accelerated adaptive cubic regularization approaches and provide theoretical guarantees of a global iteration complexity of O(ε^{-1/3}) with high probability, which matches that of the original accelerated cubic regularization methods of Jiang et al. (2020) using the full Hessian information. Interestingly, we also show that in the worst case our algorithm still achieves an O(ε^{-5/6} log(1/ε)) iteration complexity bound. The proof techniques are new to our knowledge and may be of independent interest. Experimental results on regularized logistic regression problems demonstrate a clear acceleration effect on several real data sets.
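
To make the two ingredients of the method concrete, the following Python sketch shows (i) a subsampled Hessian estimator with either uniform or importance-weighted non-uniform sampling over the components of the finite sum, and (ii) an approximate solver for the cubic-regularized model. This is an illustrative sketch, not the authors' code: the helper names (sample_hessian, cubic_step), the gradient-descent subproblem solver, and the default parameters are assumptions made for readability.

```python
# A minimal sketch (not the paper's implementation) of one cubic-regularized
# Newton step with a subsampled Hessian for a finite-sum objective
#   f(x) = (1/n) * sum_i f_i(x).
# The helper names and parameter choices below are illustrative assumptions.
import numpy as np

def sample_hessian(hess_fns, x, sample_size, probs=None, rng=None):
    """Estimate the Hessian from a random subsample of component Hessians.

    hess_fns : list of callables; hess_fns[i](x) returns the (d, d) Hessian of f_i at x
    probs    : optional non-uniform sampling probabilities (length n, sums to 1);
               uniform sampling is used when omitted.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(hess_fns)
    p = np.full(n, 1.0 / n) if probs is None else np.asarray(probs, dtype=float)
    idx = rng.choice(n, size=sample_size, replace=True, p=p)
    H = np.zeros((x.shape[0], x.shape[0]))
    for i in idx:
        # Importance weight 1/(n * p_i) keeps the estimator unbiased for the
        # average Hessian (1/n) * sum_i Hess f_i(x).
        H += hess_fns[i](x) / (n * p[i] * sample_size)
    return H

def cubic_step(g, H, M, n_iters=500, lr=0.01):
    """Approximately minimize the cubic model
        m(s) = g^T s + 0.5 * s^T H s + (M / 3) * ||s||^3
    by plain gradient descent on m -- a simple stand-in for an exact subproblem solver."""
    s = np.zeros_like(g)
    for _ in range(n_iters):
        grad_m = g + H @ s + M * np.linalg.norm(s) * s
        s -= lr * grad_m
    return s
```

A plain subsampled cubic step would then be x_next = x + cubic_step(grad_f(x), sample_hessian(hess_fns, x, m), M); the accelerated and adaptive variants studied in the paper additionally maintain Nesterov-style extrapolation sequences and adjust the cubic parameter M on the fly.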

References

[1]
N. Agarwal, B. Bullins, and E. Hazan. Second-order stochastic optimization for machine learning in linear time. The Journal of Machine Learning Research, 18(1):4148-4187, 2017.
[2]
Z. Allen-Zhu. Katyusha: the first direct acceleration of stochastic gradient methods. In STOC, pages 1200-1205. ACM, 2017.
[3]
Z. Allen-Zhu and Y. Yuan. Improved SVRG for non-strongly-convex or sum-of-nonconvex objectives. In ICML, pages 1080-1089, 2016.
[4]
Z. Allen-Zhu. Natasha 2: Faster non-convex optimization than SGD. In NIPS, pages 2675-2686, 2018.
[5]
Y. Arjevani, O. Shamir, and R. Shiff. Oracle complexity of second-order methods for smooth convex optimization. Mathematical Programming, 178(1-2):327-360, 2019.
[6]
P. Baldi, P. Sadowski, and D. Whiteson. Searching for exotic particles in high-energy physics with deep learning. Nature communications, 5:4308, 2014.
[7]
S. Bellavia, G. Gurioli, B. Morini, and P. L. Toint. Adaptive regularization algorithms with inexact evaluations for nonconvex optimization. SIAM Journal on Optimization, 29(4): 2881-2915, 2019.
[8]
S. Bellavia, G. Gurioli, and B. Morini. Adaptive cubic regularization methods with dynamic inexact Hessian information and applications to finite-sum minimization. IMA Journal of Numerical Analysis, 41(1):764-799, 2021.
[9]
S. Bellavia and G. Gurioli. Stochastic analysis of an adaptive cubic regularization method under inexact gradient evaluations and dynamic Hessian accuracy. Optimization, 71(1):227-261, 2022.
[10]
A. S. Berahas, R. Bollapragada, and J. Nocedal. An investigation of Newton-sketch and subsampled Newton methods. Optimization Methods and Software, 35(4):661-680, 2020.
[11]
E. G. Birgin, J. L. Gardenghi, J. M. Martínez, S. A. Santos, and P. L. Toint. Worst-case evaluation complexity for unconstrained nonlinear optimization using high-order regularized models. Mathematical Programming, 163(1-2):359-368, 2017.
[12]
J. Blanchet, C. Cartis, M. Menickelly, and K. Scheinberg. Convergence rate analysis of a stochastic trust-region method via supermartingales. INFORMS Journal on Optimization, 1(2):92-119, 2019.
[13]
R. Bollapragada, R. H. Byrd, and J. Nocedal. Exact and inexact subsampled Newton methods for optimization. IMA Journal of Numerical Analysis, 39(2):545-578, 2019.
[14]
L. Bottou, F. E. Curtis, and J. Nocedal. Optimization methods for large-scale machine learning. SIAM Review, 60(2):223-311, 2018.
[15]
S. Bubeck, Q. Jiang, Y. T. Lee, Y. Li, and A. Sidford. Near-optimal method for highly smooth convex optimization. In COLT, pages 492-507, 2019.
[16]
R. H. Byrd, G. M. Chin, W. Neveitt, and J. Nocedal. On the use of stochastic Hessian information in optimization methods for machine learning. SIAM Journal on Optimization, 21(3):977-995, 2011.
[17]
R. H. Byrd, S. L. Hansen, J. Nocedal, and Y. Singer. A stochastic quasi-Newton method for large-scale optimization. SIAM Journal on Optimization, 26(2):1008-1031, 2016.
[18]
C. Cartis and K. Scheinberg. Global convergence rate analysis of unconstrained optimization methods based on probabilistic models. Mathematical Programming, 169(2):337-375, 2018.
[19]
C. Cartis, N. I. M. Gould, and P. L. Toint. Adaptive cubic regularisation methods for unconstrained optimization. Part I: motivation, convergence and numerical results. Mathematical Programming, 127(2):245-295, 2011a.
[20]
C. Cartis, N. I. M. Gould, and P. L. Toint. Adaptive cubic regularisation methods for unconstrained optimization. Part II: worst-case function-and derivative-evaluation complexity. Mathematical Programming, 130(2):295-319, 2011b.
[21]
R. Collobert, S. Bengio, and Y. Bengio. A parallel mixture of SVMs for very large scale problems. In NIPS, pages 633-640, 2002.
[22]
G. Cormode and C. Dickens. Iterative Hessian sketch in input sparsity time. ArXiv Preprint:1910.14166, 2019.
[23]
N. Doikov and P. Richtárik. Randomized block cubic Newton method. In ICML, pages 1290-1298, 2018.
[24]
P. Drineas and M. W. Mahoney. Lectures on randomized numerical linear algebra. The Mathematics of Data, 25:1, 2018.
[25]
J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121-2159, 2011.
[26]
M. A. Erdogdu and A. Montanari. Convergence rates of sub-sampled Newton methods. In NIPS, pages 3052-3060. MIT Press, 2015.
[27]
J. Friedman, T. Hastie, and R. Tibshirani. The elements of statistical learning, volume 1. Springer series in statistics New York, 2001.
[28]
R. Frostig, R. Ge, S. Kakade, and A. Sidford. Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization. In ICML, pages 2540-2548, 2015.
[29]
D. Garber and E. Hazan. Fast and simple PCA via convex optimization. ArXiv Preprint: 1509.05647, 2015.
[30]
A. Gasnikov, P. Dvurechensky, E. Gorbunov, E. Vorontsova, D. Selikhanovych, and C. A. Uribe. Optimal tensor methods in smooth convex and uniformly convex optimization. In COLT, pages 1374-1391, 2019.
[31]
S. Ghadimi and G. Lan. Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Mathematical Programming, 156(1-2):59-99, 2016.
[32]
S. Ghadimi, H. Liu, and T. Zhang. Second-order methods with cubic regularization under inexact information. ArXiv Preprint: 1710.05782, 2017.
[33]
I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT Press, 2016.
[34]
P. Jain, S. M. Kakade, R. Kidambi, P. Netrapalli, and A. Sidford. Accelerating stochastic gradient descent for least squares regression. In COLT, pages 545-604, 2018.
[35]
B. Jiang, T. Lin, and S. Zhang. A unified adaptive tensor approximation scheme to accelerate composite convex optimization. SIAM Journal on Optimization, 30(4):2897-2926, 2020.
B. Jiang, H. Wang, and S. Zhang. An optimal high-order tensor method for convex optimization. Mathematics of Operations Research, 46(4):1390-1412, 2021.
[36]
J. M. Kohler and A. Lucchi. Subsampled cubic regularization for non-convex optimization. In ICML, pages 1895-1904, 2017.
[37]
B. Kulis. Metric learning: A survey. Foundations and Trends in Machine Learning, 5(4):287-364, 2013.
[38]
S. B. Kylasa, F. Roosta-Khorasani, M. W. Mahoney, and A. Grama. GPU accelerated subsampled Newton's method. In Proceedings of the 2019 SIAM International Conference on Data Mining, pages 702-710. SIAM, 2019.
[39]
G. Lan. An optimal method for stochastic composite optimization. Mathematical Programming, 133(1):365-397, 2012.
[40]
X. Li, S. Wang, and Z. Zhang. Do subsampled Newton methods work for high-dimensional data? In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 4723-4730, 2020.
[41]
X. Liu, C-J. Hsieh, J. D. Lee, and Y. Sun. An inexact subsampled proximal Newton-type method for large-scale machine learning. ArXiv Preprint: 1708.08552, 2017.
[42]
D. G. Luenberger and Y. Ye. Linear and Nonlinear Programming, volume 2. Springer, 1984.
[43]
L. Mason, J. Baxter, P. L. Bartlett, and M. R. Frean. Boosting algorithms as gradient descent. In NIPS, pages 512-518, 2000.
[44]
R. D. C. Monteiro and B. F. Svaiter. Iteration-complexity of a Newton proximal extragradient method for monotone variational inequalities and inclusion problems. SIAM Journal on Optimization, 22(3):914-935, 2012.
[45]
R. D. C. Monteiro and B. F. Svaiter. An accelerated hybrid proximal extragradient method for convex optimization and its implications to second-order methods. SIAM Journal on Optimization, 23(3):1092-1125, 2013.
[46]
Yu. Nesterov. A method for solving the convex programming problem with convergence rate O(1/k^2). Dokl. Akad. Nauk SSSR, pages 543-547, 1983. (in Russian).
[47]
Yu. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer Science & Business Media, 2004.
[48]
Yu. Nesterov. Accelerating the cubic regularization of Newton's method on convex problems. Mathematical Programming, 112(1):159-181, 2008.
[49]
Yu. Nesterov. Implementable tensor methods in unconstrained convex optimization. Mathematical Programming, pages 1-27, 2019.
[50]
C. Paquette and K. Scheinberg. A stochastic line search method with expected complexity analysis. SIAM Journal on Optimization, 30(1):349-376, 2020.
[51]
M. Pilanci and M. J. Wainwright. Newton sketch: A near linear-time optimization algorithm with linear-quadratic convergence. SIAM Journal on Optimization, 27(1):205-245, 2017.
[52]
Z. Qu and P. Richtárik. Coordinate descent with arbitrary sampling I: algorithms and complexity. Optimization Methods and Software, 31(5):829-857, 2016a.
[53]
Z. Qu and P. Richtárik. Coordinate descent with arbitrary sampling II: Expected separable overapproximation. Optimization Methods and Software, 31(5):858-884, 2016b.
[54]
H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400-407, 1951.
[55]
F. Roosta-Khorasani and M. W. Mahoney. Sub-sampled Newton methods. Mathematical Programming, 174(1-2):293-326, 2019.
[56]
N. N. Schraudolph, J. Yu, and S. Günter. A stochastic quasi-Newton method for online convex optimization. In AISTATS, pages 436-443, 2007.
[57]
S. Shalev-Shwartz. SDCA without duality, regularization, and individual convexity. In ICML, pages 747-754, 2016.
[58]
S. Shalev-Shwartz and T. Zhang. Accelerated mini-batch stochastic dual coordinate ascent. In NIPS, pages 378-385, 2013.
[59]
C. Song and J. Liu. Inexact proximal cubic regularized Newton methods for convex optimization. ArXiv Preprint: 1902.02388, 2019.
[60]
S. Sra, S. Nowozin, and S. J. Wright. Optimization for Machine Learning. MIT Press, 2012.
[61]
N. Tripuraneni, M. Stern, C. Jin, J. Regier, and M. I. Jordan. Stochastic cubic regularization for fast nonconvex optimization. In NIPS, pages 2899-2908, 2018.
[62]
X. Wang, S. Ma, D. Goldfarb, and W. Liu. Stochastic quasi-Newton methods for nonconvex stochastic optimization. SIAM Journal on Optimization, 27(2):927-956, 2017.
[63]
P. Xu, J. Yang, F. Roosta-Khorasani, C. Ré, and M. W. Mahoney. Sub-sampled Newton methods with non-uniform sampling. In NIPS, pages 3000-3008, 2016.
[64]
P. Xu, F. Roosta-Khorasani, and M. W. Mahoney. Second-order optimization for nonconvex machine learning: An empirical study. In SDM, pages 199-207. SIAM, 2020a.
[65]
P. Xu, F. Roosta-Khorasani, and M. W. Mahoney. Newton-type methods for nonconvex optimization under inexact Hessian information. Mathematical Programming, 184:35-70, 2020b.
[66]
Z. Yao, P. Xu, F. Roosta-Khorasani, and M. W. Mahoney. Inexact non-convex Newton-type methods. INFORMS Journal on Optimization, 3(2):119-226, 2021.
[67]
H. Ye, L. Luo, and Z. Zhang. Nesterov's acceleration for approximate Newton. Journal of Machine Learning Research, 21(142):1-37, 2020.
[68]
J. Zhang, L. Xiao, and S. Zhang. Adaptive stochastic variance reduction for subsampled Newton method with cubic regularization. INFORMS Journal on Optimization, published online.


          Published In

          The Journal of Machine Learning Research, Volume 23, Issue 1
          January 2022
          15939 pages
          ISSN: 1532-4435
          EISSN: 1533-7928
          CC-BY 4.0

          Publisher

          JMLR.org

          Publication History

          Accepted: 01 March 2022
          Revised: 01 March 2022
          Published: 01 January 2022
          Received: 01 August 2020
          Published in JMLR Volume 23, Issue 1

          Author Tags

          1. sum of nonconvex functions
          2. acceleration
          3. parameter-free adaptive algorithm
          4. cubic regularization
          5. Newton's method
          6. random sampling
          7. iteration complexity
