Abstract
Successive quadratic approximations, or second-order proximal methods, are useful for minimizing functions that are a sum of a smooth part and a convex, possibly nonsmooth part that promotes regularization. Most analyses of iteration complexity focus on the special case of the proximal gradient method, or accelerated variants thereof. There have been only a few studies of methods that use a second-order approximation to the smooth part, due in part to the difficulty of obtaining closed-form solutions to the subproblems at each iteration. In fact, iterative algorithms may need to be used to find inexact solutions to these subproblems. In this work, we present a global analysis of the iteration complexity of inexact successive quadratic approximation methods, showing that an inexact solution of the subproblem that is within a fixed multiplicative precision of optimality suffices to guarantee the same order of convergence rate as the exact version, with complexity related in an intuitive way to the measure of inexactness. Our result allows flexible choices of the second-order term, including Newton and quasi-Newton choices, and does not necessarily require increasing precision of the subproblem solution on later iterations. For problems exhibiting a property related to strong convexity, the algorithms converge at global linear rates. For general convex problems, the convergence rate is linear in early stages, while the overall rate is \(O(1/k)\). For nonconvex problems, a first-order optimality criterion converges to zero at a rate of \(O(1/\sqrt{k})\).
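As a point of reference, the display below sketches the problem class and the kind of subproblem and acceptance condition the abstract describes; the notation \(f\), \(\psi \), \(H_k\), \(Q_k\), and \(\eta \) is illustrative and may differ in detail from the definitions used in the body of the paper. The smooth-plus-regularizer objective and the quadratic model at iterate \(x_k\) are
\[
\min_{x \in \mathbb{R}^n} \; F(x) := f(x) + \psi (x), \qquad
Q_k(d) := \nabla f(x_k)^T d + \tfrac{1}{2} d^T H_k d + \psi (x_k + d),
\]
and one common way to formalize "within a fixed multiplicative precision of optimality" is to accept any step \(d_k\) that satisfies, for a fixed \(\eta \in [0,1)\),
\[
Q_k(d_k) - \inf_d Q_k(d) \le \eta \bigl( Q_k(0) - \inf_d Q_k(d) \bigr).
\]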
Notes
The definition of \({\varDelta }\) in [32] contains another term \(\omega d^T H d/2\), where \(\omega \in [0,1)\) is a parameter. We take \(\omega =0\) for simplicity, but our analysis can be extended in a straightforward way to the case of \(\omega \in (0,1)\).
Note that for \(\eta \in [0,1)\), \(1 / (1- \eta ) > 1/ (2(1 - \sqrt{\eta }))\).
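This can be verified directly: for \(\eta \in [0,1)\) both denominators are positive, so
\[
\frac{1}{1-\eta } > \frac{1}{2(1-\sqrt{\eta })}
\;\Longleftrightarrow\;
2(1-\sqrt{\eta }) > 1-\eta = (1-\sqrt{\eta })(1+\sqrt{\eta })
\;\Longleftrightarrow\;
2 > 1+\sqrt{\eta },
\]
and the last inequality holds because \(\sqrt{\eta } < 1\).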
We could instead require only \(H_k^0 \succeq 0\) and start with \(H_k + I\).
Downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
References
Bach, F.: Duality between subgradient and conditional gradient methods. SIAM J. Optim. 25(1), 115–129 (2015)
Bonettini, S., Loris, I., Porta, F., Prato, M.: Variable metric inexact line-search-based methods for nonsmooth optimization. SIAM J. Optim. 26(2), 891–921 (2016)
Bonettini, S., Loris, I., Porta, F., Prato, M., Rebegoldi, S.: On the convergence of a linesearch based proximal-gradient method for nonconvex optimization. Inverse Problems 33(5), 055005 (2017)
Burke, J.V., Moré, J.J., Toraldo, G.: Convergence properties of trust region methods for linear and convex constraints. Math. Program. 47(1–3), 305–336 (1990)
Byrd, R.H., Lu, P., Nocedal, J., Zhu, C.: A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 16, 1190–1208 (1995)
Byrd, R.H., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for L-1 regularized optimization. Math. Program. 157(2), 375–396 (2016)
Chouzenoux, E., Pesquet, J.C., Repetti, A.: Variable metric forward–backward algorithm for minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl. 162(1), 107–132 (2014)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
Conn, A.R., Gould, N.I.M., Toint, P.L.: Global convergence of a class of trust region algorithms for optimization with simple bounds. SIAM J. Numer. Anal. 25(2), 433–460 (1988)
Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018)
Fletcher, R.: Practical Methods of Optimization. Wiley, Hoboken (1987)
Ghanbari, H., Scheinberg, K.: Proximal quasi-Newton methods for regularized convex optimization with linear and accelerated sublinear convergence rates. Comput. Optim. Appl. 69(3), 597–627 (2018)
Jiang, K., Sun, D., Toh, K.C.: An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP. SIAM J. Optim. 22(3), 1042–1064 (2012)
Lee, C.P., Chang, K.W.: Distributed block-diagonal approximation methods for regularized empirical risk minimization. Tech. rep. (2017)
Lee, C.P., Lim, C.H., Wright, S.J.: A distributed quasi-Newton algorithm for empirical risk minimization with nonsmooth regularization. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1646–1655. ACM, New York (2018)
Lee, C.P., Roth, D.: Distributed box-constrained quadratic optimization for dual linear SVM. In: Proceedings of the International Conference on Machine Learning (2015)
Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24(3), 1420–1443 (2014)
Li, D.H., Fukushima, M.: On the global convergence of the BFGS method for nonconvex unconstrained optimization problems. SIAM J. Optim. 11(4), 1054–1064 (2001)
Li, J., Andersen, M.S., Vandenberghe, L.: Inexact proximal Newton methods for self-concordant functions. Math. Methods Oper. Res. 85(1), 19–41 (2017)
Lin, C.J., Moré, J.J.: Newton’s method for large-scale bound constrained problems. SIAM J. Optim. 9, 1100–1127 (1999)
Lin, H., Mairal, J., Harchaoui, Z.: Catalyst acceleration for first-order convex optimization: from theory to practice. J. Mach. Learn. Res. 18(212), 1–54 (2018)
Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1), 503–528 (1989)
Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4(3), 553–572 (1983)
Necoara, I., Nesterov, Y., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1232-1
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Dordrecht (2004)
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, Berlin (2006)
Rodomanov, A., Kropotov, D.: A superlinearly-convergent proximal Newton-type method for the optimization of finite sums. In: Proceedings of the International Conference on Machine Learning, pp. 2597–2605 (2016)
Scheinberg, K., Tang, X.: Practical inexact proximal quasi-Newton method with global complexity analysis. Math. Program. 160(1–2), 495–529 (2016)
Schmidt, M., Roux, N., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems, pp. 1458–1466 (2011)
Tran-Dinh, Q., Kyrillidis, A., Cevher, V.: An inexact proximal path-following algorithm for constrained convex minimization. SIAM J. Optim. 24(4), 1718–1745 (2014)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1), 387–423 (2009)
Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)
Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)
Yang, T.: Trading computation for communication: Distributed stochastic dual coordinate ascent. In: Advances in Neural Information Processing Systems, pp. 629–637 (2013)
Zheng, S., Wang, J., Xia, F., Xu, W., Zhang, T.: A general distributed dual coordinate optimization framework for regularized loss minimization. J. Mach. Learn. Res. 18(115), 1–52 (2017)
Additional information
This work was supported by NSF awards 1447449, 1628384, 1634579, and 1740707; Subcontracts 3F-30222 and 8F-30039 from Argonne National Laboratory; and Award N660011824020 from the DARPA Lagrange Program.
Appendices
Proof of Lemma 5
Proof
We have
where in (56a) we used the convexity of f, in (56b) we set \(d=\lambda (P_{{\varOmega }}(x)-x)\), and in (56c) we used the optimal set strong convexity (10) of F. Thus we obtain (25). \(\square \)
Proof of Lemma 6
Proof
Consider
then by setting the derivative to zero in (57), we have
When \(\delta _k \ge A_k\), we have from (58) that \(\lambda _k = 1\). Therefore, from (30) we get
proving (31).
On the other hand, since \(A \ge A_k > 0\) and \(c_k \ge 0\) for all k, (30) can be further upper-bounded by
Now take
For \(\delta _k \ge A \ge A_k\), (31) still applies. If \(A > \delta _k\), we have from (59) that \(\lambda _k = \delta _k / A\), hence
This, together with (31), implies that \(\{\delta _k\}\) is a monotonically decreasing sequence. Dividing both sides of (60) by \(\delta _{k+1}\delta _k\) and using the fact that the sequence is decreasing and nonnegative (a model calculation of this step is sketched after the proof), we conclude
Summing this inequality from \(k_0\), and using \(\delta _{k_0} < A\), we obtain
proving (32). \(\square \)
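As an illustration of the division-and-summation step above, consider an assumed model recursion of the form \(\delta _{k+1} \le \delta _k - \delta _k^2/(2A)\), which stands in for the bound labelled (60) and may differ from it in its constants. Dividing both sides by \(\delta _{k+1}\delta _k > 0\) and using \(\delta _{k+1} \le \delta _k\) gives
\[
\frac{1}{\delta _{k+1}} - \frac{1}{\delta _k} \ge \frac{\delta _k}{2A\,\delta _{k+1}} \ge \frac{1}{2A},
\]
and summing this from \(k_0\) to \(k-1\) and using \(\delta _{k_0} < A\) yields
\[
\frac{1}{\delta _k} \ge \frac{1}{\delta _{k_0}} + \frac{k-k_0}{2A} > \frac{k-k_0+2}{2A},
\qquad \text{i.e.,} \qquad
\delta _k < \frac{2A}{k-k_0+2},
\]
which is the type of \(O(1/k)\) bound expressed by (32).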
Cite this article
Lee, C.P., Wright, S.J.: Inexact successive quadratic approximation for regularized optimization. Comput. Optim. Appl. 72, 641–674 (2019). https://doi.org/10.1007/s10589-019-00059-z