Abstract
Successive quadratic approximations, or second-order proximal methods, are useful for minimizing functions that are a sum of a smooth part and a convex, possibly nonsmooth part that promotes regularization. Most analyses of iteration complexity focus on the special case of the proximal gradient method, or accelerated variants thereof. There have been only a few studies of methods that use a second-order approximation to the smooth part, due in part to the difficulty of obtaining closed-form solutions to the subproblems at each iteration. In fact, iterative algorithms may need to be used to find inexact solutions to these subproblems. In this work, we present a global analysis of the iteration complexity of inexact successive quadratic approximation methods, showing that an inexact solution of the subproblem that is within a fixed multiplicative precision of optimality suffices to guarantee the same order of convergence rate as the exact version, with complexity related in an intuitive way to the measure of inexactness. Our result allows flexible choices of the second-order term, including Newton and quasi-Newton choices, and does not necessarily require increasing precision of the subproblem solution on later iterations. For problems exhibiting a property related to strong convexity, the algorithms converge at global linear rates. For general convex problems, the convergence rate is linear in early stages, while the overall rate is \(O(1/k)\). For nonconvex problems, a first-order optimality criterion converges to zero at a rate of \(O(1/\sqrt{k})\).
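As a point of reference, the display below sketches the problem class and the kind of subproblem and acceptance condition the abstract describes; the notation \(f\), \(\psi \), \(H_k\), \(Q_k\), and \(\eta \) is illustrative and may differ in detail from the definitions used in the body of the paper. The smooth-plus-regularizer objective and the quadratic model at iterate \(x_k\) are
\[
\min_{x \in \mathbb{R}^n} \; F(x) := f(x) + \psi (x), \qquad
Q_k(d) := \nabla f(x_k)^T d + \tfrac{1}{2} d^T H_k d + \psi (x_k + d),
\]
and one common way to formalize "within a fixed multiplicative precision of optimality" is to accept any step \(d_k\) that satisfies, for a fixed \(\eta \in [0,1)\),
\[
Q_k(d_k) - \inf_d Q_k(d) \le \eta \bigl( Q_k(0) - \inf_d Q_k(d) \bigr).
\]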
Notes
The definition of \({\varDelta }\) in [32] contains another term \(\omega d^T H d/2\), where \(\omega \in [0,1)\) is a parameter. We take \(\omega =0\) for simplicity, but our analysis can be extended in a straightforward way to the case of \(\omega \in (0,1)\).
Note that for \(\eta \in [0,1)\), \(1 / (1- \eta ) > 1/ (2(1 - \sqrt{\eta }))\).
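This can be verified directly: for \(\eta \in [0,1)\) both denominators are positive, so
\[
\frac{1}{1-\eta } > \frac{1}{2(1-\sqrt{\eta })}
\;\Longleftrightarrow\;
2(1-\sqrt{\eta }) > 1-\eta = (1-\sqrt{\eta })(1+\sqrt{\eta })
\;\Longleftrightarrow\;
2 > 1+\sqrt{\eta },
\]
and the last inequality holds because \(\sqrt{\eta } < 1\).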
We could instead require only \(H_k^0 \succeq 0\) and start with \(H_k + I\).
Downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
References
Bach, F.: Duality between subgradient and conditional gradient methods. SIAM J. Optim. 25(1), 115–129 (2015)
Bonettini, S., Loris, I., Porta, F., Prato, M.: Variable metric inexact line-search-based methods for nonsmooth optimization. SIAM J. Optim. 26(2), 891–921 (2016)
Bonettini, S., Loris, I., Porta, F., Prato, M., Rebegoldi, S.: On the convergence of a linesearch based proximal-gradient method for nonconvex optimization. Inverse Problems 33(5), 055005 (2017)
Burke, J.V., Moré, J.J., Toraldo, G.: Convergence properties of trust region methods for linear and convex constraints. Math. Program. 47(1–3), 305–336 (1990)
Byrd, R.H., Lu, P., Nocedal, J., Zhu, C.: A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 16, 1190–1208 (1995)
Byrd, R.H., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for L-1 regularized optimization. Math. Program. 157(2), 375–396 (2016)
Chouzenoux, E., Pesquet, J.C., Repetti, A.: Variable metric forward–backward algorithm for minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl. 162(1), 107–132 (2014)
Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)
Conn, A.R., Gould, N.I.M., Toint, P.L.: Global convergence of a class of trust region algorithms for optimization with simple bounds. SIAM J. Numer. Anal. 25(2), 433–460 (1988)
Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018)
Fletcher, R.: Practical Methods of Optimization. Wiley, Hoboken (1987)
Ghanbari, H., Scheinberg, K.: Proximal quasi-Newton methods for regularized convex optimization with linear and accelerated sublinear convergence rates. Comput. Optim. Appl. 69(3), 597–627 (2018)
Jiang, K., Sun, D., Toh, K.C.: An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP. SIAM J. Optim. 22(3), 1042–1064 (2012)
Lee, C.P., Chang, K.W.: Distributed block-diagonal approximation methods for regularized empirical risk minimization. Tech. rep. (2017)
Lee, C.P., Lim, C.H., Wright, S.J.: A distributed quasi-Newton algorithm for empirical risk minimization with nonsmooth regularization. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1646–1655. ACM, New York (2018)
Lee, C.P., Roth, D.: Distributed box-constrained quadratic optimization for dual linear SVM. In: Proceedings of the International Conference on Machine Learning (2015)
Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24(3), 1420–1443 (2014)
Li, D.H., Fukushima, M.: On the global convergence of the BFGS method for nonconvex unconstrained optimization problems. SIAM J. Optim. 11(4), 1054–1064 (2001)
Li, J., Andersen, M.S., Vandenberghe, L.: Inexact proximal Newton methods for self-concordant functions. Math. Methods Oper. Res. 85(1), 19–41 (2017)
Lin, C.J., Moré, J.J.: Newton’s method for large-scale bound constrained problems. SIAM J. Optim. 9, 1100–1127 (1999)
Lin, H., Mairal, J., Harchaoui, Z.: Catalyst acceleration for first-order convex optimization: from theory to practice. J. Mach. Learn. Res. 18(212), 1–54 (2018)
Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1), 503–528 (1989)
Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4(3), 553–572 (1983)
Necoara, I., Nesterov, Y., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1232-1
Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Dordrecht (2004)
Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, Berlin (2006)
Rodomanov, A., Kropotov, D.: A superlinearly-convergent proximal Newton-type method for the optimization of finite sums. In: Proceedings of the International Conference on Machine Learning, pp. 2597–2605 (2016)
Scheinberg, K., Tang, X.: Practical inexact proximal quasi-Newton method with global complexity analysis. Math. Program. 160(1–2), 495–529 (2016)
Schmidt, M., Roux, N., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems, pp. 1458–1466 (2011)
Tran-Dinh, Q., Kyrillidis, A., Cevher, V.: An inexact proximal path-following algorithm for constrained convex minimization. SIAM J. Optim. 24(4), 1718–1745 (2014)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1), 387–423 (2009)
Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)
Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)
Yang, T.: Trading computation for communication: Distributed stochastic dual coordinate ascent. In: Advances in Neural Information Processing Systems, pp. 629–637 (2013)
Zheng, S., Wang, J., Xia, F., Xu, W., Zhang, T.: A general distributed dual coordinate optimization framework for regularized loss minimization. J. Mach. Learn. Res. 18(115), 1–52 (2017)
Additional information
This work was supported by NSF awards 1447449, 1628384, 1634579, and 1740707; Subcontracts 3F-30222 and 8F-30039 from Argonne National Laboratory; and Award N660011824020 from the DARPA Lagrange Program.
Appendices
Proof of Lemma 5
Proof
We have
where in (56a) we used the convexity of f, in (56b) we set \(d=\lambda (P_{{\varOmega }}(x)-x)\), and in (56c) we used the optimal set strong convexity (10) of F. Thus we obtain (25). \(\square \)
Proof of Lemma 6
Proof
Consider
then by setting the derivative to zero in (57), we have
When \(\delta _k \ge A_k\), we have from (58) that \(\lambda _k = 1\). Therefore, from (30) we get
proving (31).
On the other hand, since \(A \ge A_k > 0\) and \(c_k \ge 0\) for all k, (30) can be further upper-bounded by
Now take
For \(\delta _k \ge A \ge A_k\), (31) still applies. If \(A > \delta _k\), we have from (59) that \(\lambda _k = \delta _k / A\), hence
This, together with (31), implies that \(\{\delta _k\}\) is a monotonically decreasing sequence. Dividing both sides of (60) by \(\delta _{k+1}\delta _k\) and using the fact that the sequence is decreasing and nonnegative (a model calculation of this step is sketched after the proof), we conclude
Summing this inequality from \(k_0\), and using \(\delta _{k_0} < A\), we obtain
proving (32). \(\square \)
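As an illustration of the division-and-summation step above, consider an assumed model recursion of the form \(\delta _{k+1} \le \delta _k - \delta _k^2/(2A)\), which stands in for the bound labelled (60) and may differ from it in its constants. Dividing both sides by \(\delta _{k+1}\delta _k > 0\) and using \(\delta _{k+1} \le \delta _k\) gives
\[
\frac{1}{\delta _{k+1}} - \frac{1}{\delta _k} \ge \frac{\delta _k}{2A\,\delta _{k+1}} \ge \frac{1}{2A},
\]
and summing this from \(k_0\) to \(k-1\) and using \(\delta _{k_0} < A\) yields
\[
\frac{1}{\delta _k} \ge \frac{1}{\delta _{k_0}} + \frac{k-k_0}{2A} > \frac{k-k_0+2}{2A},
\qquad \text{i.e.,} \qquad
\delta _k < \frac{2A}{k-k_0+2},
\]
which is the type of \(O(1/k)\) bound expressed by (32).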
Cite this article
Lee, C.P., Wright, S.J.: Inexact successive quadratic approximation for regularized optimization. Comput. Optim. Appl. 72, 641–674 (2019). https://doi.org/10.1007/s10589-019-00059-z