
Inexact successive quadratic approximation for regularized optimization

Computational Optimization and Applications

Abstract

Successive quadratic approximations, or second-order proximal methods, are useful for minimizing functions that are a sum of a smooth part and a convex, possibly nonsmooth part that promotes regularization. Most analyses of iteration complexity focus on the special case of the proximal gradient method, or accelerated variants thereof. There have been only a few studies of methods that use a second-order approximation to the smooth part, due in part to the difficulty of obtaining closed-form solutions to the subproblems at each iteration. In fact, iterative algorithms may need to be used to find inexact solutions to these subproblems. In this work, we present a global analysis of the iteration complexity of inexact successive quadratic approximation methods, showing that an inexact solution of the subproblem that is within a fixed multiplicative precision of optimality suffices to guarantee the same order of convergence rate as the exact version, with complexity related in an intuitive way to the measure of inexactness. Our result allows flexible choices of the second-order term, including Newton and quasi-Newton choices, and does not necessarily require increasing precision of the subproblem solution on later iterations. For problems exhibiting a property related to strong convexity, the algorithms converge at global linear rates. For general convex problems, the convergence rate is linear in early stages, while the overall rate is \(O(1/k)\). For nonconvex problems, a first-order optimality criterion converges to zero at a rate of \(O(1/\sqrt{k})\).
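The paper contains no code, but the basic iteration it analyzes can be sketched concretely. Below is a minimal Python illustration under our own simplifying assumptions: an \(\ell_1\) regularizer, a fixed matrix \(H\) for the quadratic term, a few proximal-gradient (ISTA) passes as the inexact inner solver, and no line search. All names and constants are illustrative; this is not the authors' implementation.

```python
import numpy as np


def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)


def inexact_sqa_step(x, grad_f, H, lam, inner_iters=10):
    """One successive quadratic approximation step for F(x) = f(x) + lam * ||x||_1.

    The subproblem  min_d  grad_f(x)^T d + 0.5 d^T H d + lam * ||x + d||_1  is solved
    only approximately, by a fixed number of proximal-gradient (ISTA) passes; this
    plays the role of the inexact subproblem solution analysed in the paper.
    """
    g = grad_f(x)
    step = 1.0 / np.linalg.norm(H, 2)   # 1 / Lipschitz constant of d -> g + H d
    d = np.zeros_like(x)                # start the inner solver from d = 0
    for _ in range(inner_iters):
        model_grad = g + H @ d          # gradient of the smooth part of the model
        d = soft_threshold(x + d - step * model_grad, step * lam) - x
    return x + d                        # no line search in this sketch


# Tiny L1-regularized least-squares example (random data, for illustration only).
rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((20, 5)), rng.standard_normal(20), 0.1
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad_f = lambda x: A.T @ (A @ x - b)
H = A.T @ A                             # exact Hessian; a quasi-Newton H also fits the framework
x = np.zeros(5)
for _ in range(30):
    x = inexact_sqa_step(x, grad_f, H, lam)
print("objective:", f(x) + lam * np.linalg.norm(x, 1))
```

With \(H\) a multiple of the identity, each step reduces to a proximal gradient step; with the exact Hessian or a quasi-Newton matrix and more inner iterations, it approaches an exact proximal Newton step. Both choices fall within the flexible second-order term allowed by the analysis.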


Notes

  1. The definition of \({\varDelta }\) in [32] contains another term \(\omega d^T H d/2\), where \(\omega \in [0,1)\) is a parameter. We take \(\omega =0\) for simplicity, but our analysis can be extended in a straightforward way to the case of \(\omega \in (0,1)\).

  2. Note that for \(\eta \in [0,1)\), \(1 / (1- \eta ) > 1/ (2(1 - \sqrt{\eta }))\), since \(1-\eta = (1-\sqrt{\eta })(1+\sqrt{\eta })\) and \(1+\sqrt{\eta } < 2\).

  3. Alternatively, we could require only \(H_k^0 \succeq 0\) and start with \(H_k + I\) instead.

  4. Downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.

References

  1. Bach, F.: Duality between subgradient and conditional gradient methods. SIAM J. Optim. 25(1), 115–129 (2015)

  2. Bonettini, S., Loris, I., Porta, F., Prato, M.: Variable metric inexact line-search-based methods for nonsmooth optimization. SIAM J. Optim. 26(2), 891–921 (2016)

  3. Bonettini, S., Loris, I., Porta, F., Prato, M., Rebegoldi, S.: On the convergence of a linesearch based proximal-gradient method for nonconvex optimization. Inverse Problems 33(5), 055005 (2017)

  4. Burke, J.V., Moré, J.J., Toraldo, G.: Convergence properties of trust region methods for linear and convex constraints. Math. Program. 47(1–3), 305–336 (1990)

  5. Byrd, R.H., Lu, P., Nocedal, J., Zhu, C.: A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 16, 1190–1208 (1995)

  6. Byrd, R.H., Nocedal, J., Oztoprak, F.: An inexact successive quadratic approximation method for L-1 regularized optimization. Math. Program. 157(2), 375–396 (2016)

  7. Chouzenoux, E., Pesquet, J.C., Repetti, A.: Variable metric forward–backward algorithm for minimizing the sum of a differentiable function and a convex function. J. Optim. Theory Appl. 162(1), 107–132 (2014)

  8. Combettes, P.L., Wajs, V.R.: Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 4(4), 1168–1200 (2005)

  9. Conn, A.R., Gould, N.I.M., Toint, P.L.: Global convergence of a class of trust region algorithms for optimization with simple bounds. SIAM J. Numer. Anal. 25(2), 433–460 (1988)

  10. Drusvyatskiy, D., Lewis, A.S.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018)

  11. Fletcher, R.: Practical Methods of Optimization. Wiley, Hoboken (1987)

  12. Ghanbari, H., Scheinberg, K.: Proximal quasi-Newton methods for regularized convex optimization with linear and accelerated sublinear convergence rates. Comput. Optim. Appl. 69(3), 597–627 (2018)

  13. Jiang, K., Sun, D., Toh, K.C.: An inexact accelerated proximal gradient method for large scale linearly constrained convex SDP. SIAM J. Optim. 22(3), 1042–1064 (2012)

  14. Lee, C.P., Chang, K.W.: Distributed block-diagonal approximation methods for regularized empirical risk minimization. Tech. rep. (2017)

  15. Lee, C.P., Lim, C.H., Wright, S.J.: A distributed quasi-Newton algorithm for empirical risk minimization with nonsmooth regularization. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1646–1655. ACM, New York (2018)

  16. Lee, C.P., Roth, D.: Distributed box-constrained quadratic optimization for dual linear SVM. In: Proceedings of the International Conference on Machine Learning (2015)

  17. Lee, J.D., Sun, Y., Saunders, M.A.: Proximal Newton-type methods for minimizing composite functions. SIAM J. Optim. 24(3), 1420–1443 (2014)

  18. Li, D.H., Fukushima, M.: On the global convergence of the BFGS method for nonconvex unconstrained optimization problems. SIAM J. Optim. 11(4), 1054–1064 (2001)

  19. Li, J., Andersen, M.S., Vandenberghe, L.: Inexact proximal Newton methods for self-concordant functions. Math. Methods Oper. Res. 85(1), 19–41 (2017)

  20. Lin, C.J., Moré, J.J.: Newton’s method for large-scale bound constrained problems. SIAM J. Optim. 9, 1100–1127 (1999)

  21. Lin, H., Mairal, J., Harchaoui, Z.: Catalyst acceleration for first-order convex optimization: from theory to practice. J. Mach. Learn. Res. 18(212), 1–54 (2018)

  22. Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1), 503–528 (1989)

  23. Moré, J.J., Sorensen, D.C.: Computing a trust region step. SIAM J. Sci. Stat. Comput. 4(3), 553–572 (1983)

  24. Necoara, I., Nesterov, Yu., Glineur, F.: Linear convergence of first order methods for non-strongly convex optimization. Math. Program. (2018). https://doi.org/10.1007/s10107-018-1232-1

  25. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Dordrecht (2004)

  26. Nesterov, Y.: Gradient methods for minimizing composite functions. Math. Program. 140(1), 125–161 (2013)

  27. Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, Berlin (2006)

  28. Rodomanov, A., Kropotov, D.: A superlinearly-convergent proximal Newton-type method for the optimization of finite sums. In: Proceedings of the International Conference on Machine Learning, pp. 2597–2605 (2016)

  29. Scheinberg, K., Tang, X.: Practical inexact proximal quasi-Newton method with global complexity analysis. Math. Program. 160(1–2), 495–529 (2016)

  30. Schmidt, M., Roux, N., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems, pp. 1458–1466 (2011)

  31. Tran-Dinh, Q., Kyrillidis, A., Cevher, V.: An inexact proximal path-following algorithm for constrained convex minimization. SIAM J. Optim. 24(4), 1718–1745 (2014)

  32. Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1), 387–423 (2009)

  33. Villa, S., Salzo, S., Baldassarre, L., Verri, A.: Accelerated and inexact forward–backward algorithms. SIAM J. Optim. 23(3), 1607–1633 (2013)

  34. Wright, S.J., Nowak, R.D., Figueiredo, M.A.T.: Sparse reconstruction by separable approximation. IEEE Trans. Signal Process. 57, 2479–2493 (2009)

  35. Yang, T.: Trading computation for communication: Distributed stochastic dual coordinate ascent. In: Advances in Neural Information Processing Systems, pp. 629–637 (2013)

  36. Zheng, S., Wang, J., Xia, F., Xu, W., Zhang, T.: A general distributed dual coordinate optimization framework for regularized loss minimization. J. Mach. Learn. Res. 18(115), 1–52 (2017)


Author information


Corresponding author

Correspondence to Ching-pei Lee.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by NSF awards 1447449, 1628384, 1634579, and 1740707; Subcontracts 3F-30222 and 8F-30039 from Argonne National Laboratory; and Award N660011824020 from the DARPA Lagrange Program.

Appendices

Proof of Lemma 5

Proof

We have

$$\begin{aligned} Q^*&= \min _{d} \, \nabla f\left( x\right) ^T d + \frac{1}{2}d^T H d + \psi \left( x+ d\right) - \psi \left( x\right) \nonumber \\&\le \min _d \, f\left( x+d\right) + \psi \left( x+d\right) + \frac{1}{2}d^T H d - F\left( x\right) \end{aligned}$$
(56a)
$$\begin{aligned}&\le F\left( x + \lambda \left( P_{{\varOmega }}\left( x\right) - x \right) \right) + \frac{\lambda ^2}{2}\left( P_{\varOmega }\left( x\right) - x\right) ^T H \left( P_{\varOmega }\left( x\right) - x \right) - F\left( x\right) \quad \forall \lambda \in [0,1] \end{aligned}$$
(56b)
$$\begin{aligned}&\le \left( 1 - \lambda \right) F\left( x\right) + \lambda F^* - \frac{\mu \lambda \left( 1 -\lambda \right) }{2} \left\| x - P_{\varOmega }\left( x\right) \right\| ^2 \nonumber \\&\qquad + \frac{\lambda ^2 }{2} \left( x - P_{\varOmega }\left( x\right) \right) ^T H\left( x - P_{\varOmega }\left( x\right) \right) - F\left( x\right) \quad \forall \lambda \in [0,1] \end{aligned}$$
(56c)
$$\begin{aligned}&\le \lambda \left( F^* - F\left( x\right) \right) - \frac{\mu \lambda \left( 1 -\lambda \right) }{2} \left\| x - P_{\varOmega }\left( x\right) \right\| ^2 + \frac{\lambda ^2 }{2} \Vert H\Vert \left\| x - P_{\varOmega }\left( x\right) \right\| ^2 \quad \forall \lambda \in [0,1], \end{aligned}$$
(56d)

where in (56a) we used the convexity of f, in (56b) we set \(d=\lambda (P_{{\varOmega }}(x)-x)\), in (56c) we used the optimal set strong convexity (10) of F, and in (56d) we bounded the quadratic term by \(\left( x - P_{\varOmega }\left( x\right) \right) ^T H \left( x - P_{\varOmega }\left( x\right) \right) \le \Vert H\Vert \left\| x - P_{\varOmega }\left( x\right) \right\| ^2\). Thus we obtain (25). \(\square \)
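As a remark of ours (a rearrangement only, not an equation from the paper), grouping the terms of (56d) by powers of \(\lambda \) gives

$$\begin{aligned} Q^* \le -\lambda \left( F\left( x\right) - F^* + \frac{\mu }{2}\left\| x - P_{\varOmega }\left( x\right) \right\| ^2\right) + \frac{\lambda ^2}{2}\left( \mu + \Vert H\Vert \right) \left\| x - P_{\varOmega }\left( x\right) \right\| ^2 \quad \forall \lambda \in [0,1], \end{aligned}$$

which exhibits the quadratic-in-\(\lambda \) form \(-\lambda \delta + \frac{A}{2}\lambda ^2\) that is minimized in (57) of the next proof.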

Proof of Lemma 6

Proof

Consider

$$\begin{aligned} \lambda _k = \arg \min _{\lambda \in [0,1]} \, -\lambda \delta _k + \frac{\lambda ^2}{2}A_k, \end{aligned}$$
(57)

Then, setting the derivative of the objective in (57) to zero and projecting its unconstrained minimizer \(\delta _k / A_k\) onto \([0,1]\), we obtain

$$\begin{aligned} \lambda _k = \min \left\{ 1, \frac{\delta _k}{A_k}\right\} . \end{aligned}$$
(58)

When \(\delta _k \ge A_k\), we have from (58) that \(\lambda _k = 1\). Therefore, from (30) we get

$$\begin{aligned} \delta _{k+1} \le \delta _{k} + c_k\left( - \delta _k + \frac{A_k}{2}\right) \le \delta _{k} + c_k\left( -\delta _k + \frac{\delta _k}{2}\right) = \left( 1 - \frac{c_k}{2}\right) \delta _{k}, \end{aligned}$$

proving (31).

On the other hand, since \(A \ge A_k > 0\) and \(c_k \ge 0\) for all \(k\), (30) can be further upper-bounded by

$$\begin{aligned} \delta _{k+1} \le \delta _k + c_k\left( -\lambda _k \delta _k + \frac{A_k}{2}\lambda _k^2\right) \le \delta _k + c_k \left( -\lambda _k \delta _k + \frac{A}{2} \lambda _k^2 \right) ,\quad \forall \lambda _k \in [0,1]. \end{aligned}$$

Now take

$$\begin{aligned} \lambda _k = \min \left\{ 1, \frac{\delta _k}{A}\right\} . \end{aligned}$$
(59)

For \(\delta _k \ge A \ge A_k\), (31) still applies. If \(A > \delta _k\), we have from (59) that \(\lambda _k = \delta _k / A\), hence

$$\begin{aligned} \delta _{k+1} \le \delta _k - \frac{c_k}{2A}\delta _k^2. \end{aligned}$$
(60)

This, together with (31), implies that \(\{\delta _k\}\) is a monotonically decreasing sequence. Dividing both sides of (60) by \(\delta _{k+1}\delta _k\) and using the fact that \(\{\delta _k\}\) is decreasing and nonnegative, we conclude

$$\begin{aligned} \delta _k^{-1} \le \delta _{k+1}^{-1} - \frac{c_k \delta _k}{2\delta _{k+1}A} \le \delta _{k+1}^{-1} - \frac{c_k}{2A}. \end{aligned}$$

Summing this inequality from \(k_0\) to \(k-1\) and using \(\delta _{k_0} < A\), we obtain

$$\begin{aligned} \delta _k^{-1} \ge \delta _{k_0}^{-1} + \frac{\sum _{t=k_0}^{k-1}c_t}{2 A} \ge \frac{\sum _{t=k_0}^{k-1}c_t + 2}{2 A} \Rightarrow \delta _k \le \frac{2 A}{\sum _{t=k_0}^{k-1} c_t + 2}, \end{aligned}$$

proving (32). \(\square \)
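As a side note (ours, not the paper's), the recursion analyzed in this proof is easy to check numerically: taking (30) with equality, the step size (59), and a constant \(c_k \equiv c\), the iterates remain below the bound (32) once \(\delta _{k_0} < A\). The constants below are illustrative only.

```python
# Worst-case simulation of the Lemma 6 recursion: (30) with equality,
# lambda_k = min(1, delta_k / A) as in (59), constant c_k = c.
# A, c, and delta_0 are illustrative values, not taken from the paper.
A, c, delta = 4.0, 0.8, 10.0
k0 = None                                   # first index k with delta_k < A
for k in range(200):
    if k0 is None and delta < A:
        k0 = k
    lam = min(1.0, delta / A)
    delta = delta + c * (-lam * delta + 0.5 * A * lam ** 2)
    if k0 is not None:
        # Bound (32) for delta_{k+1}: 2A / (sum_{t=k0}^{k} c_t + 2).
        bound = 2.0 * A / (c * (k + 1 - k0) + 2.0)
        assert delta <= bound + 1e-12, (k, delta, bound)
print(f"delta after 200 iterations: {delta:.6f} (O(1/k) decay, consistent with (32))")
```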


About this article


Cite this article

Lee, Cp., Wright, S.J. Inexact successive quadratic approximation for regularized optimization. Comput Optim Appl 72, 641–674 (2019). https://doi.org/10.1007/s10589-019-00059-z
