
Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions

Published: 01 January 2017

Abstract

Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function of expected values, i.e., a composition of two expected-value functions, as in the problem $$\min_x \mathbf{E}_v\left[ f_v\big(\mathbf{E}_w[g_w(x)]\big)\right].$$ To solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of the quasi-gradient method. SCGD updates the solution based on noisy sample gradients of $f_v, g_w$ and uses an auxiliary variable to track the unknown quantity $\mathbf{E}_w[g_w(x)]$. We prove that SCGD converges almost surely to an optimal solution for convex optimization problems, as long as such a solution exists. The convergence involves the interplay of two iterations with different time scales. For nonsmooth convex problems, SCGD achieves a convergence rate of $\mathcal{O}(k^{-1/4})$ in the general case and $\mathcal{O}(k^{-2/3})$ in the strongly convex case, after taking $k$ samples. For smooth convex problems, SCGD can be accelerated to converge at a rate of $\mathcal{O}(k^{-2/7})$ in the general case and $\mathcal{O}(k^{-4/5})$ in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, and we provide the corresponding convergence rate analysis. The stochastic setting in which one wants to optimize compositions of expected-value functions is very common in practice, and the proposed SCGD methods find wide applications in learning, estimation, dynamic programming, and beyond.
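The basic SCGD iteration pairs a fast-time-scale running average that tracks the inner expectation $\mathbf{E}_w[g_w(x)]$ with a slow-time-scale quasi-gradient step on $x$. The Python sketch below is a minimal illustration of that two-time-scale structure, not the paper's reference implementation; the stepsize schedules, the oracle names (sample_g, sample_grad_g, sample_grad_f), and the toy problem are assumptions chosen for the example.

    import numpy as np

    def scgd(sample_g, sample_grad_g, sample_grad_f, x0, y0,
             alpha=lambda k: 1.0 / (k + 1) ** 0.75,   # slow stepsize (assumed schedule)
             beta=lambda k: 1.0 / (k + 1) ** 0.5,     # fast stepsize (assumed schedule)
             num_iters=20000):
        """Two-time-scale SCGD sketch for min_x E_v[ f_v( E_w[ g_w(x) ] ) ].

        sample_g(x)      : noisy evaluation g_w(x) for a fresh sample w
        sample_grad_g(x) : noisy Jacobian of g_w at x (dim_y x dim_x)
        sample_grad_f(y) : noisy gradient of f_v at y (dim_y)
        """
        x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
        for k in range(num_iters):
            # Fast time scale: auxiliary variable tracking the inner expectation E_w[g_w(x)].
            y = (1.0 - beta(k)) * y + beta(k) * sample_g(x)
            # Slow time scale: chain-rule (quasi-)gradient step using the tracked estimate y.
            x = x - alpha(k) * sample_grad_g(x).T @ sample_grad_f(y)
        return x

    # Toy illustration (hypothetical problem): minimize E_v[(E_w[x + w] - 1)^2 / 2]
    # with E[w] = 0, whose minimizer is x = 1.
    rng = np.random.default_rng(0)
    x_hat = scgd(sample_g=lambda x: x + rng.normal(size=1),
                 sample_grad_g=lambda x: np.eye(1),
                 sample_grad_f=lambda y: y - 1.0,
                 x0=np.zeros(1), y0=np.zeros(1))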



        Published In

        Mathematical Programming: Series A and B, Volume 161, Issue 1-2 (Jan 2017), 615 pages

        Publisher

        Springer-Verlag

        Berlin, Heidelberg

        Author Tags

        1. 68W27
        2. 90C06
        3. 90C15
        4. 90C25
        5. Convex optimization
        6. Sample complexity
        7. Simulation
        8. Statistical learning
        9. Stochastic gradient
        10. Stochastic optimization

        Cited By

        • (2024) A federated stochastic multi-level compositional minimax algorithm for deep AUC maximization. Proceedings of the 41st International Conference on Machine Learning, pp. 59601-59640. 10.5555/3692070.3694533. Online publication date: 21-Jul-2024.
        • (2024) Stability and generalization of stochastic compositional gradient descent algorithms. Proceedings of the 41st International Conference on Machine Learning, pp. 56542-56593. 10.5555/3692070.3694405. Online publication date: 21-Jul-2024.
        • (2024) Sample average approximation for conditional stochastic optimization with dependent data. Proceedings of the 41st International Conference on Machine Learning, pp. 51237-51254. 10.5555/3692070.3694171. Online publication date: 21-Jul-2024.
        • (2024) Stability and generalization for stochastic recursive momentum-based algorithms for (strongly-)convex one to k-level stochastic optimizations. Proceedings of the 41st International Conference on Machine Learning, pp. 39201-39275. 10.5555/3692070.3693660. Online publication date: 21-Jul-2024.
        • (2024) Projection-free variance reduction methods for stochastic constrained multi-level compositional optimization. Proceedings of the 41st International Conference on Machine Learning, pp. 21962-21987. 10.5555/3692070.3692952. Online publication date: 21-Jul-2024.
        • (2024) A doubly recursive stochastic compositional gradient descent method for federated multi-level compositional optimization. Proceedings of the 41st International Conference on Machine Learning, pp. 14540-14610. 10.5555/3692070.3692652. Online publication date: 21-Jul-2024.
        • (2024) Faster stochastic variance reduction methods for compositional MiniMax optimization. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, pp. 13927-13935. 10.1609/aaai.v38i12.29300. Online publication date: 20-Feb-2024.
        • (2024) A Communication-Efficient Algorithm for Federated Multilevel Stochastic Compositional Optimization. IEEE Transactions on Signal Processing 72, 2333-2347. 10.1109/TSP.2024.3392351. Online publication date: 23-Apr-2024.
        • (2024) The continuous stochastic gradient method: part I–convergence theory. Computational Optimization and Applications 87(3), 935-976. 10.1007/s10589-023-00542-8. Online publication date: 1-Apr-2024.
        • (2024) The continuous stochastic gradient method: part II–application and numerics. Computational Optimization and Applications 87(3), 977-1008. 10.1007/s10589-023-00540-w. Online publication date: 1-Apr-2024.
