
Stochastic compositional gradient descent: algorithms for minimizing compositions of expected-value functions

Published: 01 January 2017

Abstract

Classical stochastic gradient methods are well suited for minimizing expected-value objective functions. However, they do not apply to the minimization of a nonlinear function of expected values, i.e., a composition of two expected-value functions, as in the problem $$\min_x \mathbf{E}_v\left[ f_v\big(\mathbf{E}_w[g_w(x)]\big)\right].$$ To solve this stochastic composition problem, we propose a class of stochastic compositional gradient descent (SCGD) algorithms that can be viewed as stochastic versions of the quasi-gradient method. SCGD updates the solution based on noisy sample gradients of $f_v, g_w$ and uses an auxiliary variable to track the unknown quantity $\mathbf{E}_w[g_w(x)]$. We prove that SCGD converges almost surely to an optimal solution for convex optimization problems, as long as such a solution exists. The convergence involves the interplay of two iterations with different time scales. For nonsmooth convex problems, SCGD achieves a convergence rate of $\mathcal{O}(k^{-1/4})$ in the general case and $\mathcal{O}(k^{-2/3})$ in the strongly convex case, after taking $k$ samples. For smooth convex problems, SCGD can be accelerated to converge at a rate of $\mathcal{O}(k^{-2/7})$ in the general case and $\mathcal{O}(k^{-4/5})$ in the strongly convex case. For nonconvex problems, we prove that any limit point generated by SCGD is a stationary point, and we provide the corresponding convergence rate analysis. The stochastic setting in which one wants to optimize compositions of expected-value functions is very common in practice, and the proposed SCGD methods find wide applications in learning, estimation, dynamic programming, and beyond.
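The basic SCGD iteration pairs a fast-time-scale running average that tracks the inner expectation $\mathbf{E}_w[g_w(x)]$ with a slow-time-scale quasi-gradient step on $x$. The Python sketch below is a minimal illustration of that two-time-scale structure, not the paper's reference implementation; the stepsize schedules, the oracle names (sample_g, sample_grad_g, sample_grad_f), and the toy problem are assumptions chosen for the example.

    import numpy as np

    def scgd(sample_g, sample_grad_g, sample_grad_f, x0, y0,
             alpha=lambda k: 1.0 / (k + 1) ** 0.75,   # slow stepsize (assumed schedule)
             beta=lambda k: 1.0 / (k + 1) ** 0.5,     # fast stepsize (assumed schedule)
             num_iters=20000):
        """Two-time-scale SCGD sketch for min_x E_v[ f_v( E_w[ g_w(x) ] ) ].

        sample_g(x)      : noisy evaluation g_w(x) for a fresh sample w
        sample_grad_g(x) : noisy Jacobian of g_w at x (dim_y x dim_x)
        sample_grad_f(y) : noisy gradient of f_v at y (dim_y)
        """
        x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
        for k in range(num_iters):
            # Fast time scale: auxiliary variable tracking the inner expectation E_w[g_w(x)].
            y = (1.0 - beta(k)) * y + beta(k) * sample_g(x)
            # Slow time scale: chain-rule (quasi-)gradient step using the tracked estimate y.
            x = x - alpha(k) * sample_grad_g(x).T @ sample_grad_f(y)
        return x

    # Toy illustration (hypothetical problem): minimize E_v[(E_w[x + w] - 1)^2 / 2]
    # with E[w] = 0, whose minimizer is x = 1.
    rng = np.random.default_rng(0)
    x_hat = scgd(sample_g=lambda x: x + rng.normal(size=1),
                 sample_grad_g=lambda x: np.eye(1),
                 sample_grad_f=lambda y: y - 1.0,
                 x0=np.zeros(1), y0=np.zeros(1))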



        Published In

        Mathematical Programming: Series A and B, Volume 161, Issue 1-2 (Jan 2017), 615 pages

        Publisher

        Springer-Verlag

        Berlin, Heidelberg

        Author Tags

        1. 68W27
        2. 90C06
        3. 90C15
        4. 90C25
        5. Convex optimization
        6. Sample complexity
        7. Simulation
        8. Statistical learning
        9. Stochastic gradient
        10. Stochastic optimization

        Cited By

        • (2024) A federated stochastic multi-level compositional minimax algorithm for deep AUC maximization. Proceedings of the 41st International Conference on Machine Learning, pp. 59601-59640. 10.5555/3692070.3694533. Online publication date: 21-Jul-2024.
        • (2024) Stability and generalization of stochastic compositional gradient descent algorithms. Proceedings of the 41st International Conference on Machine Learning, pp. 56542-56593. 10.5555/3692070.3694405. Online publication date: 21-Jul-2024.
        • (2024) Sample average approximation for conditional stochastic optimization with dependent data. Proceedings of the 41st International Conference on Machine Learning, pp. 51237-51254. 10.5555/3692070.3694171. Online publication date: 21-Jul-2024.
        • (2024) Stability and generalization for stochastic recursive momentum-based algorithms for (strongly-)convex one to k-level stochastic optimizations. Proceedings of the 41st International Conference on Machine Learning, pp. 39201-39275. 10.5555/3692070.3693660. Online publication date: 21-Jul-2024.
        • (2024) Projection-free variance reduction methods for stochastic constrained multi-level compositional optimization. Proceedings of the 41st International Conference on Machine Learning, pp. 21962-21987. 10.5555/3692070.3692952. Online publication date: 21-Jul-2024.
        • (2024) A doubly recursive stochastic compositional gradient descent method for federated multi-level compositional optimization. Proceedings of the 41st International Conference on Machine Learning, pp. 14540-14610. 10.5555/3692070.3692652. Online publication date: 21-Jul-2024.
        • (2024) Faster stochastic variance reduction methods for compositional MiniMax optimization. Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, pp. 13927-13935. 10.1609/aaai.v38i12.29300. Online publication date: 20-Feb-2024.
        • (2024) A Communication-Efficient Algorithm for Federated Multilevel Stochastic Compositional Optimization. IEEE Transactions on Signal Processing 72, 2333-2347. 10.1109/TSP.2024.3392351. Online publication date: 23-Apr-2024.
        • (2024) The continuous stochastic gradient method: part I–convergence theory. Computational Optimization and Applications 87(3), 935-976. 10.1007/s10589-023-00542-8. Online publication date: 1-Apr-2024.
        • (2024) The continuous stochastic gradient method: part II–application and numerics. Computational Optimization and Applications 87(3), 977-1008. 10.1007/s10589-023-00540-w. Online publication date: 1-Apr-2024.
