DOI: 10.5555/3327144.3327264
Article
Free access

Zeroth-order (Non)-convex stochastic optimization via conditional gradient and gradient updates

Published: 03 December 2018

Abstract

In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization. Specifically, we propose generalizations of the conditional gradient algorithm achieving rates similar to the standard stochastic gradient algorithm using only zeroth-order information. Furthermore, under a structural sparsity assumption, we first illustrate an implicit regularization phenomenon where the standard stochastic gradient algorithm with zeroth-order information adapts to the sparsity of the problem at hand by just varying the stepsize. Next, we propose a truncated stochastic gradient algorithm with zeroth-order information, whose rate depends only poly-logarithmically on the dimensionality.
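The abstract bundles three algorithmic ideas: estimating gradients from function evaluations alone, feeding those estimates into conditional gradient (Frank-Wolfe) updates, and a truncated stochastic gradient variant that exploits sparsity. Below is a minimal sketch of those ideas, not the paper's exact algorithms: it assumes a standard two-point Gaussian-smoothing gradient estimator, an l1-ball constraint with its closed-form linear minimization oracle, the classical 2/(t+2) Frank-Wolfe schedule, and hard truncation to a fixed support size. The smoothing parameter, number of random directions, step sizes, and the toy least-squares objective are illustrative choices, not the paper's prescribed settings.

```python
import numpy as np


def zo_gradient(f, x, mu=1e-4, num_dirs=10, rng=None):
    """Two-point Gaussian-smoothing gradient estimate of f at x.

    Averages (f(x + mu*u) - f(x)) / mu * u over random Gaussian
    directions u, so only function evaluations are required."""
    if rng is None:
        rng = np.random.default_rng(0)
    fx = f(x)
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - fx) / mu * u
    return g / num_dirs


def lmo_l1_ball(g, radius=1.0):
    """Linear minimization oracle over the l1-ball:
    argmin over ||v||_1 <= radius of <g, v> is a signed vertex."""
    v = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    v[i] = -radius * np.sign(g[i])
    return v


def zo_conditional_gradient(f, x0, radius=1.0, iters=200):
    """Frank-Wolfe (conditional gradient) iterations driven by the
    zeroth-order gradient estimate above."""
    rng = np.random.default_rng(1)
    x = x0.copy()
    for t in range(iters):
        g = zo_gradient(f, x, rng=rng)
        v = lmo_l1_ball(g, radius)
        gamma = 2.0 / (t + 2)          # classical Frank-Wolfe schedule
        x = (1 - gamma) * x + gamma * v
    return x


def zo_truncated_sgd(f, x0, sparsity=2, lr=0.01, iters=300):
    """Zeroth-order gradient step followed by hard truncation: keep only
    the `sparsity` largest-magnitude coordinates after each update."""
    rng = np.random.default_rng(2)
    x = x0.copy()
    for _ in range(iters):
        x = x - lr * zo_gradient(f, x, rng=rng)
        keep = np.argsort(np.abs(x))[-sparsity:]   # coordinates to retain
        mask = np.zeros_like(x)
        mask[keep] = 1.0
        x *= mask
    return x


if __name__ == "__main__":
    # Toy sparse least-squares objective; only f(x) values are queried.
    A = np.random.default_rng(3).standard_normal((40, 5))
    x_star = np.array([0.5, 0.0, -0.3, 0.0, 0.0])
    b = A @ x_star
    f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
    print("FW objective:       ", f(zo_conditional_gradient(f, np.zeros(5))))
    print("truncated objective:", f(zo_truncated_sgd(f, np.zeros(5))))
```

In this sketch the truncation step keeps only a fixed number of coordinates per iteration, which is, at a high level, the mechanism behind the sparsity-adaptive behavior the abstract describes; the paper's actual rates and parameter choices are derived in the full text.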

Cited By

  • (2023) FreePart: Hardening Data Processing Software via Framework-based Partitioning and Isolation. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, pages 169-188. DOI: 10.1145/3623278.3624760. Online publication date: 25-Mar-2023.
  • (2021) Convergence rate of the (1+1)-evolution strategy with success-based step-size adaptation on convex quadratic functions. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 1169-1177. DOI: 10.1145/3449639.3459289. Online publication date: 26-Jun-2021.

Published In

NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems
December 2018, 11021 pages

Publisher

Curran Associates Inc., Red Hook, NY, United States
