DOI: 10.5555/3327144.3327264
Article
Free access

Zeroth-order (Non)-convex stochastic optimization via conditional gradient and gradient updates

Published: 03 December 2018

Abstract

In this paper, we propose and analyze zeroth-order stochastic approximation algorithms for nonconvex and convex optimization. Specifically, we propose generalizations of the conditional gradient algorithm achieving rates similar to the standard stochastic gradient algorithm using only zeroth-order information. Furthermore, under a structural sparsity assumption, we first illustrate an implicit regularization phenomenon where the standard stochastic gradient algorithm with zeroth-order information adapts to the sparsity of the problem at hand by just varying the stepsize. Next, we propose a truncated stochastic gradient algorithm with zeroth-order information, whose rate depends only poly-logarithmically on the dimensionality.
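The abstract bundles three algorithmic ideas: estimating gradients from function evaluations alone, feeding those estimates into conditional gradient (Frank-Wolfe) updates, and a truncated stochastic gradient variant that exploits sparsity. Below is a minimal sketch of those ideas, not the paper's exact algorithms: it assumes a standard two-point Gaussian-smoothing gradient estimator, an l1-ball constraint with its closed-form linear minimization oracle, the classical 2/(t+2) Frank-Wolfe schedule, and hard truncation to a fixed support size. The smoothing parameter, number of random directions, step sizes, and the toy least-squares objective are illustrative choices, not the paper's prescribed settings.

```python
import numpy as np


def zo_gradient(f, x, mu=1e-4, num_dirs=10, rng=None):
    """Two-point Gaussian-smoothing gradient estimate of f at x.

    Averages (f(x + mu*u) - f(x)) / mu * u over random Gaussian
    directions u, so only function evaluations are required."""
    if rng is None:
        rng = np.random.default_rng(0)
    fx = f(x)
    g = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape)
        g += (f(x + mu * u) - fx) / mu * u
    return g / num_dirs


def lmo_l1_ball(g, radius=1.0):
    """Linear minimization oracle over the l1-ball:
    argmin over ||v||_1 <= radius of <g, v> is a signed vertex."""
    v = np.zeros_like(g)
    i = np.argmax(np.abs(g))
    v[i] = -radius * np.sign(g[i])
    return v


def zo_conditional_gradient(f, x0, radius=1.0, iters=200):
    """Frank-Wolfe (conditional gradient) iterations driven by the
    zeroth-order gradient estimate above."""
    rng = np.random.default_rng(1)
    x = x0.copy()
    for t in range(iters):
        g = zo_gradient(f, x, rng=rng)
        v = lmo_l1_ball(g, radius)
        gamma = 2.0 / (t + 2)          # classical Frank-Wolfe schedule
        x = (1 - gamma) * x + gamma * v
    return x


def zo_truncated_sgd(f, x0, sparsity=2, lr=0.01, iters=300):
    """Zeroth-order gradient step followed by hard truncation: keep only
    the `sparsity` largest-magnitude coordinates after each update."""
    rng = np.random.default_rng(2)
    x = x0.copy()
    for _ in range(iters):
        x = x - lr * zo_gradient(f, x, rng=rng)
        keep = np.argsort(np.abs(x))[-sparsity:]   # coordinates to retain
        mask = np.zeros_like(x)
        mask[keep] = 1.0
        x *= mask
    return x


if __name__ == "__main__":
    # Toy sparse least-squares objective; only f(x) values are queried.
    A = np.random.default_rng(3).standard_normal((40, 5))
    x_star = np.array([0.5, 0.0, -0.3, 0.0, 0.0])
    b = A @ x_star
    f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
    print("FW objective:       ", f(zo_conditional_gradient(f, np.zeros(5))))
    print("truncated objective:", f(zo_truncated_sgd(f, np.zeros(5))))
```

In this sketch the truncation step keeps only a fixed number of coordinates per iteration, which is, at a high level, the mechanism behind the sparsity-adaptive behavior the abstract describes; the paper's actual rates and parameter choices are derived in the full text.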

Cited By

  • (2023) FreePart: Hardening Data Processing Software via Framework-based Partitioning and Isolation. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, pages 169-188. DOI: 10.1145/3623278.3624760. Online publication date: 25-Mar-2023.
  • (2021) Convergence rate of the (1+1)-evolution strategy with success-based step-size adaptation on convex quadratic functions. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 1169-1177. DOI: 10.1145/3449639.3459289. Online publication date: 26-Jun-2021.

Published In

NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems
December 2018, 11021 pages

Publisher

Curran Associates Inc., Red Hook, NY, United States
