research-article

Stochastic Convex Optimization with Bandit Feedback

Published: 01 January 2013

Abstract

This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\mathcal{X} \subset \mathbb{R}^d$ under a stochastic bandit (i.e., noisy zeroth-order) feedback model. In this model, the algorithm may query any point $x \in \mathcal{X}$ and observes a noisy realization of the function value $f(x)$. The quantity of interest is the regret of the algorithm after $T$ queries $x_1, \dots, x_T$. Regret is the sum of the function values at the algorithm's query points minus $T$ times the optimal function value, that is, $\sum_{t=1}^{T} f(x_t) - T \min_{x \in \mathcal{X}} f(x)$. We present a generalization of the ellipsoid algorithm that incurs $\widetilde{\mathcal{O}}({\rm poly}(d)\sqrt{T})$ regret. Since any algorithm incurs regret at least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$.
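The following minimal Python sketch makes the feedback model and the regret definition concrete. It rests on several assumptions. The domain is one-dimensional ($d = 1$, $\mathcal{X} = [-1, 1]$), and the observation noise is Gaussian; the model itself only requires noisy realizations of $f(x)$. The learner is a naive explore-then-commit rule over a fixed grid. All names (`NoisyZerothOrderOracle`, `regret`, `explore_then_commit`) are hypothetical. The learner is a toy baseline, not the ellipsoid-based method described above, and no claim is made about its regret rate. Regret is charged on the true values $f(x_t)$, not on the noisy observations the learner sees.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


class NoisyZerothOrderOracle:
    """Stochastic bandit feedback: a query at x returns f(x) plus zero-mean noise.

    Assumption (illustrative only): the noise is Gaussian with standard
    deviation `sigma`.
    """

    def __init__(self, f: Callable[[Point], float], sigma: float, seed: int = 0):
        self.f = f
        self.sigma = sigma
        self.rng = random.Random(seed)
        self.queries: List[Point] = []  # every point queried so far, in order

    def query(self, x: Point) -> float:
        self.queries.append(x)
        return self.f(x) + self.rng.gauss(0.0, self.sigma)


def regret(f: Callable[[Point], float], queries: Sequence[Point], f_star: float) -> float:
    """sum_t f(x_t) - T * f_star, evaluated on the true (noise-free) values of f."""
    return sum(f(x) for x in queries) - len(queries) * f_star


def explore_then_commit(oracle, grid, pulls_per_point, horizon):
    """Hypothetical toy baseline, NOT the paper's ellipsoid-based algorithm.

    Queries each grid point `pulls_per_point` times, then spends the remaining
    queries (up to `horizon` in total) on the point with the smallest empirical
    mean. Assumes `oracle` has not been queried before.
    """
    if len(grid) * pulls_per_point > horizon:
        raise ValueError("exploration phase exceeds the query budget")
    means = []
    for x in grid:
        observations = [oracle.query(x) for _ in range(pulls_per_point)]
        means.append(sum(observations) / pulls_per_point)
    best = grid[min(range(len(grid)), key=means.__getitem__)]
    while len(oracle.queries) < horizon:
        oracle.query(best)
    return best


if __name__ == "__main__":
    # f(x) = |x - 0.3| is convex and 1-Lipschitz on X = [-1, 1], with minimum 0.
    f = lambda x: abs(x[0] - 0.3)
    oracle = NoisyZerothOrderOracle(f, sigma=0.1, seed=1)
    grid = [(-1.0 + 0.1 * i,) for i in range(21)]  # 21 evenly spaced points
    best = explore_then_commit(oracle, grid, pulls_per_point=50, horizon=5000)
    print("committed to x =", round(best[0], 2))
    print("regret after", len(oracle.queries), "queries:", regret(f, oracle.queries, 0.0))
```

As a hand check of the bookkeeping, take $f(x) = |x - 0.3|$ with optimal value 0. The three queries 0.3, 0.3, 1.0 incur regret $0 + 0 + 0.7 = 0.7$, even if the noisy observations at those points happened to look better or worse.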


Published In

SIAM Journal on Optimization, Volume 23, Issue 1
2013
686 pages
ISSN:1052-6234
DOI:10.1137/sjope8.23.1

Publisher

Society for Industrial and Applied Mathematics

United States

Author Tags

  1. derivative-free optimization
  2. bandit optimization
  3. ellipsoid method

AMS Subject Classifications

  1. 90C56
  2. 90C25
  3. 68T05

Cited By

  • (2022) Gradient-free methods for deterministic and stochastic nonsmooth nonconvex optimization. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 26160-26175. DOI: 10.5555/3600270.3602167. Online publication date: 28-Nov-2022.
  • (2022) Joint Learning and Control in Stochastic Queueing Networks with Unknown Utilities. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 6:3, pp. 1-32. DOI: 10.1145/3570619. Online publication date: 8-Dec-2022.
  • (2022) Elastic Job Scheduling with Unknown Utility Functions. ACM SIGMETRICS Performance Evaluation Review, 49:3, pp. 67-68. DOI: 10.1145/3529113.3529137. Online publication date: 25-Mar-2022.
  • (2021) Approximate optimization of convex functions with outlier noise. Proceedings of the 35th International Conference on Neural Information Processing Systems, pp. 8147-8157. DOI: 10.5555/3540261.3540884. Online publication date: 6-Dec-2021.
  • (2021) Optimal Policy for Dynamic Assortment Planning Under Multinomial Logit Models. Mathematics of Operations Research, 46:4, pp. 1639-1657. DOI: 10.1287/moor.2021.1133. Online publication date: 1-Nov-2021.
  • (2021) Multimodal Dynamic Pricing. Management Science, 67:10, pp. 6136-6152. DOI: 10.1287/mnsc.2020.3819. Online publication date: 1-Oct-2021.
  • (2021) Budget Allocation as a Multi-Agent System of Contextual & Continuous Bandits. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 2937-2945. DOI: 10.1145/3447548.3467124. Online publication date: 14-Aug-2021.
  • (2020) Multidimensional Dynamic Pricing for Welfare Maximization. ACM Transactions on Economics and Computation, 8:1, pp. 1-35. DOI: 10.1145/3381527. Online publication date: 17-Apr-2020.
  • (2020) Unimodal Bandits with Continuous Arms. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 4:1, pp. 1-28. DOI: 10.1145/3379480. Online publication date: 5-Jun-2020.
  • (2019) Technical Note—Nonstationary Stochastic Optimization Under L-Variation Measures. Operations Research, 67:6, pp. 1752-1765. DOI: 10.1287/opre.2019.1843. Online publication date: 1-Nov-2019.
