research-article

Stochastic Convex Optimization with Bandit Feedback

Authors:

Dean P. Foster,

Sham M. Kakade,

Alexander Rakhlin

SIAM Journal on Optimization, Volume 23, Issue 1

Pages 213 - 240

https://doi.org/10.1137/110850827

Published: 01 January 2013

Abstract

This paper addresses the problem of minimizing a convex, Lipschitz function $f$ over a convex, compact set $\mathcal{X}$ under a stochastic bandit (i.e., noisy zeroth-order) feedback model. In this model, the algorithm is allowed to observe noisy realizations of the function value $f(x)$ at any query point $x \in \mathcal{X}$. The quantity of interest is the regret of the algorithm, which is the sum of the function values at the algorithm's query points minus the optimal function value. We demonstrate a generalization of the ellipsoid algorithm that incurs $\widetilde{\mathcal{O}}({\rm poly}(d)\sqrt{T})$ regret. Since any algorithm has regret at least $\Omega(\sqrt{T})$ on this problem, our algorithm is optimal in terms of the scaling with $T$.
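To make the feedback model and the regret concrete, the following is a minimal Python sketch, not the paper's ellipsoid-based algorithm. It assumes the regret is read as the sum over rounds of $f(x_t) - \min_{x \in \mathcal{X}} f(x)$, and it uses a hypothetical one-dimensional objective $f(x) = |x - 0.3|$ on $\mathcal{X} = [0, 1]$, Gaussian observation noise, and a naive uniform-sampling strategy. All of these choices, and the helper names `make_noisy_oracle` and `cumulative_regret`, are illustrative assumptions rather than anything taken from the paper.

```python
import random


def make_noisy_oracle(f, sigma, rng):
    """Return a zeroth-order oracle: each call yields f(x) plus Gaussian noise."""
    def oracle(x):
        return f(x) + rng.gauss(0.0, sigma)
    return oracle


def cumulative_regret(f, queries, f_star):
    """Sum over query points of f(x_t) - f_star, using noise-free values."""
    return sum(f(x) - f_star for x in queries)


# Hypothetical objective (an assumption for illustration): convex and
# 1-Lipschitz on X = [0, 1], minimized at x = 0.3 with optimal value 0.
def f(x):
    return abs(x - 0.3)


f_star = 0.0
rng = random.Random(0)
oracle = make_noisy_oracle(f, sigma=0.1, rng=rng)

T = 1000
queries = []
observations = []
for _ in range(T):
    x = rng.uniform(0.0, 1.0)       # naive baseline: ignore all past feedback
    queries.append(x)
    observations.append(oracle(x))  # the learner only ever sees this noisy value

regret_uniform = cumulative_regret(f, queries, f_star)
regret_optimal = cumulative_regret(f, [0.3] * T, f_star)
print(f"uniform sampling regret over T={T}: {regret_uniform:.1f}")
print(f"regret of always querying the minimizer: {regret_optimal:.1f}")
```

Under these assumptions the uniform baseline pays a constant expected gap per round, so its regret grows linearly in $T$. A bound of order $\sqrt{T}$, as stated in the abstract, requires the query points to concentrate near the minimizer as feedback accumulates.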

Index Terms

Stochastic Convex Optimization with Bandit Feedback
1. Mathematics of computing
  1. Mathematical analysis
    1. Mathematical optimization
      1. Continuous optimization
        Convex optimization
2. Theory of computation
  1. Design and analysis of algorithms
    1. Mathematical optimization
      1. Continuous optimization
        Convex optimization

Index terms have been assigned to the content through auto-classification.

Published In

SIAM Journal on Optimization Volume 23, Issue 1

2013

686 pages

ISSN:1052-6234

DOI:10.1137/sjope8.23.1

© 2013, Society for Industrial and Applied Mathematics.

Publisher

Society for Industrial and Applied Mathematics

United States

Qualifiers

Research-article

Citations

Cited By

Lin T, Zheng Z, Jordan M, Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A (2022). Gradient-free methods for deterministic and stochastic nonsmooth nonconvex optimization. Proceedings of the 36th International Conference on Neural Information Processing Systems, pp. 26160-26175. DOI: 10.5555/3600270.3602167. Online publication date: 28-Nov-2022.
https://dl.acm.org/doi/10.5555/3600270.3602167
Fu X, Modiano E (2022). Joint Learning and Control in Stochastic Queueing Networks with Unknown Utilities. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 6:3, pp. 1-32. DOI: 10.1145/3570619. Online publication date: 8-Dec-2022.
https://dl.acm.org/doi/10.1145/3570619
Fu X, Modiano E (2022). Elastic Job Scheduling with Unknown Utility Functions. ACM SIGMETRICS Performance Evaluation Review, 49:3, pp. 67-68. DOI: 10.1145/3529113.3529137. Online publication date: 25-Mar-2022.
https://dl.acm.org/doi/10.1145/3529113.3529137
De A, Khanna S, Li H, Nikpey H, Ranzato M, Beygelzimer A, Dauphin Y, Liang P, Vaughan J (2021). Approximate optimization of convex functions with outlier noise. Proceedings of the 35th International Conference on Neural Information Processing Systems, pp. 8147-8157. DOI: 10.5555/3540261.3540884. Online publication date: 6-Dec-2021.
https://dl.acm.org/doi/10.5555/3540261.3540884
Chen X, Wang Y, Zhou Y (2021). Optimal Policy for Dynamic Assortment Planning Under Multinomial Logit Models. Mathematics of Operations Research, 46:4, pp. 1639-1657. DOI: 10.1287/moor.2021.1133. Online publication date: 1-Nov-2021.
https://dl.acm.org/doi/10.1287/moor.2021.1133
Wang Y, Chen B, Simchi-Levi D (2021). Multimodal Dynamic Pricing. Management Science, 67:10, pp. 6136-6152. DOI: 10.1287/mnsc.2020.3819. Online publication date: 1-Oct-2021.
https://dl.acm.org/doi/10.1287/mnsc.2020.3819
Han B, Arndt C, Zhu F, Chin Ooi B, Miao C, Wang H, Skrypnyk I, Hsu W, Chawla S (2021). Budget Allocation as a Multi-Agent System of Contextual & Continuous Bandits. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 2937-2945. DOI: 10.1145/3447548.3467124. Online publication date: 14-Aug-2021.
https://dl.acm.org/doi/10.1145/3447548.3467124
Roth A, Slivkins A, Ullman J, Wu Z (2020). Multidimensional Dynamic Pricing for Welfare Maximization. ACM Transactions on Economics and Computation, 8:1, pp. 1-35. DOI: 10.1145/3381527. Online publication date: 17-Apr-2020.
https://dl.acm.org/doi/10.1145/3381527
Combes R, Proutière A, Fauquette A (2020). Unimodal Bandits with Continuous Arms. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 4:1, pp. 1-28. DOI: 10.1145/3379480. Online publication date: 5-Jun-2020.
https://dl.acm.org/doi/10.1145/3379480
Chen X, Wang Y, Wang Y (2019). Technical Note—Nonstationary Stochastic Optimization Under L-Variation Measures. Operations Research, 67:6, pp. 1752-1765. DOI: 10.1287/opre.2019.1843. Online publication date: 1-Nov-2019.
https://dl.acm.org/doi/10.1287/opre.2019.1843
