
Unimodal Bandits with Continuous Arms: Order-optimal Regret without Smoothness

Published: 27 May 2020

Abstract

We consider stochastic bandit problems with a continuous set of arms, where the expected reward is a continuous and unimodal function of the arm. For these problems, we propose the Stochastic Polychotomy (SP) algorithm and derive finite-time upper bounds on its regret and optimization error. We show that, for a class of reward functions, SP achieves regret and optimization error with optimal scalings, i.e., $O(\sqrt{T})$ and $O(1/\sqrt{T})$ (up to a logarithmic factor), respectively. SP constitutes the first order-optimal algorithm for non-smooth expected reward functions, as well as for smooth functions with unknown smoothness. The algorithm is based on sequential statistical tests used to successively trim an interval that contains the best arm with high probability. These tests exhibit a minimal sample complexity, which gives SP its adaptivity and optimality. Numerical experiments reveal that the algorithm even outperforms state-of-the-art algorithms that exploit knowledge of the smoothness of the reward function. The performance of SP is further illustrated on the problem of setting optimal reserve prices in repeated second-price auctions, where the algorithm is evaluated on real-world data.
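
To make the interval-trimming idea concrete, here is a minimal illustrative sketch, under stated assumptions rather than the paper's actual method: it probes two interior points of the current interval, golden-section style (cf. the "golden section search" author tag), and uses Hoeffding confidence intervals as the sequential statistical test that decides when one side of the interval can safely be discarded. The names (`trim_interval_bandit`, `pull`), the confidence radius, and the stopping threshold are assumptions for illustration; the paper's actual tests achieve a smaller sample complexity, which is what yields SP's adaptivity and order-optimal regret.

```python
import math
import random

def trim_interval_bandit(pull, horizon, delta=0.01):
    """Illustrative interval-trimming for a unimodal mean-reward function
    on [0, 1]. This is NOT the paper's SP algorithm: it is a simplified
    golden-section-style sketch using Hoeffding confidence intervals as
    the sequential test that decides when a side of the interval may be
    trimmed. `pull(x)` returns a noisy reward in [0, 1] for arm x."""
    phi = (math.sqrt(5) - 1) / 2          # golden ratio ~ 0.618
    lo, hi = 0.0, 1.0
    t = 0
    while t < horizon:
        a = hi - phi * (hi - lo)          # left interior probe
        b = lo + phi * (hi - lo)          # right interior probe
        sum_a = sum_b = 0.0
        n = 0
        # Sequential statistical test: sample both probes until their
        # confidence intervals separate (or the budget runs out).
        while t < horizon:
            sum_a += pull(a)
            sum_b += pull(b)
            n += 1
            t += 2
            radius = math.sqrt(math.log(max(2.0, t / delta)) / (2 * n))
            if abs(sum_a - sum_b) / n > 2 * radius:
                break
        # Unimodality: the worse probe's outer side cannot contain the peak.
        if sum_a >= sum_b:
            hi = b                        # discard (b, hi]
        else:
            lo = a                        # discard [lo, a)
    return (lo + hi) / 2                  # point estimate of the best arm

# Toy run: unimodal mean reward peaked at x = 0.3, Bernoulli observations.
mean = lambda x: 1.0 - abs(x - 0.3)
arm = trim_interval_bandit(lambda x: float(random.random() < mean(x)),
                           horizon=20000)
print(f"estimated best arm: {arm:.3f}")  # should land near 0.3
```

The trimming step relies only on unimodality, not on smoothness: with probes a < b, if the mean at a is significantly below the mean at b, the peak cannot lie to the left of a, so [lo, a) is discarded, and symmetrically on the other side.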

    Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 4, Issue 1
    SIGMETRICS
    March 2020
    467 pages
EISSN: 2476-1249
DOI: 10.1145/3402934
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 May 2020
    Online AM: 07 May 2020
    Published in POMACS Volume 4, Issue 1

    Author Tags

    1. continuous bandits
    2. golden section search
    3. multi-armed bandits
    4. stochastic optimization
    5. unimodal bandits

    Qualifiers

    • Research-article

    Article Metrics

    • Downloads (last 12 months): 36
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 09 Nov 2024

    Cited By

    • (2024) Adaptive Seamless Dose-Finding Trials. Manufacturing & Service Operations Management 26:5, 1656-1673. DOI: 10.1287/msom.2023.0246. Online publication date: 1-Sep-2024.
    • (2024) Efficient Queue Control Policies for Latency-Critical Traffic in Mobile Networks. IEEE Transactions on Network and Service Management 21:5, 5076-5090. DOI: 10.1109/TNSM.2024.3458390. Online publication date: 1-Oct-2024.
    • (2023) Blackbox optimization of unimodal functions. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, 476-484. DOI: 10.5555/3625834.3625879. Online publication date: 31-Jul-2023.
    • (2021) Bandit learning with delayed impact of actions. Proceedings of the 35th International Conference on Neural Information Processing Systems, 26804-26817. DOI: 10.5555/3540261.3542314. Online publication date: 6-Dec-2021.
    • (2021) Optimal order simple regret for Gaussian process bandits. Proceedings of the 35th International Conference on Neural Information Processing Systems, 21202-21215. DOI: 10.5555/3540261.3541883. Online publication date: 6-Dec-2021.
    • (2021) Multi-armed bandit requiring monotone arm sequences. Proceedings of the 35th International Conference on Neural Information Processing Systems, 16093-16103. DOI: 10.5555/3540261.3541492. Online publication date: 6-Dec-2021.
    • (2021) Dynamic Early Exit Scheduling for Deep Neural Network Inference through Contextual Bandits. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 823-832. DOI: 10.1145/3459637.3482335. Online publication date: 26-Oct-2021.
    • (2021) Federated Bandit. Proceedings of the ACM on Measurement and Analysis of Computing Systems 5:1, 1-29. DOI: 10.1145/3447380. Online publication date: 22-Feb-2021.
