
Unimodal Bandits with Continuous Arms: Order-optimal Regret without Smoothness

Published: 27 May 2020

Abstract

We consider stochastic bandit problems with a continuous set of arms, where the expected reward is a continuous and unimodal function of the arm. For these problems, we propose the Stochastic Polychotomy (SP) algorithm and derive finite-time upper bounds on its regret and optimization error. We show that, for a class of reward functions, SP achieves regret and optimization error with optimal scalings, i.e., $O(\sqrt{T})$ and $O(1/\sqrt{T})$ (up to a logarithmic factor), respectively. SP constitutes the first order-optimal algorithm for non-smooth expected reward functions, as well as for smooth functions with unknown smoothness. The algorithm is based on sequential statistical tests used to successively trim an interval that contains the best arm with high probability. These tests exhibit a minimal sample complexity, which gives SP its adaptivity and optimality. Numerical experiments reveal that the algorithm even outperforms state-of-the-art algorithms that exploit knowledge of the smoothness of the reward function. The performance of SP is further illustrated on the problem of setting optimal reserve prices in repeated second-price auctions, where the algorithm is evaluated on real-world data.
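
To make the interval-trimming idea concrete, here is a minimal illustrative sketch, under stated assumptions rather than the paper's actual method: it probes two interior points of the current interval, golden-section style (cf. the "golden section search" author tag), and uses Hoeffding confidence intervals as the sequential statistical test that decides when one side of the interval can safely be discarded. The names (`trim_interval_bandit`, `pull`), the confidence radius, and the stopping threshold are assumptions for illustration; the paper's actual tests achieve a smaller sample complexity, which is what yields SP's adaptivity and order-optimal regret.

```python
import math
import random

def trim_interval_bandit(pull, horizon, delta=0.01):
    """Illustrative interval-trimming for a unimodal mean-reward function
    on [0, 1]. This is NOT the paper's SP algorithm: it is a simplified
    golden-section-style sketch using Hoeffding confidence intervals as
    the sequential test that decides when a side of the interval may be
    trimmed. `pull(x)` returns a noisy reward in [0, 1] for arm x."""
    phi = (math.sqrt(5) - 1) / 2          # golden ratio ~ 0.618
    lo, hi = 0.0, 1.0
    t = 0
    while t < horizon:
        a = hi - phi * (hi - lo)          # left interior probe
        b = lo + phi * (hi - lo)          # right interior probe
        sum_a = sum_b = 0.0
        n = 0
        # Sequential statistical test: sample both probes until their
        # confidence intervals separate (or the budget runs out).
        while t < horizon:
            sum_a += pull(a)
            sum_b += pull(b)
            n += 1
            t += 2
            radius = math.sqrt(math.log(max(2.0, t / delta)) / (2 * n))
            if abs(sum_a - sum_b) / n > 2 * radius:
                break
        # Unimodality: the worse probe's outer side cannot contain the peak.
        if sum_a >= sum_b:
            hi = b                        # discard (b, hi]
        else:
            lo = a                        # discard [lo, a)
    return (lo + hi) / 2                  # point estimate of the best arm

# Toy run: unimodal mean reward peaked at x = 0.3, Bernoulli observations.
mean = lambda x: 1.0 - abs(x - 0.3)
arm = trim_interval_bandit(lambda x: float(random.random() < mean(x)),
                           horizon=20000)
print(f"estimated best arm: {arm:.3f}")  # should land near 0.3
```

The trimming step relies only on unimodality, not on smoothness: with probes a < b, if the mean at a is significantly below the mean at b, the peak cannot lie to the left of a, so [lo, a) is discarded, and symmetrically on the other side.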

    Published In

Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 4, Issue 1
    SIGMETRICS
    March 2020
    467 pages
EISSN: 2476-1249
DOI: 10.1145/3402934
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 May 2020
    Online AM: 07 May 2020
    Published in POMACS Volume 4, Issue 1

    Author Tags

    1. continuous bandits
    2. golden section search
    3. multi-armed bandits
    4. stochastic optimization
    5. unimodal bandits

    Qualifiers

    • Research-article

    Article Metrics

    • Downloads (last 12 months): 36
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 09 Nov 2024

    Cited By

    • (2024) Adaptive Seamless Dose-Finding Trials. Manufacturing & Service Operations Management 26:5, 1656-1673. DOI: 10.1287/msom.2023.0246. Online publication date: 1-Sep-2024.
    • (2024) Efficient Queue Control Policies for Latency-Critical Traffic in Mobile Networks. IEEE Transactions on Network and Service Management 21:5, 5076-5090. DOI: 10.1109/TNSM.2024.3458390. Online publication date: 1-Oct-2024.
    • (2023) Blackbox optimization of unimodal functions. Proceedings of the Thirty-Ninth Conference on Uncertainty in Artificial Intelligence, 476-484. DOI: 10.5555/3625834.3625879. Online publication date: 31-Jul-2023.
    • (2021) Bandit learning with delayed impact of actions. Proceedings of the 35th International Conference on Neural Information Processing Systems, 26804-26817. DOI: 10.5555/3540261.3542314. Online publication date: 6-Dec-2021.
    • (2021) Optimal order simple regret for Gaussian process bandits. Proceedings of the 35th International Conference on Neural Information Processing Systems, 21202-21215. DOI: 10.5555/3540261.3541883. Online publication date: 6-Dec-2021.
    • (2021) Multi-armed bandit requiring monotone arm sequences. Proceedings of the 35th International Conference on Neural Information Processing Systems, 16093-16103. DOI: 10.5555/3540261.3541492. Online publication date: 6-Dec-2021.
    • (2021) Dynamic Early Exit Scheduling for Deep Neural Network Inference through Contextual Bandits. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 823-832. DOI: 10.1145/3459637.3482335. Online publication date: 26-Oct-2021.
    • (2021) Federated Bandit. Proceedings of the ACM on Measurement and Analysis of Computing Systems 5:1, 1-29. DOI: 10.1145/3447380. Online publication date: 22-Feb-2021.
