Bandits with Knapsacks and predictions

Published: 15 July 2024
DOI: 10.5555/3702676.3702731

Abstract

We study the Bandits with Knapsacks problem with the aim of designing a learning-augmented online learning algorithm that achieves better regret guarantees than the state-of-the-art primal-dual algorithms with worst-case guarantees, under both stochastic and adversarial inputs. In the adversarial case, we obtain better competitive ratios when the input predictions are accurate, while maintaining worst-case guarantees when they are imprecise. We introduce two algorithms, tailored to the full-feedback and bandit-feedback settings, respectively. Both integrate a static prediction with a worst-case no-α-regret algorithm. This yields a competitive ratio of (π + (1 − π)/α)⁻¹ when the prediction is perfect, and a competitive ratio of α/(1 − π) when the prediction is highly imprecise, where π ∈ (0, 1) is chosen by the learner and α is Slater's parameter. We complement this analysis by studying the stochastic setting under full feedback, providing an algorithm that guarantees a pseudo-regret of Õ(√T) with poor predictions and zero pseudo-regret with perfect predictions.
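The two ratios are consistent with a simple probabilistic mixture: if the learner follows the prediction with probability π and otherwise delegates to a no-α-regret algorithm, a perfect prediction contributes roughly π·OPT while the fallback still secures about (1 − π)·OPT/α, for a (π + (1 − π)/α)⁻¹ fraction of OPT overall; with a useless prediction only the fallback term survives, giving α/(1 − π). The sketch below illustrates one such mixture inside a budgeted bandit loop. It is a minimal, hypothetical reading of the abstract, not the paper's actual construction: WorstCaseAlgorithm, env_step, and the stopping rule are all assumed interfaces introduced here for illustration.

```python
import random

NUM_ACTIONS = 5       # illustrative action set size
T = 1000              # time horizon
B = 200.0             # total knapsack budget
PI = 0.5              # learner-chosen mixing weight, pi in (0, 1)
PREDICTED_ACTION = 2  # the static prediction (e.g., an advised arm)


class WorstCaseAlgorithm:
    """Stand-in for any no-alpha-regret primal-dual BwK algorithm."""

    def select_action(self, remaining_budget):
        # A real implementation would run, e.g., dual mirror descent;
        # this placeholder just picks uniformly at random.
        return random.randrange(NUM_ACTIONS)

    def update(self, action, reward, cost):
        # Feed back the observed reward/cost (full or bandit feedback).
        pass


def run(env_step):
    """env_step(action) -> (reward, cost), both assumed in [0, 1]."""
    fallback = WorstCaseAlgorithm()
    budget, total_reward = B, 0.0
    for _ in range(T):
        if budget < 1.0:  # stop while one worst-case round's cost still fits
            break
        # Trust the static prediction with probability PI; otherwise
        # fall back to the worst-case-safe algorithm.
        if random.random() < PI:
            action = PREDICTED_ACTION
        else:
            action = fallback.select_action(budget)
        reward, cost = env_step(action)
        fallback.update(action, reward, cost)
        budget -= cost
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    # Toy environment: random rewards, cheap random costs.
    print(run(lambda a: (random.random(), 0.3 * random.random())))
```

In this reading, π trades robustness for consistency: π close to 1 extracts almost all of a perfect prediction's value but weakens the fallback guarantee, while π close to 0 recovers the plain worst-case ratio α.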

Published In

UAI '24: Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence
July 2024
4270 pages
Editors: Negar Kiyavash, Joris M. Mooij

Sponsors

  • HUAWEI
  • Google
  • D. E. Shaw & Co
  • Barcelona School of Economics
  • Universitat Pompeu Fabra

Publisher

JMLR.org
