Multiobjective Lipschitz Bandits under Lexicographic Ordering

Published: 20 February 2024

Abstract

This paper studies the multiobjective bandit problem under lexicographic ordering, wherein the learner aims to maximize $m$ objectives simultaneously and hierarchically. The only existing algorithm for this problem considers the multi-armed bandit model, and its regret bound is $\widetilde{O}((KT)^{2/3})$ under a metric called priority-based regret. However, this bound is suboptimal, as the lower bound for single-objective multi-armed bandits is $\Omega(K \log T)$. Moreover, this bound becomes vacuous when the number of arms $K$ is infinite. To address these limitations, we investigate the multiobjective Lipschitz bandit model, which allows for an infinite arm set. Utilizing a newly designed multi-stage decision-making strategy, we develop an improved algorithm that achieves a general regret bound of $\widetilde{O}(T^{(d_z^i+1)/(d_z^i+2)})$ for the $i$-th objective, where $d_z^i$ is the zooming dimension of the $i$-th objective and $i \in \{1, 2, \ldots, m\}$. This bound matches the lower bound of the single-objective Lipschitz bandit problem in terms of $T$, indicating that our algorithm is almost optimal. Numerical experiments confirm the effectiveness of our algorithm.
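
Under lexicographic ordering, the first objective is maximized first, and each later objective only discriminates among arms that are already (near-)optimal on all preceding objectives. As a rough illustration of this ordering only, and not of the multi-stage algorithm proposed in the paper, the Python sketch below lexicographically filters a finite set of arms using estimated mean rewards; the function name, the per-objective tolerances, and the toy data are all hypothetical.

import numpy as np

def lexicographic_select(means, tolerances):
    """Pick an arm by lexicographic ordering over m objectives.

    means:      (K, m) array of estimated mean rewards; column i is objective i.
    tolerances: length-m array of slacks; an arm survives stage i if its
                estimate for objective i is within tolerances[i] of the best
                surviving arm on that objective.
    """
    K, m = means.shape
    candidates = np.arange(K)                     # start with every arm
    for i in range(m):
        col = means[candidates, i]
        best = col.max()
        # keep only arms near-optimal on objective i before moving to i + 1
        candidates = candidates[col >= best - tolerances[i]]
    return candidates[0]                          # any survivor is lexicographically near-optimal

# Toy example: 4 arms, 2 objectives. Arms 0 and 1 tie on the first objective,
# so the second objective breaks the tie in favor of arm 1.
means = np.array([[0.9, 0.2],
                  [0.9, 0.7],
                  [0.5, 0.9],
                  [0.8, 0.8]])
print(lexicographic_select(means, tolerances=np.array([0.05, 0.05])))  # -> 1

In a bandit setting the true means are unknown, so an algorithm would replace the tolerances above with data-driven confidence widths; the fixed slacks here are purely for illustration.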



Published In

AAAI'24/IAAI'24/EAAI'24: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence
February 2024
23861 pages
ISBN: 978-1-57735-887-9

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Qualifiers

  • Research-article
  • Research
  • Refereed limited
