Multiobjective Lipschitz Bandits under Lexicographic Ordering

Published: 20 February 2024

Abstract

This paper studies the multiobjective bandit problem under lexicographic ordering, wherein the learner aims to maximize $m$ objectives simultaneously and hierarchically. The only existing algorithm for this problem considers the multi-armed bandit model, and its regret bound is $\widetilde{O}((KT)^{2/3})$ under a metric called priority-based regret. However, this bound is suboptimal, as the lower bound for single-objective multi-armed bandits is $\Omega(K \log T)$. Moreover, this bound becomes vacuous when the number of arms $K$ is infinite. To address these limitations, we investigate the multiobjective Lipschitz bandit model, which allows for an infinite arm set. Utilizing a newly designed multi-stage decision-making strategy, we develop an improved algorithm that achieves a general regret bound of $\widetilde{O}(T^{(d_z^i+1)/(d_z^i+2)})$ for the $i$-th objective, where $d_z^i$ is the zooming dimension of the $i$-th objective and $i \in \{1, 2, \ldots, m\}$. This bound matches the lower bound of the single-objective Lipschitz bandit problem in terms of $T$, indicating that our algorithm is almost optimal. Numerical experiments confirm the effectiveness of our algorithm.
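
Under lexicographic ordering, the first objective is maximized first, and each later objective only discriminates among arms that are already (near-)optimal on all preceding objectives. As a rough illustration of this ordering only, and not of the multi-stage algorithm proposed in the paper, the Python sketch below lexicographically filters a finite set of arms using estimated mean rewards; the function name, the per-objective tolerances, and the toy data are all hypothetical.

import numpy as np

def lexicographic_select(means, tolerances):
    """Pick an arm by lexicographic ordering over m objectives.

    means:      (K, m) array of estimated mean rewards; column i is objective i.
    tolerances: length-m array of slacks; an arm survives stage i if its
                estimate for objective i is within tolerances[i] of the best
                surviving arm on that objective.
    """
    K, m = means.shape
    candidates = np.arange(K)                     # start with every arm
    for i in range(m):
        col = means[candidates, i]
        best = col.max()
        # keep only arms near-optimal on objective i before moving to i + 1
        candidates = candidates[col >= best - tolerances[i]]
    return candidates[0]                          # any survivor is lexicographically near-optimal

# Toy example: 4 arms, 2 objectives. Arms 0 and 1 tie on the first objective,
# so the second objective breaks the tie in favor of arm 1.
means = np.array([[0.9, 0.2],
                  [0.9, 0.7],
                  [0.5, 0.9],
                  [0.8, 0.8]])
print(lexicographic_select(means, tolerances=np.array([0.05, 0.05])))  # -> 1

In a bandit setting the true means are unknown, so an algorithm would replace the tolerances above with data-driven confidence widths; the fixed slacks here are purely for illustration.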



Published In

AAAI'24/IAAI'24/EAAI'24: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence
February 2024
23861 pages
ISBN: 978-1-57735-887-9

Sponsors

  • Association for the Advancement of Artificial Intelligence

Publisher

AAAI Press

Qualifiers

  • Research-article
  • Research
  • Refereed limited
