Article

Thompson Sampling for Dynamic Multi-armed Bandits

Authors:

ICMLA '11: Proceedings of the 2011 10th International Conference on Machine Learning and Applications and Workshops - Volume 01

Pages 484 - 489

https://doi.org/10.1109/ICMLA.2011.144

Published: 18 December 2011

Abstract

The importance of multi-armed bandit (MAB) problems is on the rise due to their recent application in a wide variety of areas such as online advertising, news article selection, wireless networks, and medical trials, to name a few. The most common assumption made when solving such MAB problems is that the unknown reward probability $\theta_k$ of each bandit arm $k$ is fixed. However, this assumption rarely holds in practice, simply because real-life problems often involve underlying processes that evolve dynamically. In this paper, we model problems where the reward probabilities $\theta_k$ drift, and introduce a new method called Dynamic Thompson Sampling (DTS) that facilitates Order Statistics-based Thompson Sampling for these dynamically evolving MABs. The DTS algorithm adapts its success probability estimates $\hat{\theta}_k$ faster than traditional Thompson Sampling schemes and thus leads to improved performance in terms of lower regret. Extensive experiments demonstrate that DTS outperforms current state-of-the-art approaches, namely pure Thompson Sampling, UCB-Normal, and UCB_f, for the case of dynamic reward probabilities. Furthermore, this performance advantage increases persistently with the number of bandit arms.
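The abstract does not spell out the DTS update rule, so the following is only a minimal sketch under stated assumptions: Bernoulli rewards, a Beta$(\alpha_k, \beta_k)$ posterior per arm, and arm selection by drawing one posterior sample per arm and playing the largest (the maximum order statistic of the $K$ draws). To illustrate how estimates $\hat{\theta}_k$ can be made to adapt faster than in plain Thompson Sampling, it adds a hypothetical `cap` parameter $C$: once $\alpha_k + \beta_k$ would exceed $C$, the updated pair is rescaled back to sum to $C$, which geometrically down-weights old rewards. This capped update is an illustrative assumption, not a reproduction of the authors' algorithm; the names `BetaBernoulliTS`, `cap`, `simulate`, and the random-walk `drift` model are all hypothetical.

```python
import random
from typing import List, Optional


class BetaBernoulliTS:
    """Thompson sampling over K Bernoulli arms with Beta(alpha_k, beta_k) posteriors.

    cap=None gives standard Thompson sampling: every reward adds one pseudo-count,
    so alpha_k + beta_k grows without bound. A finite cap C (hypothetical
    parameter, an assumption of this sketch) rescales each updated pair so that
    alpha_k + beta_k never exceeds C, down-weighting old rewards so that the
    estimate theta_hat_k can follow a drifting theta_k.
    """

    def __init__(self, n_arms: int, cap: Optional[float] = None,
                 rng: Optional[random.Random] = None) -> None:
        self.alpha = [1.0] * n_arms  # Beta(1, 1) = uniform prior on theta_k
        self.beta = [1.0] * n_arms
        self.cap = cap
        self.rng = rng if rng is not None else random.Random()

    def select_arm(self) -> int:
        # Draw one sample per arm from its posterior and play the arm whose
        # draw is largest (the maximum order statistic of the K samples).
        samples = [self.rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm: int, reward: int) -> None:
        # Conjugate update: a success increments alpha, a failure increments beta.
        a = self.alpha[arm] + reward
        b = self.beta[arm] + (1 - reward)
        if self.cap is not None and a + b > self.cap:
            # Rescale so that a + b == cap. This step keeps the posterior mean
            # a / (a + b) unchanged and only lowers the posterior's confidence.
            scale = self.cap / (a + b)
            a, b = a * scale, b * scale
        self.alpha[arm], self.beta[arm] = a, b

    def mean(self, arm: int) -> float:
        # Posterior mean, used here as the estimate theta_hat_k.
        return self.alpha[arm] / (self.alpha[arm] + self.beta[arm])


def simulate(policy: BetaBernoulliTS, thetas: List[float], steps: int,
             drift: float = 0.05, seed: int = 0) -> int:
    """Play `steps` rounds while each theta_k follows a clipped random walk."""
    env = random.Random(seed)
    thetas = list(thetas)  # copy so the caller's list is not modified
    total = 0
    for _ in range(steps):
        arm = policy.select_arm()
        reward = 1 if env.random() < thetas[arm] else 0
        policy.update(arm, reward)
        total += reward
        # Hypothetical drift model: Gaussian step per arm, clipped to [0, 1].
        thetas = [min(1.0, max(0.0, t + env.gauss(0.0, drift))) for t in thetas]
    return total
```

With `cap=None` the sketch reduces to plain Thompson Sampling, whose pseudo-counts keep growing, so after many pulls each new reward barely moves $\hat{\theta}_k$. A finite `cap` keeps the effective memory at roughly $C$ observations per arm: smaller values track drift faster at the cost of noisier estimates. For example, `simulate(BetaBernoulliTS(3, cap=50.0), [0.2, 0.5, 0.8], steps=1000)` returns the total reward collected over 1000 pulls while the arm probabilities wander; the value of $C$ here is an assumption, not one taken from the paper.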

Recommendations

Ballooning Multi-Armed Bandits
AAMAS '20: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems

We introduce ballooning multi-armed bandits (BL-MAB), a novel extension to the classical stochastic MAB model. In the BL-MAB model, the set of available arms grows (or balloons) over time. The regret in a BL-MAB setting is computed with respect to the ...
Thompson sampling for budgeted multi-armed bandits
IJCAI'15: Proceedings of the 24th International Conference on Artificial Intelligence

Thompson sampling is one of the earliest randomized algorithms for multi-armed bandits (MAB). In this paper, we extend the Thompson sampling to Budgeted MAB, where there is random cost for pulling an arm and the total cost is constrained by a budget. We ...
Thompson sampling algorithms for cascading bandits

Motivated by the important and urgent need for efficient optimization in online recommender systems, we revisit the cascading bandit model proposed by Kveton et al. (2015a). While Thompson sampling (TS) algorithms have been shown to be empirically ...

Information

Published In

ICMLA '11: Proceedings of the 2011 10th International Conference on Machine Learning and Applications and Workshops - Volume 01

December 2011

507 pages

ISBN:9780769546070

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Bibliometrics & Citations

Article Metrics

Total Citations: 7
Total Downloads: 0
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 12 Feb 2025

Cited By

Jeong T, Koratikere P, Leifsson L, Koziel S, Pietrenko-Dabrowska A (2024). Adaptive Hyperparameter Tuning Within Neural Network-Based Efficient Global Optimization. Computational Science – ICCS 2024, pp. 74-89. DOI: 10.1007/978-3-031-63775-9_6. Online publication date: 2-Jul-2024.
https://dl.acm.org/doi/10.1007/978-3-031-63775-9_6
Min S, Russo D (2023). An information-theoretic analysis of nonstationary bandit learning. Proceedings of the 40th International Conference on Machine Learning (eds. Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J), pp. 24831-24849. DOI: 10.5555/3618408.3619442. Online publication date: 23-Jul-2023.
https://dl.acm.org/doi/10.5555/3618408.3619442
Pavelski L, Kessaci M, Delgado M (2021). Dynamic Learning in Hyper-Heuristics to Solve Flowshop Problems. Intelligent Systems, pp. 155-169. DOI: 10.1007/978-3-030-91702-9_11. Online publication date: 29-Nov-2021.
https://dl.acm.org/doi/10.1007/978-3-030-91702-9_11
Russac Y, Vernade C, Cappé O (2019). Weighted linear bandits for non-stationary environments. Proceedings of the 33rd International Conference on Neural Information Processing Systems (eds. Wallach H, Larochelle H, Beygelzimer A, d'Alché-Buc F, Fox E), pp. 12040-12049. DOI: 10.5555/3454287.3455365. Online publication date: 8-Dec-2019.
https://dl.acm.org/doi/10.5555/3454287.3455365
KhudaBukhsh A, Carbonell J (2018). Expertise Drift in Referral Networks. Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (eds. Andre E, Koenig S, Dastani M, Sukthankar G), pp. 425-433. DOI: 10.5555/3237383.3237449. Online publication date: 9-Jul-2018.
https://dl.acm.org/doi/10.5555/3237383.3237449
Phillips M, Narayanan V, Aine S, Likhachev M (2015). Efficient search with an ensemble of heuristics. Proceedings of the 24th International Conference on Artificial Intelligence, pp. 784-791. DOI: 10.5555/2832249.2832358. Online publication date: 25-Jul-2015.
https://dl.acm.org/doi/10.5555/2832249.2832358
Ikonomovska E, Jafarpour S, Dasdan A (2015). Real-Time Bid Prediction using Thompson Sampling-Based Expert Selection. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds. Cao L, Zhang C, Joachims T, Webb G, Margineantu D, Williams G), pp. 1869-1878. DOI: 10.1145/2783258.2788586. Online publication date: 10-Aug-2015.
https://dl.acm.org/doi/10.1145/2783258.2788586
