On the Prior Sensitivity of Thompson Sampling

Liu, Che-Yu; Li, Lihong

doi:10.1007/978-3-319-46379-7_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9925)

Included in the following conference series:

International Conference on Algorithmic Learning Theory

Abstract

The empirically successful Thompson Sampling algorithm for stochastic bandits has drawn much interest in understanding its theoretical properties. One important benefit of the algorithm is that it allows domain knowledge to be conveniently encoded as a prior distribution to balance exploration and exploitation more effectively. While it is generally believed that the algorithm’s regret is low (high) when the prior is good (bad), little is known about the exact dependence. This paper is a first step towards answering this important question: focusing on a special yet representative case, we fully characterize the algorithm’s worst-case dependence of regret on the choice of prior. As a corollary, these results also provide useful insights into the general sensitivity of the algorithm to the choice of priors, when no structural assumptions are made. In particular, with p being the prior probability mass of the true reward-generating model, we prove $O(\sqrt{T/p})$ and $O(\sqrt{(1-p)T})$ regret upper bounds for the poor- and good-prior cases, respectively, as well as matching lower bounds. Our proofs rely on a fundamental property of Thompson Sampling and make heavy use of martingale theory, both of which appear novel in the Thompson-Sampling literature and may be useful for studying other behavior of the algorithm.
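To make the setting concrete, here is a minimal sketch of Thompson Sampling over a finite set of candidate models, where the prior assigns mass p to the true reward-generating model. It is an illustration only, not the authors' code: the function name `thompson_sampling_regret`, the Bernoulli reward assumption, and the two-model, two-arm example are all hypothetical choices made for this sketch.

```python
import math
import random


def thompson_sampling_regret(models, prior, true_index, horizon, seed=0):
    """Run Thompson Sampling over a finite model set; return cumulative regret.

    Assumptions (not from the paper): rewards are Bernoulli, each model is a
    list of arm means strictly inside (0, 1), and every prior weight is > 0.
    """
    rng = random.Random(seed)
    log_post = [math.log(w) for w in prior]  # unnormalised log-posterior
    true_means = models[true_index]
    best_mean = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Sample a model from the current posterior (weights need not sum to 1).
        shift = max(log_post)
        weights = [math.exp(lp - shift) for lp in log_post]
        sampled = rng.choices(range(len(models)), weights=weights)[0]
        # Play the arm that is optimal under the sampled model.
        arm = max(range(len(true_means)), key=lambda a: models[sampled][a])
        # Observe a Bernoulli reward drawn from the true model.
        reward = rng.random() < true_means[arm]
        regret += best_mean - true_means[arm]
        # Bayes update: add each model's log-likelihood of the observation.
        for i, means in enumerate(models):
            log_post[i] += math.log(means[arm] if reward else 1.0 - means[arm])
    return regret


# Hypothetical example: two models, two arms, model 0 is the true one.
models = [[0.6, 0.4], [0.4, 0.6]]
for p in (0.01, 0.5, 0.99):
    print(p, thompson_sampling_regret(models, [p, 1.0 - p], 0, 1000))
```

A single run is noisy, so the printed values only hint at how regret varies with p; checking the stated bounds empirically would require averaging over many seeds and horizons.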

Most of this work was done when C.Y. Liu was an intern at Microsoft.

Notes

1. Note that in this paper, we do not impose any continuity structure on the reward distributions $\nu (\theta )$ with respect to $\theta \in \varTheta $. Therefore, it is easy to see that when $\varTheta $ is uncountable, the (frequentist) regret of Thompson Sampling, as defined in Eq. 1, in the worst-case scenario is linear in time under most underlying models $\theta \in \varTheta $.

Acknowledgments

We thank Sébastien Bubeck and the anonymous reviewers for helpful advice that improved the presentation of the paper.

Author information

Authors and Affiliations

ORFE, Princeton University, Princeton, NJ, 08544, USA
Che-Yu Liu
Microsoft Research, One Microsoft Way, Redmond, WA, 98052, USA
Lihong Li

Corresponding author

Correspondence to Lihong Li.

Editor information

Editors and Affiliations

Montanuniversität Leoben, Leoben, Austria
Ronald Ortner
Ruhr-Uni-Bochum, Bochum, Germany
Hans Ulrich Simon
University of Regina, Regina, Saskatchewan, Canada
Sandra Zilles

About this paper

Cite this paper

Liu, CY., Li, L. (2016). On the Prior Sensitivity of Thompson Sampling. In: Ortner, R., Simon, H., Zilles, S. (eds) Algorithmic Learning Theory. ALT 2016. Lecture Notes in Computer Science, vol 9925. Springer, Cham. https://doi.org/10.1007/978-3-319-46379-7_22

DOI: https://doi.org/10.1007/978-3-319-46379-7_22
Published: 21 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46378-0
Online ISBN: 978-3-319-46379-7
eBook Packages: Computer Science, Computer Science (R0)
