The multi-armed bandit, with constraints

Denardo, Eric V.; Feinberg, Eugene A.; Rothblum, Uriel G.

doi:10.1007/s10479-012-1250-y

Annals of Operations Research, Volume 208, pages 37–62 (2013)

Published: 13 November 2012

Abstract

Presented in this paper is a self-contained analysis of a Markov decision problem known as the multi-armed bandit. The analysis covers the cases of linear and exponential utility functions. The optimal policy is shown to have a simple and easily implemented form. Procedures for computing such a policy are presented, as are procedures for computing the expected utility that it earns, given any starting state. For the case of linear utility, constraints that link the bandits are introduced, and the constrained optimization problem is solved via column generation. The methodology is novel in several respects, which include the use of elementary row operations to simplify arguments.
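
In the classical risk-neutral (linear utility), discounted setting, the "simple and easily implemented" optimal policy of a multi-armed bandit is an index rule: compute an index for every state of every bandit and, at each epoch, play a bandit whose current state has the largest index (the Gittins index rule). The sketch below computes discounted Gittins indices for a single bandit with the largest-remaining-index algorithm attributed to Varaiya, Walrand and Buyukkoc. It is a standard textbook technique, not necessarily the procedure developed in this paper, and it rests on illustrative assumptions: a finite state space, a transition matrix P, a one-step reward r[i] earned when the bandit is played in state i, and a discount factor 0 < beta < 1. The names solve, gittins_indices and pick_arm are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the caller's data is untouched.
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for k in range(n):
            if k != col and m[k][col] != 0.0:
                f = m[k][col]
                m[k] = [vk - f * vc for vk, vc in zip(m[k], m[col])]
    return [m[k][n] for k in range(n)]


def gittins_indices(P: Matrix, r: List[float], beta: float) -> List[float]:
    """Discounted Gittins index of every state (hypothetical helper).

    Assumes a finite chain, bounded rewards and 0 < beta < 1.
    """
    n = len(r)
    index = [0.0] * n
    # The state with the largest one-step reward has index r[i].
    first = max(range(n), key=lambda i: r[i])
    index[first] = r[first]
    cont = [first]  # continuation set C, in decreasing order of index
    while len(cont) < n:
        # a[j], b[j]: discounted reward and discounted time accrued from j in C
        # until the first exit from C, i.e. (I - beta*Q) a = r_C and
        # (I - beta*Q) b = 1, where Q is P restricted to C.
        A = [[(1.0 if j == k else 0.0) - beta * P[j][k] for k in cont]
             for j in cont]
        a = solve(A, [r[j] for j in cont])
        b = solve(A, [1.0] * len(cont))
        best, best_ratio = -1, float("-inf")
        for i in range(n):
            if i in cont:
                continue
            # Play once in state i, then keep playing while the chain stays in C.
            num = r[i] + beta * sum(P[i][j] * a[k] for k, j in enumerate(cont))
            den = 1.0 + beta * sum(P[i][j] * b[k] for k, j in enumerate(cont))
            if num / den > best_ratio:
                best, best_ratio = i, num / den
        # The largest remaining reward-per-unit-discounted-time ratio is an index.
        index[best] = best_ratio
        cont.append(best)
    return index


def pick_arm(indices_per_arm: List[List[float]], current_states: List[int]) -> int:
    """Index rule: play an arm whose current state has the largest index."""
    return max(range(len(current_states)),
               key=lambda m: indices_per_arm[m][current_states[m]])
```

For example, with P = [[0, 1, 0], [0, 0, 1], [0, 0, 1]], r = [0, 10, 0] and beta = 0.9, state 1 gets index 10, state 0 gets 9/1.9 (play twice: reward 0.9 × 10 over discounted time 1.9), and the absorbing state 2 gets 0. The naive version re-solves a linear system at every step, which keeps the logic transparent at the cost of speed; faster pivoting schemes exist.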

Acknowledgements

The authors are pleased to acknowledge that this paper has benefited immensely from the reactions of Dr. Pelin Canbolat to earlier drafts. This paper has also been improved markedly by two rounds of very careful, thoughtful and constructive refereeing. The research of the second author has been supported in part by NSF grant CMMI-0928490. The research of the third author has been supported in part by Israel Science Foundation (ISF) grant 901/10.

Author information

Authors and Affiliations

Eric V. Denardo: Center for Systems Sciences, Yale University, PO Box 208267, New Haven, CT, 06520, USA
Eugene A. Feinberg: Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY, 11794-3600, USA
Uriel G. Rothblum: Late of the Faculty of Industrial Engineering and Management, Technion – Israel Institute of Technology, Haifa, 32000, Israel

Corresponding author

Correspondence to Eugene A. Feinberg.

About this article

Cite this article

Denardo, E.V., Feinberg, E.A. & Rothblum, U.G. The multi-armed bandit, with constraints. Ann Oper Res 208, 37–62 (2013). https://doi.org/10.1007/s10479-012-1250-y

Issue Date: September 2013