Discrete multi-armed bandits and multi-parameter processes

Mandelbaum, Avi

doi:10.1007/BF00366276

Published: January 1986

Volume 71, pages 129–147 (1986)

Probability Theory and Related Fields

Summary

The general multi-armed bandit problem is reformulated and solved as a control problem over a partially ordered set. The approach taken provides a technically convenient framework for bandit-like problems. It also adds insight to the structure of strategies over partially ordered sets.
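As a rough illustration of what "a control problem over a partially ordered set" can mean in the discrete setting, the sketch below treats the vector of pull counts, one count per arm, as a point of N^d under the componentwise order. A strategy then traces a path that moves up by one unit vector at each step. The reward arrays, the discount factor `beta`, and the helper names `leq` and `run_strategy` are assumptions made for this example and are not taken from the article.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, ...]


def leq(s: Point, t: Point) -> bool:
    """Componentwise partial order on N^d: s <= t iff every coordinate is <=."""
    return all(a <= b for a, b in zip(s, t))


def run_strategy(
    rewards: Sequence[Sequence[float]],
    choices: Sequence[int],
    beta: float,
) -> Tuple[List[Point], float]:
    """Follow a fixed sequence of arm choices (a hypothetical, non-adaptive strategy).

    rewards[i][k] is the reward for the (k+1)-th pull of arm i.
    Returns the path of pull-count vectors and the total discounted reward.
    """
    d = len(rewards)
    s: Point = (0,) * d          # multi-parameter "time": pulls made on each arm
    path = [s]
    total = 0.0
    for t, arm in enumerate(choices):
        total += beta ** t * rewards[arm][s[arm]]
        # Pulling an arm advances only that arm's own clock by one unit.
        s = s[:arm] + (s[arm] + 1,) + s[arm + 1:]
        path.append(s)
    return path, total


# Two arms with made-up reward sequences; pull arm 0 once, then arm 1 twice.
rewards = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]]
path, total = run_strategy(rewards, choices=[0, 1, 1], beta=0.5)
print(path)   # successive pull-count vectors
print(total)  # discounted reward along this path
```

The points (1, 0) and (0, 1) are not comparable under `leq`, which is why the index set is only partially ordered. Choosing a strategy amounts to choosing one increasing path among many.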

Author information

Authors and Affiliations

Graduate School of Business, Stanford University, Stanford, CA 94305, USA
Avi Mandelbaum

About this article

Cite this article

Mandelbaum, A. Discrete multi-armed bandits and multi-parameter processes. Probab. Th. Rel. Fields 71, 129–147 (1986). https://doi.org/10.1007/BF00366276

Received: 03 September 1984
Issue Date: January 1986
DOI: https://doi.org/10.1007/BF00366276
