Summary
The general multi-armed bandit problem is reformulated and solved as a control problem over a partially ordered set. This approach provides a technically convenient framework for bandit-like problems. It also adds insight into the structure of strategies over partially ordered sets.
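As a rough illustration of what "control over a partially ordered set" can mean for a discrete bandit, the sketch below treats the state of play as a multi-index s = (s_1, ..., s_d) that counts the pulls made on each arm, ordered componentwise, so that a strategy traces an increasing path through N^d. This is only a minimal sketch under assumptions not stated in the summary: the names `leq`, `run_strategy`, and `greedy`, the deterministic reward tables, and the discount factor `beta` are all hypothetical, and the greedy rule is a myopic placeholder rather than the optimal index rule.

```python
from typing import Callable, List, Sequence, Tuple

# Multi-parameter "time": s[i] is the number of pulls made on arm i so far.
MultiTime = Tuple[int, ...]


def leq(s: MultiTime, t: MultiTime) -> bool:
    """Componentwise partial order on N^d (assumed ordering for this sketch)."""
    return all(a <= b for a, b in zip(s, t))


def run_strategy(
    rewards: Sequence[Sequence[float]],
    choose: Callable[[MultiTime], int],
    horizon: int,
    beta: float,
) -> Tuple[List[MultiTime], float]:
    """Follow a strategy for `horizon` steps.

    rewards[i][k] is the (here deterministic) reward of the (k+1)-th pull
    of arm i. Returns the increasing path in N^d and the discounted total.
    """
    s: MultiTime = (0,) * len(rewards)
    path = [s]
    total = 0.0
    for t in range(horizon):
        i = choose(s)                        # arm to pull at multi-time s
        total += beta ** t * rewards[i][s[i]]
        s = s[:i] + (s[i] + 1,) + s[i + 1:]  # advance one coordinate only
        path.append(s)
    return path, total


# Hypothetical two-armed example with made-up reward tables.
rewards = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]]


def greedy(s: MultiTime) -> int:
    """Myopic placeholder: pull the arm whose next reward is largest."""
    return max(range(len(rewards)), key=lambda i: rewards[i][s[i]])


path, total = run_strategy(rewards, greedy, horizon=3, beta=0.5)
print(path, total)
```

Each step increases exactly one coordinate, so consecutive points of the path are comparable in the partial order, while points such as (1, 0) and (0, 1) are not; that incomparability is what distinguishes this setting from ordinary one-parameter time.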
Cite this article
Mandelbaum, A. Discrete multi-armed bandits and multi-parameter processes. Probab. Th. Rel. Fields 71, 129–147 (1986). https://doi.org/10.1007/BF00366276