Nonparametric Gaussian Mixture Models for the Multi-Armed Bandit

Urteaga, Iñigo; Wiggins, Chris H.

arXiv:1808.02932 (stat)

[Submitted on 8 Aug 2018 (v1), last revised 25 Aug 2022 (this version, v4)]

Abstract: We adopt Bayesian nonparametric mixture models to extend multi-armed bandits in general, and Thompson sampling in particular, to scenarios with reward model uncertainty. In the stochastic multi-armed bandit, the reward for the played arm is generated from an unknown distribution. Reward uncertainty, i.e., the lack of knowledge about the reward-generating distribution, induces the exploration-exploitation trade-off: a bandit agent needs to simultaneously learn the properties of the reward distribution and sequentially decide which action to take next.
In this work, we extend Thompson sampling to scenarios with reward model uncertainty by adopting Bayesian nonparametric Gaussian mixture models for flexible reward density estimation. The proposed Bayesian nonparametric mixture model Thompson sampling sequentially learns the reward model that best approximates the true, yet unknown, per-arm reward distribution, achieving successful regret performance. Based on a novel posterior convergence analysis, we derive an asymptotic regret bound for the proposed method. In addition, we empirically evaluate its performance in diverse and previously elusive bandit environments, e.g., with rewards not in the exponential family, subject to outliers, and with different per-arm reward distributions.
We show that the proposed Bayesian nonparametric Thompson sampling outperforms state-of-the-art alternatives in both averaged cumulative regret and regret volatility. The proposed method is valuable in the presence of bandit reward model uncertainty, as it avoids stringent case-by-case model design choices, yet provides important regret savings.
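
For readers unfamiliar with Thompson sampling, the following is a minimal sketch of the classical parametric version that the abstract sets out to extend. It is not the authors' nonparametric method: it assumes a single Gaussian reward model per arm with known noise variance and a zero-mean Gaussian prior on each arm's mean. The function name `gaussian_thompson_sampling` and the parameters `noise_std` and `prior_var` are illustrative choices, not taken from the paper or its software.

```python
import numpy as np


def gaussian_thompson_sampling(true_means, horizon, noise_std=1.0,
                               prior_var=10.0, seed=0):
    """Thompson sampling with one Gaussian reward model per arm.

    Assumption (not from the paper): rewards are Gaussian with known
    noise_std, and each arm's mean has a N(0, prior_var) prior.
    Returns the pull counts per arm and the cumulative expected regret.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    n_arms = true_means.size
    counts = np.zeros(n_arms)        # number of pulls per arm
    reward_sums = np.zeros(n_arms)   # sum of observed rewards per arm
    regret = np.zeros(horizon)       # cumulative expected regret over time
    best_mean = true_means.max()
    total_regret = 0.0

    for t in range(horizon):
        # Conjugate Normal posterior over each arm's mean reward.
        post_var = 1.0 / (1.0 / prior_var + counts / noise_std**2)
        post_mean = post_var * reward_sums / noise_std**2

        # Draw one plausible mean per arm and play the arm with the largest draw.
        sampled_means = rng.normal(post_mean, np.sqrt(post_var))
        arm = int(np.argmax(sampled_means))

        # Observe a reward from the true (unknown to the agent) distribution.
        reward = rng.normal(true_means[arm], noise_std)
        counts[arm] += 1
        reward_sums[arm] += reward

        # Expected regret: gap between the best arm's mean and the played arm's mean.
        total_regret += best_mean - true_means[arm]
        regret[t] = total_regret

    return counts, regret
```

A call such as `counts, regret = gaussian_thompson_sampling([0.0, 1.0], horizon=2000)` returns how often each arm was played and the cumulative regret curve. The sketch fixes the reward family in advance, which is exactly the design choice the abstract argues a nonparametric Gaussian mixture model avoids.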

Comments:	The software used for this study is publicly available at this https URL
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
ACM classes:	I.2.6
Cite as:	arXiv:1808.02932 [stat.ML]
	(or arXiv:1808.02932v4 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1808.02932

Submission history

From: Iñigo Urteaga
[v1] Wed, 8 Aug 2018 20:40:15 UTC (1,160 KB)
[v2] Thu, 31 Oct 2019 18:13:54 UTC (5,389 KB)
[v3] Mon, 12 Apr 2021 22:02:51 UTC (19,567 KB)
[v4] Thu, 25 Aug 2022 16:29:14 UTC (25,346 KB)
