Conformal Off-Policy Prediction in Contextual Bandits

Taufiq, Muhammad Faaiz; Ton, Jean-Francois; Cornish, Rob; Teh, Yee Whye; Doucet, Arnaud

arXiv:2206.04405v1 (stat)

[Submitted on 9 Jun 2022 (this version), latest version 26 Oct 2022 (v2)]

Abstract: Most off-policy evaluation methods for contextual bandits have focused on the expected outcome of a policy, which is estimated via methods that at best provide only asymptotic guarantees. However, in many applications, the expectation may not be the best measure of performance as it does not capture the variability of the outcome. In addition, particularly in safety-critical settings, stronger guarantees than asymptotic correctness may be required. To address these limitations, we consider a novel application of conformal prediction to contextual bandits. Given data collected under a behavioral policy, we propose conformal off-policy prediction (COPP), which can output reliable predictive intervals for the outcome under a new target policy. We provide theoretical finite-sample guarantees without making any additional assumptions beyond the standard contextual bandit setup, and empirically demonstrate the utility of COPP compared with existing methods on synthetic and real-world data.
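The abstract does not spell out how the intervals are built. As a rough illustration only, and not the authors' implementation, the sketch below shows weighted split conformal prediction, one standard way to adapt conformal prediction when calibration data and test data follow different distributions. Everything named here is an assumption made for the example: the absolute-residual score, the function names `weighted_conformal_quantile` and `predictive_set`, and above all the weights, which are taken as given ratios of the target-policy to behavior-policy density of each (context, outcome) pair. How such weights would be estimated is not covered.

```python
import math


def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """(1 - alpha) quantile of the weighted calibration scores.

    The test point's own weight is treated as probability mass at +infinity,
    as in weighted split conformal prediction. All weights are assumed to be
    non-negative density ratios supplied by the caller (hypothetical inputs).
    """
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= 1.0 - alpha:
            return score
    # Not enough mass on the calibration scores: the quantile is infinite.
    return math.inf


def predictive_set(candidates, prediction, scores, weights, weight_fn, alpha):
    """Candidate outcomes y whose score |y - prediction| is within the quantile.

    weight_fn(y) is a hypothetical callable returning the density ratio for
    the test context paired with candidate outcome y, so the threshold may
    differ from one candidate to the next.
    """
    kept = []
    for y in candidates:
        threshold = weighted_conformal_quantile(scores, weights, weight_fn(y), alpha)
        if abs(y - prediction) <= threshold:
            kept.append(y)
    return kept


# Calibration residuals |y_i - prediction_i| from behavior-policy data (made up).
calibration_scores = [0.1, 0.5, 0.2, 0.9, 0.3]
# Equal weights correspond to no shift between behavior and target policy.
calibration_weights = [1.0, 1.0, 1.0, 1.0, 1.0]

covered = predictive_set(
    candidates=[0.0, 0.5, 1.0, 1.5, 2.0],
    prediction=1.0,
    scores=calibration_scores,
    weights=calibration_weights,
    weight_fn=lambda y: 1.0,
    alpha=0.2,
)
print(covered)
```

With equal weights this reduces to ordinary split conformal prediction. Larger weights on some calibration points pull the threshold toward their scores, which is how the shift between the two policies would enter the interval in a scheme of this kind.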

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2206.04405 [stat.ML]
	(or arXiv:2206.04405v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2206.04405

Submission history

From: Jean-Francois Ton
[v1] Thu, 9 Jun 2022 10:39:33 UTC (3,570 KB)
[v2] Wed, 26 Oct 2022 11:47:19 UTC (3,394 KB)
