Neural Dueling Bandits

Verma, Arun; Dai, Zhongxiang; Lin, Xiaoqiang; Jaillet, Patrick; Low, Bryan Kian Hsiang

arXiv:2407.17112 (cs)

[Submitted on 24 Jul 2024]

Abstract: Contextual dueling bandits model problems in which a learner aims to find the best arm for a given context using noisy preference feedback observed over the arms selected for past contexts. However, existing algorithms assume that the reward function is linear, whereas in many real-life applications, such as online recommendation or ranking web search results, it can be complex and non-linear. To overcome this challenge, we use a neural network to estimate the reward function from preference feedback on the previously selected arms. We propose upper confidence bound- and Thompson sampling-based algorithms with sub-linear regret guarantees that efficiently select arms in each round. We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution. Experimental results on problem instances derived from synthetic datasets corroborate our theoretical results.
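
To make the notion of noisy preference feedback concrete, the following is a minimal sketch rather than the paper's method. It assumes a logistic (Bradley-Terry-Luce style) link between latent rewards and preferences, which the abstract does not specify, and it uses a hypothetical non-linear reward function `true_reward` chosen only for illustration. It contains no neural network, upper confidence bound, or Thompson sampling step. It only shows how pairwise feedback could be simulated and how a candidate reward function could be scored against it with a negative log-likelihood.

```python
import math
import random


def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def true_reward(x):
    """Hypothetical non-linear latent reward (illustration only)."""
    return sum(math.sin(3.0 * v) for v in x)


def preference_prob(r_first, r_second):
    """Assumed logistic link: probability the first arm is preferred."""
    return sigmoid(r_first - r_second)


def sample_duels(rng, n_rounds, dim):
    """Simulate duels between two random arms with noisy binary preferences."""
    duels = []
    for _ in range(n_rounds):
        first = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        second = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        p = preference_prob(true_reward(first), true_reward(second))
        y = 1 if rng.random() < p else 0  # 1 means the first arm won
        duels.append((first, second, y))
    return duels


def preference_nll(reward_fn, duels):
    """Average negative log-likelihood of the preferences under reward_fn."""
    total = 0.0
    for first, second, y in duels:
        p = preference_prob(reward_fn(first), reward_fn(second))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(duels)


rng = random.Random(0)
duels = sample_duels(rng, 2000, 3)
nll_true = preference_nll(true_reward, duels)       # reward that generated the data
nll_flat = preference_nll(lambda x: 0.0, duels)     # uninformative constant reward
print(nll_true, nll_flat)
```

Under these assumptions, a reward estimate that explains the observed preferences better attains a lower loss than the constant baseline, whose loss is exactly log 2. A learned reward model, such as the neural network the abstract describes, could in principle be trained by minimizing a loss of this kind.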

Comments:	Accepted at ICML 2024 Workshop on Foundations of Reinforcement Learning and Control
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2407.17112 [cs.LG]
	(or arXiv:2407.17112v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2407.17112

Submission history

From: Arun Verma
[v1] Wed, 24 Jul 2024 09:23:22 UTC (15,238 KB)
