On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game

Qiu, Shuang; Ye, Jieping; Wang, Zhaoran; Yang, Zhuoran

arXiv:2110.09771 (cs)

[Submitted on 19 Oct 2021 (v1), last revised 13 Feb 2022 (this version, v2)]

Abstract: Achieving sample efficiency in reinforcement learning (RL) requires efficiently exploring the underlying environment. In the offline setting, addressing the exploration challenge amounts to collecting an offline dataset with sufficient coverage. Motivated by this challenge, we study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function. Then, given any extrinsic reward, the agent computes the policy via a planning algorithm using the offline data collected in the exploration phase. Moreover, we tackle this problem in the context of function approximation, leveraging powerful function approximators.
Specifically, we propose to explore via an optimistic variant of the value-iteration algorithm incorporating kernel and neural function approximations, where we adopt the associated exploration bonus as the exploration reward. We further design exploration and planning algorithms for both single-agent MDPs and zero-sum Markov games and prove that our methods achieve $\widetilde{\mathcal{O}}(1/\varepsilon^2)$ sample complexity for generating an $\varepsilon$-suboptimal policy or an $\varepsilon$-approximate Nash equilibrium when given an arbitrary extrinsic reward. To the best of our knowledge, we establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
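
To make the idea of using an exploration bonus as the exploration reward concrete, the following is a minimal sketch of a kernel-based uncertainty bonus. It is an illustration only, not the paper's algorithm. The bonus form $\beta \lambda^{-1/2} [k(z,z) - k_t(z)^\top (K + \lambda I)^{-1} k_t(z)]^{1/2}$ is a form commonly used with kernel ridge regression and is assumed here; the abstract does not state it. The RBF kernel, the function names `rbf_kernel` and `exploration_bonus`, and the parameters `lam`, `beta`, and `lengthscale` are all hypothetical choices made for this sketch.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=1.0):
    """RBF kernel matrix between rows of X (n, d) and rows of Y (m, d).

    The RBF kernel is an assumption of this sketch, not taken from the paper.
    """
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))


def exploration_bonus(Z_seen, Z_query, lam=1.0, beta=1.0, lengthscale=1.0):
    """Kernel ridge regression uncertainty at each query point.

    Z_seen:  (n, d) state-action features visited so far (hypothetical input).
    Z_query: (m, d) state-action features to score.
    Returns an (m,) array; larger values mean less-explored regions.
    """
    K = rbf_kernel(Z_seen, Z_seen, lengthscale)      # (n, n) Gram matrix
    k_q = rbf_kernel(Z_seen, Z_query, lengthscale)   # (n, m) cross-kernel
    solved = np.linalg.solve(K + lam * np.eye(len(Z_seen)), k_q)
    # k(z, z) = 1 for the RBF kernel, so the residual variance is 1 - k^T A^{-1} k.
    var = 1.0 - np.sum(k_q * solved, axis=0)
    return beta * np.sqrt(np.maximum(var, 0.0) / lam)


Z_seen = np.array([[0.0, 0.0], [1.0, 0.0]])
Z_query = np.array([[0.0, 0.0], [5.0, 5.0]])
bonus = exploration_bonus(Z_seen, Z_query)
```

In this sketch the bonus is small at a point that has already been visited and approaches `beta / sqrt(lam)` far from the data, which is the property that lets it act as a reward driving the agent toward unexplored state-action pairs.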

Comments:	ICML 2021
Subjects:	Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
Cite as:	arXiv:2110.09771 [cs.LG]
	(or arXiv:2110.09771v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2110.09771

Submission history

From: Shuang Qiu
[v1] Tue, 19 Oct 2021 07:26:33 UTC (76 KB)
[v2] Sun, 13 Feb 2022 17:47:08 UTC (77 KB)
