Branching Reinforcement Learning

Du, Yihan; Chen, Wei

arXiv:2202.07995 (cs)

[Submitted on 16 Feb 2022 (v1), last revised 15 Jun 2022 (this version, v2)]

Abstract: We propose a novel Branching Reinforcement Learning (Branching RL) model and study it under two metrics: Regret Minimization (RM) and Reward-Free Exploration (RFE). Unlike standard RL, where the trajectory of each episode is a single $H$-step path, branching RL allows an agent to take multiple base actions in a state, so that transitions branch out to multiple successor states and the episode generates a tree-structured trajectory. This model has important applications in hierarchical recommendation systems and online advertising. For branching RL, we establish new Bellman equations and key lemmas, namely the branching value difference lemma and the branching law of total variance, and we bound the total variance by only $O(H^2)$ even though the trajectory is exponentially large. For the RM and RFE metrics, we propose the computationally efficient algorithms BranchVI and BranchRFE, respectively, and derive nearly matching upper and lower bounds. Our results are only polynomial in the problem parameters despite the exponentially large trajectories.
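
To make the phrase "exponentially large trajectory" concrete, the following minimal sketch counts the states in a tree-structured trajectory. It rests on a simplifying assumption that is not stated in the abstract: every one of the `m` base actions taken in a state always yields a successor state, so the trajectory is a complete `m`-ary tree with `horizon` levels. The function name `trajectory_size` and the parameter `m` are hypothetical and chosen only for illustration; the paper's actual model may differ.

```python
def trajectory_size(m: int, horizon: int) -> int:
    """Number of states in a full tree-structured trajectory.

    Assumption (illustrative, not from the abstract): each of the m base
    actions taken in a state always produces a successor state, so the
    trajectory is a complete m-ary tree with `horizon` levels.
    """
    # Level h (0-indexed) holds m**h states; sum over all levels.
    return sum(m ** h for h in range(horizon))


# m = 1 recovers the single H-step path of standard RL;
# m >= 2 grows exponentially in the horizon.
for m in (1, 2, 3):
    print(m, [trajectory_size(m, H) for H in (1, 5, 10)])
```

Under this assumption, the case `m = 1` gives exactly `horizon` states, matching the single-path trajectory of standard RL, while any `m >= 2` gives a count that grows geometrically with the horizon. This is the growth that the polynomial bounds in the abstract avoid.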

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2202.07995 [cs.LG]
	(or arXiv:2202.07995v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2202.07995

Submission history

From: Yihan Du
[v1] Wed, 16 Feb 2022 11:19:03 UTC (1,187 KB)
[v2] Wed, 15 Jun 2022 13:10:48 UTC (2,089 KB)
