Offline Reinforcement Learning as One Big Sequence Modeling Problem

Janner, Michael; Li, Qiyang; Levine, Sergey

arXiv:2106.02039 (cs)

[Submitted on 3 Jun 2021 (v1), last revised 29 Nov 2021 (this version, v4)]

Abstract: Reinforcement learning (RL) is typically concerned with estimating stationary policies or single-step models, leveraging the Markov property to factorize problems in time. However, we can also view RL as a generic sequence modeling problem, with the goal being to produce a sequence of actions that leads to a sequence of high rewards. Viewed in this way, it is tempting to consider whether high-capacity sequence prediction models that work well in other domains, such as natural-language processing, can also provide effective solutions to the RL problem. To this end, we explore how RL can be tackled with the tools of sequence modeling, using a Transformer architecture to model distributions over trajectories and repurposing beam search as a planning algorithm. Framing RL as a sequence modeling problem simplifies a range of design decisions, allowing us to dispense with many of the components common in offline RL algorithms. We demonstrate the flexibility of this approach across long-horizon dynamics prediction, imitation learning, goal-conditioned RL, and offline RL. Further, we show that this approach can be combined with existing model-free algorithms to yield a state-of-the-art planner in sparse-reward, long-horizon tasks.
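To make the idea of beam search as a planner concrete, here is a minimal sketch in Python. It is not the authors' implementation: `toy_model` is a hypothetical hand-written stand-in for a learned trajectory model, and ranking candidates by cumulative predicted reward is an assumption made for illustration. The function names, the line-world task, and all parameters are invented for this example.

```python
from typing import Callable, List, Tuple

Action = int
State = int


def toy_model(state: State, action: Action) -> Tuple[State, float]:
    """Hypothetical stand-in for a learned trajectory model.

    The agent moves along a line; the reward is the negative
    distance to a goal located at position 3.
    """
    next_state = state + action
    reward = -abs(3 - next_state)
    return next_state, float(reward)


def beam_search_plan(
    model: Callable[[State, Action], Tuple[State, float]],
    start: State,
    actions: List[Action],
    horizon: int,
    beam_width: int,
) -> Tuple[List[Action], float]:
    """Return the best action sequence found by the beam and its total reward."""
    # Each beam entry holds (cumulative reward, current state, actions so far).
    beam = [(0.0, start, [])]
    for _ in range(horizon):
        candidates = []
        for total, state, plan in beam:
            # Extend every partial sequence by every available action.
            for a in actions:
                next_state, reward = model(state, a)
                candidates.append((total + reward, next_state, plan + [a]))
        # Keep only the highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    best_total, _, best_plan = beam[0]
    return best_plan, best_total


plan, total = beam_search_plan(
    toy_model, start=0, actions=[-1, 0, 1], horizon=4, beam_width=3
)
print(plan, total)  # prints [1, 1, 1, 0] -3.0
```

The only change relative to beam search as used for text decoding is the ranking criterion: candidates are ordered by predicted reward rather than by sequence likelihood. In this sketch the model is deterministic and exact, so the beam simply walks to the goal and stays there.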

Comments:	NeurIPS 2021 (spotlight). Project page and code at: this https URL
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2106.02039 [cs.LG]
	(or arXiv:2106.02039v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2106.02039

Submission history

From: Michael Janner
[v1] Thu, 3 Jun 2021 17:58:51 UTC (18,372 KB)
[v2] Wed, 21 Jul 2021 06:04:33 UTC (5,758 KB)
[v3] Thu, 18 Nov 2021 09:42:36 UTC (6,984 KB)
[v4] Mon, 29 Nov 2021 00:56:52 UTC (6,984 KB)
