Computer Science > Computation and Language
[Submitted on 18 Oct 2019 (v1), last revised 1 Feb 2021 (this version, v6)]
Title: RTFM: Generalising to Novel Environment Dynamics via Reading
Abstract: Obtaining policies that can generalise to new environments in reinforcement learning is challenging. In this work, we demonstrate that language understanding via a reading policy learner is a promising vehicle for generalisation to new environments. We propose a grounded policy learning problem, Read to Fight Monsters (RTFM), in which the agent must jointly reason over a language goal, relevant dynamics described in a document, and environment observations. We procedurally generate environment dynamics and corresponding language descriptions of the dynamics, such that agents must read to understand new environment dynamics instead of memorising any particular information. In addition, we propose txt2$\pi$, a model that captures three-way interactions between the goal, document, and observations. On RTFM, txt2$\pi$ generalises to new environments with dynamics not seen during training via reading. Furthermore, our model outperforms baselines such as FiLM and language-conditioned CNNs on RTFM. Through curriculum learning, txt2$\pi$ produces policies that excel on complex RTFM tasks requiring several reasoning and coreference steps.
Submission history
From: Victor Zhong
[v1] Fri, 18 Oct 2019 00:49:15 UTC (2,301 KB)
[v2] Tue, 7 Jan 2020 21:26:05 UTC (7,735 KB)
[v3] Fri, 10 Jan 2020 21:49:27 UTC (2,300 KB)
[v4] Tue, 28 Jan 2020 18:37:02 UTC (2,601 KB)
[v5] Wed, 12 Feb 2020 20:22:15 UTC (2,525 KB)
[v6] Mon, 1 Feb 2021 20:46:03 UTC (2,525 KB)