[Submitted on 19 Jun 2021 (v1), revised 10 Oct 2021 (this version, v2), latest version 11 Feb 2022 (v3)]
Title: More Efficient Adversarial Imitation Learning Algorithms With Known and Unknown Transitions
Abstract: In this work, we design provably (more) efficient imitation learning algorithms that directly optimize policies from expert demonstrations. First, when the transition function is known, we build on the nearly minimax optimal algorithm MIMIC-MD and relax a projection operator in it. Based on this change, we develop an adversarial imitation learning (AIL) algorithm named TAIL with a gradient-based optimization procedure. TAIL has the same sample complexity (i.e., the number of expert trajectories) as MIMIC-MD, namely $\widetilde{\mathcal{O}}(H^{3/2} |\mathcal{S}|/\varepsilon)$, where $H$ is the planning horizon, $|\mathcal{S}|$ is the state space size, and $\varepsilon$ is the desired policy value gap. In addition, TAIL is more practical than MIMIC-MD: the former has a space complexity of $\mathcal{O} (|\mathcal{S}||\mathcal{A}|H)$, while the latter's is about $\mathcal{O} (|\mathcal{S}|^2 |\mathcal{A}|^2 H^2)$. Second, for the scenario where the transition function is unknown but interaction with the environment is allowed, we present an extension of TAIL named MB-TAIL. The sample complexity of MB-TAIL is still $\widetilde{\mathcal{O}}(H^{3/2} |\mathcal{S}|/\varepsilon)$, while its interaction complexity (i.e., the number of interaction episodes) is $\widetilde{\mathcal{O}} (H^3 |\mathcal{S}|^2 |\mathcal{A}| / \varepsilon^2)$. In particular, MB-TAIL is significantly better than the best-known OAL algorithm, which has a sample complexity of $\widetilde{\mathcal{O}}(H^{2} |\mathcal{S}|/\varepsilon^2)$ and an interaction complexity of $\widetilde{\mathcal{O}} (H^4 |\mathcal{S}|^2 |\mathcal{A}| / \varepsilon^2)$. The advances in MB-TAIL are based on a new framework that connects reward-free exploration and AIL. To our knowledge, MB-TAIL is the first algorithm that transfers the advances in the known transition setting to the unknown transition setting.
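To make the stated rates concrete, the following minimal sketch evaluates the leading terms of the bounds quoted in the abstract and compares them. It is an illustration only: constants and logarithmic factors hidden by $\widetilde{\mathcal{O}}$ are dropped, all function names are hypothetical, and the values chosen for $H$, $|\mathcal{S}|$, $|\mathcal{A}|$ and $\varepsilon$ are arbitrary assumptions, not settings from the paper.

```python
# Leading terms of the complexity bounds quoted in the abstract.
# Assumption: constants and log factors are ignored, so only ratios are meaningful.

def tail_sample_complexity(H, S, eps):
    """Expert trajectories for TAIL / MB-TAIL / MIMIC-MD: H^{3/2} |S| / eps."""
    return H ** 1.5 * S / eps

def oal_sample_complexity(H, S, eps):
    """Expert trajectories for OAL: H^2 |S| / eps^2."""
    return H ** 2 * S / eps ** 2

def mbtail_interaction_complexity(H, S, A, eps):
    """Interaction episodes for MB-TAIL: H^3 |S|^2 |A| / eps^2."""
    return H ** 3 * S ** 2 * A / eps ** 2

def oal_interaction_complexity(H, S, A, eps):
    """Interaction episodes for OAL: H^4 |S|^2 |A| / eps^2."""
    return H ** 4 * S ** 2 * A / eps ** 2

def tail_space_complexity(H, S, A):
    """Space for TAIL: |S| |A| H."""
    return S * A * H

def mimic_md_space_complexity(H, S, A):
    """Space for MIMIC-MD: |S|^2 |A|^2 H^2."""
    return S ** 2 * A ** 2 * H ** 2

# Hypothetical problem sizes, chosen only for illustration.
H, S, A, eps = 100, 50, 4, 0.1

sample_ratio = oal_sample_complexity(H, S, eps) / tail_sample_complexity(H, S, eps)
interaction_ratio = (oal_interaction_complexity(H, S, A, eps)
                     / mbtail_interaction_complexity(H, S, A, eps))
space_ratio = mimic_md_space_complexity(H, S, A) // tail_space_complexity(H, S, A)

print(f"OAL / MB-TAIL expert trajectories:   {sample_ratio:.1f}")
print(f"OAL / MB-TAIL interaction episodes:  {interaction_ratio:.1f}")
print(f"MIMIC-MD / TAIL space:               {space_ratio}")
```

Under these leading terms, the sample-complexity ratio simplifies to $H^{1/2}/\varepsilon$, the interaction-complexity ratio to $H$, and the space ratio to $|\mathcal{S}||\mathcal{A}|H$, so the gaps widen as the horizon grows or the target accuracy tightens.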
Submission history
From: Yang Yu
[v1] Sat, 19 Jun 2021 04:41:33 UTC (663 KB)
[v2] Sun, 10 Oct 2021 07:59:45 UTC (679 KB)
[v3] Fri, 11 Feb 2022 03:07:57 UTC (1,033 KB)