Semi-Supervised Imitation Learning of Team Policies from Suboptimal Demonstrations

Seo, Sangwon; Unhelkar, Vaibhav V.

arXiv:2205.02959v3 (cs)

[Submitted on 5 May 2022 (v1), revised 10 May 2022 (this version, v3), latest version 20 Sep 2022 (v6)]

Abstract: We present the Bayesian Team Imitation Learner (BTIL), an imitation learning algorithm for modeling the behavior of teams performing sequential tasks in Markovian domains. In contrast to existing multi-agent imitation learning techniques, BTIL explicitly models and infers the time-varying mental states of team members, thereby enabling the learning of decentralized team policies from demonstrations of suboptimal teamwork. Further, to allow for sample- and label-efficient policy learning from small datasets, BTIL employs a Bayesian perspective and is capable of learning from semi-supervised demonstrations. We demonstrate and benchmark the performance of BTIL on synthetic multi-agent tasks as well as a novel dataset of human-agent teamwork. Our experiments show that BTIL can successfully learn team policies from demonstrations despite the influence of team members' (time-varying and potentially misaligned) mental states on their behavior.

Comments:	Extended version of an identically-titled paper accepted at IJCAI 2022
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
Cite as:	arXiv:2205.02959 [cs.AI]
	(or arXiv:2205.02959v3 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2205.02959

Submission history

From: Sangwon Seo
[v1] Thu, 5 May 2022 23:18:32 UTC (307 KB)
[v2] Mon, 9 May 2022 02:53:14 UTC (3,061 KB)
[v3] Tue, 10 May 2022 02:35:43 UTC (3,061 KB)
[v4] Wed, 11 May 2022 02:39:36 UTC (3,061 KB)
[v5] Mon, 13 Jun 2022 21:23:10 UTC (3,062 KB)
[v6] Tue, 20 Sep 2022 02:39:15 UTC (3,062 KB)
