Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

Liu, Xiangyu; Jia, Hangtian; Wen, Ying; Yang, Yaodong; Hu, Yujing; Chen, Yingfeng; Fan, Changjie; Hu, Zhipeng

arXiv:2106.04958 (cs)

[Submitted on 9 Jun 2021 (v1), last revised 10 Jun 2021 (this version, v2)]

Abstract: Measuring and promoting policy diversity is critical for solving games with strong non-transitive dynamics, where strategic cycles exist and there is no consistent winner (e.g., Rock-Paper-Scissors). With that in mind, maintaining a pool of diverse policies via open-ended learning is an attractive solution, which can generate auto-curricula to avoid being exploited. However, in conventional open-ended learning algorithms, there are no widely accepted definitions of diversity, making it hard to construct and evaluate diverse policies. In this work, we summarize previous concepts of diversity and work towards a unified measure of diversity in multi-agent open-ended learning that includes all elements of Markov games, based on both Behavioral Diversity (BD) and Response Diversity (RD). At the trajectory distribution level, we re-define BD in the state-action space as the discrepancies of occupancy measures. For the reward dynamics, we propose RD to characterize diversity through the responses of policies when encountering different opponents. We also show that many current diversity measures fall into one of the categories of BD or RD, but not both. With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity when seeking the best responses in open-ended learning. We validate our methods in relatively simple games, such as a matrix game and a non-transitive mixture model, and in the complex Google Research Football environment. The population found by our methods achieves the lowest exploitability and the highest population effectivity in the matrix game and the non-transitive mixture model, as well as the largest goal difference when interacting with opponents of various levels in Google Research Football.
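
As a small illustration of the non-transitive dynamics and the exploitability metric mentioned in the abstract, the following is a minimal sketch on Rock-Paper-Scissors. It is not the paper's algorithm, and it does not implement BD or RD; the names PAYOFF, expected_payoff and exploitability are hypothetical, chosen here for illustration. Exploitability is taken to be the best payoff any pure strategy obtains against a given mixture, which is a common definition for symmetric zero-sum games whose value is 0.

```python
# Payoff to the row player in Rock-Paper-Scissors.
# Strategy order (an assumption of this sketch): 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],   # rock loses to paper, beats scissors
    [1, 0, -1],   # paper beats rock, loses to scissors
    [-1, 1, 0],   # scissors loses to rock, beats paper
]


def expected_payoff(row_mix, col_mix):
    """Expected payoff to the row player when both sides play mixed strategies."""
    return sum(
        row_mix[i] * PAYOFF[i][j] * col_mix[j]
        for i in range(3)
        for j in range(3)
    )


def exploitability(mix):
    """Best payoff any pure strategy obtains against `mix` (game value is 0)."""
    return max(
        sum(PAYOFF[i][j] * mix[j] for j in range(3))
        for i in range(3)
    )


rock = [1.0, 0.0, 0.0]
uniform = [1 / 3, 1 / 3, 1 / 3]

# A single pure policy is fully exploitable; the uniform mixture is not.
print(exploitability(rock))
print(exploitability(uniform))
```

The cycle is visible in the matrix itself: paper beats rock, scissors beats paper, and rock beats scissors, so no single pure strategy wins consistently. A population that covers the cycle can mix its members to reduce exploitability, which is the intuition behind maintaining a diverse policy pool.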

Comments:	We investigate a new perspective on unifying diversity measures for open-ended learning in zero-sum games, which shapes an auto-curriculum to induce diverse yet effective behaviors
Subjects:	Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Cite as:	arXiv:2106.04958 [cs.MA]
	(or arXiv:2106.04958v2 [cs.MA] for this version)
	https://doi.org/10.48550/arXiv.2106.04958

Submission history

From: Ying Wen
[v1] Wed, 9 Jun 2021 10:11:06 UTC (1,259 KB)
[v2] Thu, 10 Jun 2021 16:00:18 UTC (1,440 KB)
