Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity and Last-Iterate Convergence

Wu, Jiduan; Barakat, Anas; Fatkhullin, Ilyas; He, Niao

Electrical Engineering and Systems Science > Systems and Control

arXiv:2309.04272 (eess)

[Submitted on 8 Sep 2023 (v1), last revised 31 Oct 2023 (this version, v3)]

Title:Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity and Last-Iterate Convergence

Authors:Jiduan Wu, Anas Barakat, Ilyas Fatkhullin, Niao He

View PDF

Abstract:Zero-sum Linear Quadratic (LQ) games are fundamental in optimal control and can be used (i)~as a dynamic game formulation for risk-sensitive or robust control and (ii)~as a benchmark setting for multi-agent reinforcement learning with two competing agents in continuous state-control spaces. In contrast to the well-studied single-agent linear quadratic regulator problem, zero-sum LQ games entail solving a challenging nonconvex-nonconcave min-max problem with an objective function that lacks coercivity. Recently, Zhang et al. showed that an~$\epsilon$-Nash equilibrium (NE) of finite horizon zero-sum LQ games can be learned via nested model-free Natural Policy Gradient (NPG) algorithms with poly$(1/\epsilon)$ sample complexity. In this work, we propose a simpler nested Zeroth-Order (ZO) algorithm improving sample complexity by several orders of magnitude and guaranteeing convergence of the last iterate. Our main results are two-fold: (i) in the deterministic setting, we establish the first global last-iterate linear convergence result for the nested algorithm that seeks NE of zero-sum LQ games; (ii) in the model-free setting, we establish a~$\widetilde{\mathcal{O}}(\epsilon^{-2})$ sample complexity using a single-point ZO estimator. For our last-iterate convergence results, our analysis leverages the Implicit Regularization (IR) property and a new gradient domination condition for the primal function. Our key improvements in the sample complexity rely on a more sample-efficient nested algorithm design and a finer control of the ZO natural gradient estimation error utilizing the structure endowed by the finite-horizon setting.

Subjects:	Systems and Control (eess.SY); Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
Cite as:	arXiv:2309.04272 [eess.SY]
	(or arXiv:2309.04272v3 [eess.SY] for this version)
	https://doi.org/10.48550/arXiv.2309.04272

Submission history

From: Jiduan Wu [view email]
[v1] Fri, 8 Sep 2023 11:47:31 UTC (5,264 KB)
[v2] Sun, 29 Oct 2023 21:02:23 UTC (5,270 KB)
[v3] Tue, 31 Oct 2023 16:34:09 UTC (5,270 KB)

Electrical Engineering and Systems Science > Systems and Control

Title:Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity and Last-Iterate Convergence

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Electrical Engineering and Systems Science > Systems and Control

Title:Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity and Last-Iterate Convergence

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators