
Learning‐based control for discrete‐time constrained nonzero‐sum games

Published: 10 March 2021

Abstract

A generalized policy-iteration-based solution is proposed for a class of discrete-time multi-player non-zero-sum games subject to control constraints. Starting from initial admissible control policies, the iterative value function of each player, constructed from iterative control policies that satisfy the Nash equilibrium, converges approximately to the optimum. A stability analysis then shows that the iterative control policies stabilize the system while minimizing the performance index function of each player. Neural networks are employed to approximate the iterative control policies and value functions under the control constraints. Finally, two numerical simulations of discrete-time two-player non-zero-sum games, one for a linear and one for a non-linear system, illustrate the effectiveness of the proposed scheme.
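To make the iteration structure concrete, the sketch below shows a generalized policy-iteration loop for a two-player discrete-time linear-quadratic non-zero-sum game: each outer step performs a few partial value-evaluation sweeps under the current policies, followed by a policy-improvement step in which each player best-responds to the other. All system and weighting matrices are illustrative assumptions, and the neural-network approximators and control constraints treated in the article are omitted here.

```python
# Hypothetical GPI sketch for a two-player discrete-time nonzero-sum LQ game.
# All matrices are illustrative placeholders; constrained inputs and
# neural-network approximation from the article are not modelled.
import numpy as np

# Assumed dynamics: x_{k+1} = A x_k + B1 u1_k + B2 u2_k
A  = np.array([[0.9, 0.1],
               [0.0, 0.8]])
B1 = np.array([[1.0], [0.0]])
B2 = np.array([[0.0], [1.0]])

# Cost of player i: sum_k ( x_k^T Qi x_k + u_{i,k}^T Ri u_{i,k} )
Q1, Q2 = np.eye(2), 2.0 * np.eye(2)
R1, R2 = np.array([[1.0]]), np.array([[1.0]])

# Initial admissible (stabilizing) linear policies u_i = -K_i x
K1, K2 = np.zeros((1, 2)), np.zeros((1, 2))
P1, P2 = np.zeros((2, 2)), np.zeros((2, 2))

n_eval_sweeps = 5  # partial policy evaluation (the "generalized" part of GPI)
for it in range(200):
    Acl = A - B1 @ K1 - B2 @ K2

    # Partial policy evaluation: a few Bellman backups of V_i(x) = x^T P_i x
    for _ in range(n_eval_sweeps):
        P1 = Q1 + K1.T @ R1 @ K1 + Acl.T @ P1 @ Acl
        P2 = Q2 + K2.T @ R2 @ K2 + Acl.T @ P2 @ Acl

    # Policy improvement: each player best-responds to the other's current policy
    K1_new = np.linalg.solve(R1 + B1.T @ P1 @ B1, B1.T @ P1 @ (A - B2 @ K2))
    K2_new = np.linalg.solve(R2 + B2.T @ P2 @ B2, B2.T @ P2 @ (A - B1 @ K1))

    converged = max(np.linalg.norm(K1_new - K1),
                    np.linalg.norm(K2_new - K2)) < 1e-8
    K1, K2 = K1_new, K2_new
    if converged:
        break

print("Approximate Nash feedback gains:")
print("K1 =", K1)
print("K2 =", K2)
```

Handling the control constraints would additionally require a non-quadratic control penalty and function approximators for the policies and value functions, which the abstract indicates the article addresses but which are beyond this sketch.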



Published In

CAAI Transactions on Intelligence Technology  Volume 6, Issue 2
June 2021
118 pages
EISSN: 2468-2322
DOI: 10.1049/cit2.v6.2
This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Publisher

John Wiley & Sons, Inc.

United States

Publication History

Published: 10 March 2021

Author Tags

  1. approximation theory
  2. discrete time systems
  3. dynamic programming
  4. game theory
  5. iterative methods
  6. learning (artificial intelligence)
  7. linear systems
  8. optimal control
  9. performance index
  10. stability

Qualifiers

  • Research-article
