A three-network architecture for on-line learning and optimization based on adaptive dynamic programming

Authors:

Jian Fu

Neurocomputing, Volume 78, Issue 1

Pages 3-13

https://doi.org/10.1016/j.neucom.2011.05.031

Published: 01 February 2012

Abstract

In this paper, we propose a novel adaptive dynamic programming (ADP) architecture with three networks, an action network, a critic network, and a reference network, to develop an internal goal representation for online learning and optimization. Whereas traditional ADP designs typically consist of only an action network and a critic network, our approach integrates a third network, the reference network, into the actor-critic framework to automatically and adaptively build an internal reinforcement signal that facilitates learning and optimization over time to accomplish goals. We present the detailed design architecture and its associated learning algorithm to explain how effective learning and optimization can be achieved in this new ADP architecture. Furthermore, we evaluate the learning and control performance of our architecture over time on two popular benchmarks in the community: the cart-pole balancing task and the triple-link inverted pendulum balancing task.
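
The abstract describes the three-network data flow only at a high level. The Python sketch below shows one plausible way to wire it, under stated assumptions: the action network maps the state x to a control u, the reference network maps (x, u) to an internal reinforcement signal s, and the critic maps (x, u, s) to a cost-to-go estimate J. The error forms (patterned on the temporal-difference error commonly used in direct heuristic dynamic programming), the choice to backpropagate the reference error through the critic, the network sizes, `alpha`, `lr`, and the names `TinyNet` and `three_network_step` are all hypothetical and are not taken from this page; the paper's actual learning algorithm may differ. Only output-layer weights are trained, and the action-network update is omitted for brevity.

```python
import math
import random


class TinyNet:
    """One-hidden-layer tanh network with a single linear output.

    Sizes, initialisation range and learning rates are illustrative
    assumptions, not settings taken from the paper.
    """

    def __init__(self, n_in, n_hidden, seed):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]

    def hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.w1]

    def forward(self, x):
        return sum(w * h for w, h in zip(self.w2, self.hidden(x)))

    def input_grad(self, x, i):
        """d(output)/d(x[i]); lets an error be backpropagated through this net."""
        h = self.hidden(x)
        return sum(v * (1.0 - hk * hk) * row[i] for v, hk, row in zip(self.w2, h, self.w1))

    def train_output_layer(self, x, error, d_error_d_out, lr):
        """One gradient step on 0.5 * error**2 with respect to the output weights only."""
        h = self.hidden(x)
        self.w2 = [w - lr * error * d_error_d_out * hk for w, hk in zip(self.w2, h)]


def three_network_step(action, reference, critic, x, r, j_prev, alpha=0.95, lr=0.05):
    """One hypothetical learning step of an action/reference/critic design.

    x: state as a list of floats; r: external reinforcement; j_prev: J(t-1).
    Returns (u, s, j, e_ref, e_critic).
    """
    u = math.tanh(action.forward(x))   # control from the action network
    xu = x + [u]
    s = reference.forward(xu)          # internal reinforcement signal s(t)
    xus = xu + [s]
    j = critic.forward(xus)            # the critic sees x, u and s

    # Assumed TD-style errors of the form alpha*J(t) - [J(t-1) - signal(t)]
    e_ref = alpha * j - (j_prev - r)     # external r drives the reference net
    e_critic = alpha * j - (j_prev - s)  # internal s drives the critic

    # Backpropagate e_ref through the critic into s, before any weights change
    dj_ds = critic.input_grad(xus, len(xus) - 1)
    reference.train_output_layer(xu, e_ref, alpha * dj_ds, lr)
    critic.train_output_layer(xus, e_critic, alpha, lr)
    # Action-network update (driving J toward the control objective) omitted.
    return u, s, j, e_ref, e_critic


if __name__ == "__main__":
    state = [0.05, 0.0, -0.02, 0.0]  # hypothetical 4-dimensional state
    nets = (TinyNet(4, 6, seed=1), TinyNet(5, 6, seed=2), TinyNet(6, 6, seed=3))
    j_prev = 0.0
    for t in range(3):
        u, s, j_prev, e_ref, e_c = three_network_step(*nets, state, r=0.0, j_prev=j_prev)
        print(f"t={t} u={u:+.4f} s={s:+.4f} J={j_prev:+.4f}")
```

Running the file prints three steps for a fixed, hypothetical state; no plant dynamics are simulated, so the numbers only illustrate how u, s and J flow between the three networks. The key design point the sketch tries to capture is that the critic is trained against the learned signal s rather than directly against the external reinforcement r.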

Copyright © 2011 Elsevier B.V.

Publisher

Elsevier Science Publishers B. V., Netherlands

Cited By

Xia X, Li C, Zhang T, Fang Y (2024). Finite-time optimal control for uncertain strict-feedback nonlinear systems with input saturation and output constraints. International Journal of Adaptive Control and Signal Processing 38:2, pp. 580-603. Online publication date: 1-Feb-2024. https://dl.acm.org/doi/10.1002/acs.3714

Zhao M, Wang D, Qiao J, Ha M, Ren J (2023). Advanced value iteration for discrete-time intelligent critic control: A survey. Artificial Intelligence Review 56:10, pp. 12315-12346. Online publication date: 21-May-2023. https://dl.acm.org/doi/10.1007/s10462-023-10497-1

Meleshenko P, Nesterov V, Semenov M, Solovyov A, Sypalo K (2022). Stabilization of a System of Unstable Pendulums: Discrete and Continuous Case. Journal of Computer and Systems Sciences International 61:2, pp. 135-154. Online publication date: 1-Apr-2022. https://dl.acm.org/doi/10.1134/S1064230722020113

Wang D, Ha M, Zhao M (2022). The intelligent critic framework for advanced optimal control. Artificial Intelligence Review 55:1, pp. 1-22. Online publication date: 1-Jan-2022. https://dl.acm.org/doi/10.1007/s10462-021-10118-9

Khater A, El-Nagar A, El-Bardini M, El-Rabaie N (2020). Online learning based on adaptive learning rate for a class of recurrent fuzzy neural network. Neural Computing and Applications 32:12, pp. 8691-8710. Online publication date: 1-Jun-2020. https://dl.acm.org/doi/10.1007/s00521-019-04372-w

Jiang H, Zhang H, Xiao G, Cui X (2018). Data-based approximate optimal control for nonzero-sum games of multi-player systems using adaptive dynamic programming. Neurocomputing 275:C, pp. 192-199. Online publication date: 31-Jan-2018. https://dl.acm.org/doi/10.5555/3198485.3198823

Jiang H, Zhang H, Zhang K, Cui X (2018). Data-driven adaptive dynamic programming schemes for non-zero-sum games of unknown discrete-time nonlinear systems. Neurocomputing 275:C, pp. 649-658. Online publication date: 31-Jan-2018. https://dl.acm.org/doi/10.1016/j.neucom.2017.09.020

Liang Y, Zhang H, Xiao G, Jiang H (2018). Reinforcement learning-based online adaptive controller design for a class of unknown nonlinear discrete-time systems with time delays. Neural Computing and Applications 30:6, pp. 1733-1745. Online publication date: 1-Sep-2018. https://dl.acm.org/doi/10.1007/s00521-018-3537-7

Jiang Y, Yang C, Na J, Li G, Li Y, Zhong J (2017). A Brief Review of Neural Networks Based Learning and Control and Their Applications for Robots. Complexity 2017. Online publication date: 1-Jan-2017. https://dl.acm.org/doi/10.1155/2017/1895897

Malla N, Ni Z (2017). A new history experience replay design for model-free adaptive dynamic programming. Neurocomputing 266:C, pp. 141-149. Online publication date: 29-Nov-2017. https://dl.acm.org/doi/10.1016/j.neucom.2017.04.069