A three-network architecture for on-line learning and optimization based on adaptive dynamic programming

Published: 01 February 2012

Abstract

In this paper, we propose a novel adaptive dynamic programming (ADP) architecture with three networks (an action network, a critic network, and a reference network) to develop an internal goal representation for online learning and optimization. Unlike the traditional ADP design, which normally uses only an action network and a critic network, our approach integrates a third network, the reference network, into the actor-critic design framework to automatically and adaptively build an internal reinforcement signal that facilitates learning and optimization over time to accomplish goals. We present the detailed design architecture and its associated learning algorithm to explain how effective learning and optimization can be achieved in this new ADP architecture. Furthermore, we test the architecture on both the cart-pole balancing task and the triple-link inverted pendulum balancing task, two popular benchmarks in the community, to demonstrate its learning and control performance over time.
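
As a rough illustration of how the three networks named in the abstract might be wired together, the following Python sketch runs a single forward pass. The abstract does not give layer sizes, activation functions, or the exact inputs of each network, so everything specific here is an assumption: the wiring (reference network fed with state and action, critic fed with state, action, and the internal reinforcement signal), the hypothetical helper names `init_mlp`, `mlp_forward`, `build_networks`, and `forward_pass`, and the 4-dimensional cart-pole-like state. The learning algorithm itself is not shown.

```python
import numpy as np


def init_mlp(n_in, n_hidden, n_out, rng):
    """Random weights for a one-hidden-layer network (sizes are illustrative)."""
    return {
        "w_hidden": rng.uniform(-1.0, 1.0, size=(n_hidden, n_in)),
        "w_out": rng.uniform(-1.0, 1.0, size=(n_out, n_hidden)),
    }


def mlp_forward(params, x, squash_output):
    """Forward pass: tanh hidden layer, optionally tanh on the output."""
    hidden = np.tanh(params["w_hidden"] @ x)
    out = params["w_out"] @ hidden
    return np.tanh(out) if squash_output else out


def build_networks(state_dim, action_dim, n_hidden=6, seed=0):
    """Create the action, reference, and critic networks (assumed wiring)."""
    rng = np.random.default_rng(seed)
    return {
        # Action network: state -> control action.
        "action": init_mlp(state_dim, n_hidden, action_dim, rng),
        # Reference network: (state, action) -> internal reinforcement signal.
        "reference": init_mlp(state_dim + action_dim, n_hidden, 1, rng),
        # Critic network: (state, action, internal signal) -> value estimate.
        "critic": init_mlp(state_dim + action_dim + 1, n_hidden, 1, rng),
    }


def forward_pass(nets, state):
    """Run one forward pass through all three networks."""
    action = mlp_forward(nets["action"], state, squash_output=True)
    state_action = np.concatenate([state, action])
    internal_signal = mlp_forward(nets["reference"], state_action, squash_output=False)
    critic_input = np.concatenate([state_action, internal_signal])
    value = mlp_forward(nets["critic"], critic_input, squash_output=False)
    return action, internal_signal, value


# Example: a cart-pole-like state (position, velocity, angle, angular velocity).
nets = build_networks(state_dim=4, action_dim=1)
state = np.array([0.0, 0.1, -0.05, 0.0])
action, internal_signal, value = forward_pass(nets, state)
print(action, internal_signal, value)
```

In this sketch the critic sees the internally generated signal in place of a raw external reward, which is one way to read the abstract's description of the reference network building an internal reinforcement signal. How the three networks are actually trained is left to the paper.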

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Actor-critic design
  2. Adaptive dynamic programming
  3. Goal representation
  4. Multi-state optimization
  5. Online learning and control
  6. Reinforcement learning
  7. Three-network architecture

Qualifiers

  • Article

Cited By

  • (2024) Finite‐time optimal control for uncertain strict‐feedback nonlinear systems with input saturation and output constraints. International Journal of Adaptive Control and Signal Processing, 38:2 (580-603). DOI: 10.1002/acs.3714. Online publication date: 1-Feb-2024
  • (2023) Advanced value iteration for discrete-time intelligent critic control: A survey. Artificial Intelligence Review, 56:10 (12315-12346). DOI: 10.1007/s10462-023-10497-1. Online publication date: 21-May-2023
  • (2022) Stabilization of a System of Unstable Pendulums: Discrete and Continuous Case. Journal of Computer and Systems Sciences International, 61:2 (135-154). DOI: 10.1134/S1064230722020113. Online publication date: 1-Apr-2022
  • (2022) The intelligent critic framework for advanced optimal control. Artificial Intelligence Review, 55:1 (1-22). DOI: 10.1007/s10462-021-10118-9. Online publication date: 1-Jan-2022
  • (2020) Online learning based on adaptive learning rate for a class of recurrent fuzzy neural network. Neural Computing and Applications, 32:12 (8691-8710). DOI: 10.1007/s00521-019-04372-w. Online publication date: 1-Jun-2020
  • (2018) Data-based approximate optimal control for nonzero-sum games of multi-player systems using adaptive dynamic programming. Neurocomputing, 275:C (192-199). DOI: 10.5555/3198485.3198823. Online publication date: 31-Jan-2018
  • (2018) Data-driven adaptive dynamic programming schemes for non-zero-sum games of unknown discrete-time nonlinear systems. Neurocomputing, 275:C (649-658). DOI: 10.1016/j.neucom.2017.09.020. Online publication date: 31-Jan-2018
  • (2018) Reinforcement learning-based online adaptive controller design for a class of unknown nonlinear discrete-time systems with time delays. Neural Computing and Applications, 30:6 (1733-1745). DOI: 10.1007/s00521-018-3537-7. Online publication date: 1-Sep-2018
  • (2017) A Brief Review of Neural Networks Based Learning and Control and Their Applications for Robots. Complexity, 2017. DOI: 10.1155/2017/1895897. Online publication date: 1-Jan-2017
  • (2017) A new history experience replay design for model-free adaptive dynamic programming. Neurocomputing, 266:C (141-149). DOI: 10.1016/j.neucom.2017.04.069. Online publication date: 29-Nov-2017
