Abstract
In this paper, we analyze a goal-representation heuristic dynamic programming (GrHDP) design, built on an internal goal structure, for the 2-D maze navigation problem. Classical reinforcement learning approaches have been applied to this problem in the literature, but they assign no intermediate reward before the final goal is reached. We integrate one additional network, the goal network, into the traditional heuristic dynamic programming (HDP) design to provide an internal reward/goal representation. We present the architecture of the proposed approach, followed by simulations on a 10 × 10 2-D maze navigation problem. For a fair comparison, we run the traditional HDP approach under the same simulation environment settings. Simulation results show that the proposed GrHDP converges faster in terms of the sum of squared errors and also reaches a lower final error.
Acknowledgments
This work was supported by the National Science Foundation (NSF) under grant CAREER ECCS 1053717, Army Research Office (ARO) under grant W911NF-12-1-0378, and NSF-DFG Collaborative Research on “Autonomous Learning” (a supplement grant to CNS 1117314).
Additional information
Communicated by C. Alippi, D. Zhao and D. Liu.
Cite this article
Ni, Z., He, H. Heuristic dynamic programming with internal goal representation. Soft Comput 17, 2101–2108 (2013). https://doi.org/10.1007/s00500-013-1112-9