Abstract
In this paper, we analyze a goal-representation heuristic dynamic programming (GrHDP) design, built on an internal goal structure, for the 2-D maze navigation problem. Classical reinforcement learning approaches have been applied to this problem in the literature, but they assign no intermediate reward before the final goal is reached. We integrate one additional network, the goal network, into the traditional heuristic dynamic programming (HDP) design to provide an internal reward/goal representation. We present the architecture of the proposed approach, followed by simulations on a 10 × 10 2-D maze navigation problem. For a fair comparison, we run the traditional HDP approach under the same simulation environment settings. Simulation results show that the proposed GrHDP converges faster in terms of the sum of squared errors and also reaches a lower final error.
Acknowledgments
This work was supported by the National Science Foundation (NSF) under grant CAREER ECCS 1053717, Army Research Office (ARO) under grant W911NF-12-1-0378, and NSF-DFG Collaborative Research on “Autonomous Learning” (a supplement grant to CNS 1117314).
Additional information
Communicated by C. Alippi, D. Zhao and D. Liu.
Cite this article
Ni, Z., He, H. Heuristic dynamic programming with internal goal representation. Soft Comput 17, 2101–2108 (2013). https://doi.org/10.1007/s00500-013-1112-9