DOI: 10.1007/11564096_32
Article

Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method

Published: 03 October 2005

Abstract

This paper introduces NFQ, an algorithm for the efficient and effective training of a Q-value function represented by a multi-layer perceptron. Based on the principle of storing and reusing transition experiences, a model-free, neural-network-based reinforcement learning algorithm is proposed. The method is evaluated on three benchmark problems. It is shown empirically that reasonably few interactions with the plant are needed to generate control policies of high quality.
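The batch loop the abstract describes — store transition experiences once, then repeatedly re-fit the Q-function on bootstrapped targets — can be sketched as follows. This is an illustration, not the paper's implementation: the toy chain MDP, the random exploration, and the exact lookup table standing in for the multi-layer perceptron (which NFQ trains with Rprop) are all assumptions made to keep the sketch self-contained.

```python
import random

# Illustrative deterministic chain MDP (not one of the paper's benchmarks):
# states 0..4, actions step left (-1) or right (+1), reward on reaching state 4.
N_STATES, GOAL, GAMMA = 5, 4, 0.9
ACTIONS = (-1, +1)

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

# 1) Gather a batch of transition experiences (s, a, r, s') and store them.
random.seed(0)
batch = []
for _ in range(200):
    s, a = random.randrange(N_STATES), random.choice(ACTIONS)
    s2, r = step(s, a)
    batch.append((s, a, r, s2))

# 2) Fitted Q iteration: build supervised targets from the stored batch,
#    re-fit the Q-function on them, and repeat. NFQ fits a multi-layer
#    perceptron at this step; a lookup table stands in here so the sketch
#    needs no learning library.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
for _ in range(50):
    targets = {}
    for s, a, r, s2 in batch:
        targets.setdefault((s, a), []).append(
            r + GAMMA * max(Q[(s2, b)] for b in ACTIONS))
    # "Fit" the stand-in regressor: average the targets per state-action pair.
    new_Q = dict(Q)
    for sa, ts in targets.items():
        new_Q[sa] = sum(ts) / len(ts)
    Q = new_Q

# Greedy policy: on this chain, moving right is optimal in every state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
print(policy)
```

The point the abstract emphasizes survives even in this toy version: the interaction data is collected once and then reused across all fitting iterations, rather than being discarded after a single online update.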


    Published In

    ECML'05: Proceedings of the 16th European Conference on Machine Learning
    October 2005
    769 pages
    ISBN: 3540292438
    • Editors:
    • João Gama,
    • Rui Camacho,
    • Pavel B. Brazdil,
    • Alípio Mário Jorge,
    • Luís Torgo

    Sponsors

    • FCT: Foundation for Science and Technology
    • FEUP: Faculdade de Engenharia da Univ. do Porto
    • KDubiq: Knowledge Discovery in Ubiquitous Environments
    • Faculdade de Economia do Porto
    • LIACC-NIAAD

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Cited By

    • (2024) Switching the loss reduces the cost in batch reinforcement learning. Proceedings of the 41st International Conference on Machine Learning, pp. 2135-2158. DOI: 10.5555/3692070.3692155. Online publication date: 21-Jul-2024.
    • (2024) M3Rec: A Context-Aware Offline Meta-Level Model-Based Reinforcement Learning Approach for Cold-Start Recommendation. ACM Transactions on Information Systems, 42(6):1-27. DOI: 10.1145/3659947. Online publication date: 19-Aug-2024.
    • (2024) Multi-Timescale Ensemble Q-Learning for Markov Decision Process Policy Optimization. IEEE Transactions on Signal Processing, 72:1427-1442. DOI: 10.1109/TSP.2024.3372699. Online publication date: 1-Jan-2024.
    • (2024) Reinforcement learning-based autonomous attacker to uncover computer network vulnerabilities. Neural Computing and Applications, 36(23):14341-14360. DOI: 10.1007/s00521-024-09668-0. Online publication date: 1-Aug-2024.
    • (2024) Enhancing Policy Gradient for Traveling Salesman Problem with Data Augmented Behavior Cloning. Advances in Knowledge Discovery and Data Mining, pp. 327-338. DOI: 10.1007/978-981-97-2253-2_26. Online publication date: 7-May-2024.
    • (2023) Bridging reinforcement learning theory and practice with the effective horizon. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 58953-59007. DOI: 10.5555/3666122.3668695. Online publication date: 10-Dec-2023.
    • (2023) State-action similarity-based representations for off-policy evaluation. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 42298-42329. DOI: 10.5555/3666122.3667956. Online publication date: 10-Dec-2023.
    • (2023) Double gumbel Q-learning. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 2580-2616. DOI: 10.5555/3666122.3666239. Online publication date: 10-Dec-2023.
    • (2023) Neuro-symbolic class expression learning. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 3624-3632. DOI: 10.24963/ijcai.2023/403. Online publication date: 19-Aug-2023.
    • (2023) Towards a model of human-cyber–physical automata and a synthesis framework for control policies. Journal of Systems Architecture: the EUROMICRO Journal, 144:C. DOI: 10.1016/j.sysarc.2023.102989. Online publication date: 1-Nov-2023.
