DOI: 10.1007/11564096_32
Article

Neural Fitted Q Iteration – First Experiences with a Data Efficient Neural Reinforcement Learning Method

Published: 03 October 2005

Abstract

This paper introduces NFQ, an algorithm for the efficient and effective training of a Q-value function represented by a multi-layer perceptron. Based on the principle of storing and reusing transition experiences, a model-free, neural-network-based reinforcement learning algorithm is proposed. The method is evaluated on three benchmark problems. It is shown empirically that reasonably few interactions with the plant are needed to generate control policies of high quality.
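The batch loop the abstract describes — store transition experiences once, then repeatedly re-fit the Q-function on bootstrapped targets — can be sketched as follows. This is an illustration, not the paper's implementation: the toy chain MDP, the random exploration, and the exact lookup table standing in for the multi-layer perceptron (which NFQ trains with Rprop) are all assumptions made to keep the sketch self-contained.

```python
import random

# Illustrative deterministic chain MDP (not one of the paper's benchmarks):
# states 0..4, actions step left (-1) or right (+1), reward on reaching state 4.
N_STATES, GOAL, GAMMA = 5, 4, 0.9
ACTIONS = (-1, +1)

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

# 1) Gather a batch of transition experiences (s, a, r, s') and store them.
random.seed(0)
batch = []
for _ in range(200):
    s, a = random.randrange(N_STATES), random.choice(ACTIONS)
    s2, r = step(s, a)
    batch.append((s, a, r, s2))

# 2) Fitted Q iteration: build supervised targets from the stored batch,
#    re-fit the Q-function on them, and repeat. NFQ fits a multi-layer
#    perceptron at this step; a lookup table stands in here so the sketch
#    needs no learning library.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
for _ in range(50):
    targets = {}
    for s, a, r, s2 in batch:
        targets.setdefault((s, a), []).append(
            r + GAMMA * max(Q[(s2, b)] for b in ACTIONS))
    # "Fit" the stand-in regressor: average the targets per state-action pair.
    new_Q = dict(Q)
    for sa, ts in targets.items():
        new_Q[sa] = sum(ts) / len(ts)
    Q = new_Q

# Greedy policy: on this chain, moving right is optimal in every state.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)}
print(policy)
```

The point the abstract emphasizes survives even in this toy version: the interaction data is collected once and then reused across all fitting iterations, rather than being discarded after a single online update.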


    Published In

    ECML'05: Proceedings of the 16th European Conference on Machine Learning
    October 2005
    769 pages
    ISBN: 3540292438
    • Editors:
    • João Gama,
    • Rui Camacho,
    • Pavel B. Brazdil,
    • Alípio Mário Jorge,
    • Luís Torgo

    Sponsors

    • FCT: Foundation for Science and Technology
    • FEUP: Faculdade de Engenharia da Univ. do Porto
    • KDubiq: Knowledge Discovery in Ubiquitous Environments
    • Faculdade de Economia do Porto
    • LIACC-NIAAD

    Publisher

    Springer-Verlag

    Berlin, Heidelberg

    Cited By

    • (2024) Switching the loss reduces the cost in batch reinforcement learning. Proceedings of the 41st International Conference on Machine Learning, pp. 2135-2158. DOI: 10.5555/3692070.3692155. Online publication date: 21-Jul-2024.
    • (2024) M3Rec: A Context-Aware Offline Meta-Level Model-Based Reinforcement Learning Approach for Cold-Start Recommendation. ACM Transactions on Information Systems, 42(6):1-27. DOI: 10.1145/3659947. Online publication date: 19-Aug-2024.
    • (2024) Multi-Timescale Ensemble Q-Learning for Markov Decision Process Policy Optimization. IEEE Transactions on Signal Processing, 72:1427-1442. DOI: 10.1109/TSP.2024.3372699. Online publication date: 1-Jan-2024.
    • (2024) Reinforcement learning-based autonomous attacker to uncover computer network vulnerabilities. Neural Computing and Applications, 36(23):14341-14360. DOI: 10.1007/s00521-024-09668-0. Online publication date: 1-Aug-2024.
    • (2024) Enhancing Policy Gradient for Traveling Salesman Problem with Data Augmented Behavior Cloning. Advances in Knowledge Discovery and Data Mining, pp. 327-338. DOI: 10.1007/978-981-97-2253-2_26. Online publication date: 7-May-2024.
    • (2023) Bridging reinforcement learning theory and practice with the effective horizon. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 58953-59007. DOI: 10.5555/3666122.3668695. Online publication date: 10-Dec-2023.
    • (2023) State-action similarity-based representations for off-policy evaluation. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 42298-42329. DOI: 10.5555/3666122.3667956. Online publication date: 10-Dec-2023.
    • (2023) Double gumbel Q-learning. Proceedings of the 37th International Conference on Neural Information Processing Systems, pp. 2580-2616. DOI: 10.5555/3666122.3666239. Online publication date: 10-Dec-2023.
    • (2023) Neuro-symbolic class expression learning. Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, pp. 3624-3632. DOI: 10.24963/ijcai.2023/403. Online publication date: 19-Aug-2023.
    • (2023) Towards a model of human-cyber–physical automata and a synthesis framework for control policies. Journal of Systems Architecture: the EUROMICRO Journal, 144:C. DOI: 10.1016/j.sysarc.2023.102989. Online publication date: 1-Nov-2023.
