
Backward Q-learning: The combination of Sarsa algorithm and Q-learning

Published: 01 October 2013

Abstract

Reinforcement learning (RL) has been applied in many fields and applications, but the trade-off between exploration and exploitation in the action selection policy remains a dilemma. Q-learning and the Sarsa algorithm are two of the best-known RL methods, and they possess different characteristics: generally speaking, the Sarsa algorithm converges faster, while Q-learning achieves better final performance. However, Sarsa is easily trapped in local minima, and Q-learning requires a longer learning time. Most of the literature has investigated the action selection policy. Instead of studying an action selection strategy, this paper focuses on how to combine Q-learning with the Sarsa algorithm, and presents a new method, called backward Q-learning, which can be embedded in both the Sarsa algorithm and Q-learning. The backward Q-learning algorithm directly tunes the Q-values, and the Q-values in turn indirectly affect the action selection policy. The proposed RL algorithms can therefore improve both learning speed and final performance. Finally, three experiments, cliff walking, mountain car, and cart-pole balancing control, are used to verify the feasibility and effectiveness of the proposed scheme. All the simulations show that the backward Q-learning based RL algorithms outperform the standard Q-learning and Sarsa algorithms.
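The combination described in the abstract can be sketched in code. The following is a minimal illustrative sketch, not the paper's exact algorithm: it assumes a tabular task with a hypothetical `reset()`/`step()` environment interface and an `env.actions` list (all names and hyperparameter values here are assumptions for illustration). It runs an ordinary on-policy Sarsa update during each episode, then replays the stored transitions in reverse with off-policy max targets, directly tuning the Q-values as the abstract describes.

```python
import random
from collections import defaultdict

def sarsa_with_backward_q(env, episodes=200, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Sketch of Sarsa combined with a backward Q-learning pass.

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done), plus a discrete
    `actions` list (a hypothetical interface for this sketch).
    """
    Q = defaultdict(float)  # Q[(state, action)] -> estimated value

    def epsilon_greedy(s):
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        trajectory = []  # store (s, a, r, s', done) for the backward pass
        s = env.reset()
        a = epsilon_greedy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = epsilon_greedy(s2)
            # on-policy Sarsa update during the episode
            target = r + (0.0 if done else gamma * Q[(s2, a2)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            trajectory.append((s, a, r, s2, done))
            s, a = s2, a2
        # backward Q-learning: replay the episode in reverse order with
        # off-policy max targets, directly tuning the Q-values
        for (s_, a_, r_, s2_, d_) in reversed(trajectory):
            best_next = 0.0 if d_ else max(Q[(s2_, an)] for an in env.actions)
            Q[(s_, a_)] += alpha * (r_ + gamma * best_next - Q[(s_, a_)])
    return Q
```

Replaying the episode in reverse lets the terminal reward propagate through the whole trajectory in a single pass, which is one plausible reading of how the method speeds up learning relative to one-step updates alone.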




    Information

    Published In

    Engineering Applications of Artificial Intelligence  Volume 26, Issue 9
    October, 2013
    242 pages

    Publisher

    Pergamon Press, Inc.

    United States


    Author Tags

    1. Backward Q-learning
    2. Q-learning
    3. Reinforcement learning
    4. Sarsa algorithm

    Qualifiers

    • Article


    Cited By

    • (2024)Multivariate time series forecasting of daily urban water demand using reinforcement learning and gated recurrent unit networkProceedings of the 2024 7th International Conference on Data Storage and Data Engineering10.1145/3653924.3653931(45-51)Online publication date: 27-Feb-2024
    • (2024)Schedule Disruption Recovery in Liner Shipping Service Based on a Reinforcement Learning-Enabled Adaptive Genetic AlgorithmIEEE Transactions on Intelligent Transportation Systems10.1109/TITS.2024.347799625:12(21622-21633)Online publication date: 21-Oct-2024
    • (2024)Obtaining the optimal shortest path between two points on a quasi-developable Bézier-type surface using the Geodesic-based Q-learning algorithmEngineering Applications of Artificial Intelligence10.1016/j.engappai.2024.108821136:PAOnline publication date: 1-Oct-2024
    • (2024)Multi-task Scheduling of Multiple Agricultural Machinery via Reinforcement Learning and Genetic AlgorithmAdvanced Intelligent Computing Technology and Applications10.1007/978-981-97-5578-3_6(70-81)Online publication date: 5-Aug-2024
    • (2023)Deep reinforcement learning-based air combat maneuver decision-making: literature review, implementation tutorial and future directionArtificial Intelligence Review10.1007/s10462-023-10620-257:1Online publication date: 28-Dec-2023
    • (2022)Proximal policy optimization with model-based methodsJournal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology10.3233/JIFS-21193542:6(5399-5410)Online publication date: 1-Jan-2022
    • (2022)A self-learning bee colony and genetic algorithm hybrid for cloud manufacturing servicesComputing10.1007/s00607-022-01079-0104:9(1977-2003)Online publication date: 1-Sep-2022
    • (2021)Energy-Efficient Mode Selection and Resource Allocation for D2D-Enabled Heterogeneous Networks: A Deep Reinforcement Learning ApproachIEEE Transactions on Wireless Communications10.1109/TWC.2020.303143620:2(1175-1187)Online publication date: 10-Feb-2021
    • (2021)A Dueling-DDPG Architecture for Mobile Robots Path Planning Based on Laser Range FindingsPRICAI 2021: Trends in Artificial Intelligence10.1007/978-3-030-89188-6_12(154-168)Online publication date: 8-Nov-2021
    • (2020)An optimized on-ramp metering method for urban expressway based on reinforcement learningJournal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology10.3233/JIFS-17955638:3(2703-2715)Online publication date: 1-Jan-2020
