
Backward Q-learning: The combination of Sarsa algorithm and Q-learning

Published: 01 October 2013

Abstract

Reinforcement learning (RL) has been applied in many fields and applications, but the trade-off between exploration and exploitation in the action selection policy remains a dilemma. Q-learning and the Sarsa algorithm are two of the best-known RL methods, and they possess different characteristics: generally speaking, the Sarsa algorithm converges faster, while Q-learning achieves better final performance. However, Sarsa is easily trapped in local minima, and Q-learning requires a longer learning time. Most of the literature has investigated the action selection policy. Instead of studying an action selection strategy, this paper focuses on how to combine Q-learning with the Sarsa algorithm, and presents a new method, called backward Q-learning, which can be embedded in both the Sarsa algorithm and Q-learning. The backward Q-learning algorithm directly tunes the Q-values, and the Q-values in turn indirectly affect the action selection policy. The proposed RL algorithms can therefore improve both learning speed and final performance. Finally, three experiments, cliff walking, mountain car, and cart-pole balancing control, are used to verify the feasibility and effectiveness of the proposed scheme. All the simulations show that the backward Q-learning based RL algorithms outperform the standard Q-learning and Sarsa algorithms.
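The combination described in the abstract can be sketched in code. The following is a minimal illustrative sketch, not the paper's exact algorithm: it assumes a tabular task with a hypothetical `reset()`/`step()` environment interface and an `env.actions` list (all names and hyperparameter values here are assumptions for illustration). It runs an ordinary on-policy Sarsa update during each episode, then replays the stored transitions in reverse with off-policy max targets, directly tuning the Q-values as the abstract describes.

```python
import random
from collections import defaultdict

def sarsa_with_backward_q(env, episodes=200, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Sketch of Sarsa combined with a backward Q-learning pass.

    `env` is assumed to expose reset() -> state and
    step(action) -> (next_state, reward, done), plus a discrete
    `actions` list (a hypothetical interface for this sketch).
    """
    Q = defaultdict(float)  # Q[(state, action)] -> estimated value

    def epsilon_greedy(s):
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        trajectory = []  # store (s, a, r, s', done) for the backward pass
        s = env.reset()
        a = epsilon_greedy(s)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = epsilon_greedy(s2)
            # on-policy Sarsa update during the episode
            target = r + (0.0 if done else gamma * Q[(s2, a2)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            trajectory.append((s, a, r, s2, done))
            s, a = s2, a2
        # backward Q-learning: replay the episode in reverse order with
        # off-policy max targets, directly tuning the Q-values
        for (s_, a_, r_, s2_, d_) in reversed(trajectory):
            best_next = 0.0 if d_ else max(Q[(s2_, an)] for an in env.actions)
            Q[(s_, a_)] += alpha * (r_ + gamma * best_next - Q[(s_, a_)])
    return Q
```

Replaying the episode in reverse lets the terminal reward propagate through the whole trajectory in a single pass, which is one plausible reading of how the method speeds up learning relative to one-step updates alone.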




    Information

    Published In

    Engineering Applications of Artificial Intelligence  Volume 26, Issue 9
    October, 2013
    242 pages

    Publisher

    Pergamon Press, Inc.

    United States


    Author Tags

    1. Backward Q-learning
    2. Q-learning
    3. Reinforcement learning
    4. Sarsa algorithm

    Qualifiers

    • Article


    Cited By

    • (2024)Multivariate time series forecasting of daily urban water demand using reinforcement learning and gated recurrent unit networkProceedings of the 2024 7th International Conference on Data Storage and Data Engineering10.1145/3653924.3653931(45-51)Online publication date: 27-Feb-2024
    • (2024)Schedule Disruption Recovery in Liner Shipping Service Based on a Reinforcement Learning-Enabled Adaptive Genetic AlgorithmIEEE Transactions on Intelligent Transportation Systems10.1109/TITS.2024.347799625:12(21622-21633)Online publication date: 21-Oct-2024
    • (2024)Obtaining the optimal shortest path between two points on a quasi-developable Bézier-type surface using the Geodesic-based Q-learning algorithmEngineering Applications of Artificial Intelligence10.1016/j.engappai.2024.108821136:PAOnline publication date: 1-Oct-2024
    • (2024)Multi-task Scheduling of Multiple Agricultural Machinery via Reinforcement Learning and Genetic AlgorithmAdvanced Intelligent Computing Technology and Applications10.1007/978-981-97-5578-3_6(70-81)Online publication date: 5-Aug-2024
    • (2023)Deep reinforcement learning-based air combat maneuver decision-making: literature review, implementation tutorial and future directionArtificial Intelligence Review10.1007/s10462-023-10620-257:1Online publication date: 28-Dec-2023
    • (2022)Proximal policy optimization with model-based methodsJournal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology10.3233/JIFS-21193542:6(5399-5410)Online publication date: 1-Jan-2022
    • (2022)A self-learning bee colony and genetic algorithm hybrid for cloud manufacturing servicesComputing10.1007/s00607-022-01079-0104:9(1977-2003)Online publication date: 1-Sep-2022
    • (2021)Energy-Efficient Mode Selection and Resource Allocation for D2D-Enabled Heterogeneous Networks: A Deep Reinforcement Learning ApproachIEEE Transactions on Wireless Communications10.1109/TWC.2020.303143620:2(1175-1187)Online publication date: 10-Feb-2021
    • (2021)A Dueling-DDPG Architecture for Mobile Robots Path Planning Based on Laser Range FindingsPRICAI 2021: Trends in Artificial Intelligence10.1007/978-3-030-89188-6_12(154-168)Online publication date: 8-Nov-2021
    • (2020)An optimized on-ramp metering method for urban expressway based on reinforcement learningJournal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology10.3233/JIFS-17955638:3(2703-2715)Online publication date: 1-Jan-2020
