Abstract
Evolutionary Algorithms (EAs) and Deep Reinforcement Learning (DRL) have recently been integrated to combine the strengths of both methods for better exploration and exploitation. The evolutionary part of these hybrid methods maintains a population of policy networks. However, existing methods focus on optimizing the parameters of the policy network, a search space that is usually high-dimensional and difficult for EAs. In this paper, we shift the target of evolution from the high-dimensional parameter space to the low-dimensional action space. We propose Evolutionary Action Selection-Twin Delayed Deep Deterministic Policy Gradient (EAS-TD3), a novel hybrid method of EA and DRL. In EAS, we use an evolutionary algorithm to optimize the action chosen by the policy network, aiming to obtain high-quality actions that promote policy learning. We conduct several experiments on challenging continuous control tasks. The results show that EAS-TD3 outperforms other state-of-the-art methods.
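To make the idea of evolving in action space concrete, the following is a minimal sketch and not the chapter's implementation. The abstract does not state which evolutionary algorithm is used or how candidate actions are scored, so the sketch assumes a simple elitist search with Gaussian mutation and a hypothetical stand-in critic. The names `evolve_action` and `toy_critic`, along with all parameter names and default values, are illustrative assumptions.

```python
import random
from typing import Callable, List, Sequence

Action = List[float]


def evolve_action(
    state: Sequence[float],
    initial_action: Sequence[float],
    fitness: Callable[[Sequence[float], Sequence[float]], float],
    population_size: int = 16,
    generations: int = 10,
    sigma: float = 0.1,
    low: float = -1.0,
    high: float = 1.0,
    seed: int = 0,
) -> Action:
    """Search the action space around the policy's action for a fitter action.

    Assumption: `fitness(state, action)` plays the role of a learned critic;
    the abstract does not say how actions are scored.
    """
    rng = random.Random(seed)
    # Start from the (clipped) action proposed by the policy network.
    best = [min(high, max(low, a)) for a in initial_action]
    best_fit = fitness(state, best)
    for _ in range(generations):
        # Mutate the current best action with clipped Gaussian noise.
        candidates = [
            [min(high, max(low, a + rng.gauss(0.0, sigma))) for a in best]
            for _ in range(population_size)
        ]
        # Elitist selection: keep a candidate only if it is strictly fitter.
        for cand in candidates:
            fit = fitness(state, cand)
            if fit > best_fit:
                best, best_fit = cand, fit
    return best


def toy_critic(state: Sequence[float], action: Sequence[float]) -> float:
    """Hypothetical stand-in critic: highest when the action equals the state."""
    return -sum((a - s) ** 2 for a, s in zip(action, state))


state = [0.5, -0.3]
policy_action = [0.0, 0.0]  # pretend output of a policy network
evolved = evolve_action(state, policy_action, toy_critic)
print(toy_critic(state, policy_action), toy_critic(state, evolved))
```

Because the search runs over a handful of action dimensions instead of every network weight, each generation is cheap. In a hybrid setup, the evolved action could then serve as a learning target for the policy at that state. How that target enters the policy update is not described in the abstract and is left out here.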
Acknowledgments
This work was supported in part by Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103), in part by Ji Hua Laboratory (No. X190011TB190), in part by the Shanghai Engineering Research Center of AI and Robotics, and in part by the Engineering Research Center of AI and Robotics, Ministry of Education, China.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ma, Y., Liu, T., Wei, B., Liu, Y., Xu, K., Li, W. (2023). Evolutionary Action Selection for Gradient-Based Policy Learning. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_49
DOI: https://doi.org/10.1007/978-3-031-30111-7_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30110-0
Online ISBN: 978-3-031-30111-7
eBook Packages: Computer Science, Computer Science (R0)