Evolutionary Action Selection for Gradient-Based Policy Learning

Ma, Yan; Liu, Tianxing; Wei, Bingsheng; Liu, Yi; Xu, Kang; Li, Wei

doi:10.1007/978-3-031-30111-7_49

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13625)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Evolutionary Algorithms (EAs) and Deep Reinforcement Learning (DRL) have recently been integrated to take advantage of both methods for better exploration and exploitation. The evolutionary part of these hybrid methods maintains a population of policy networks. However, existing methods focus on optimizing the parameters of the policy network, a search space that is usually high-dimensional and difficult for EAs. In this paper, we shift the target of evolution from the high-dimensional parameter space to the low-dimensional action space. We propose Evolutionary Action Selection-Twin Delayed Deep Deterministic Policy Gradient (EAS-TD3), a novel hybrid method of EA and DRL. In EAS, we use an evolutionary algorithm to optimize the action chosen by the policy network, aiming to obtain high-quality actions that promote policy learning. We conduct several experiments on challenging continuous control tasks. The results show that EAS-TD3 outperforms other state-of-the-art methods.
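
As a rough illustration of what searching in action space (rather than parameter space) can look like, the sketch below refines a single action with a small elitist evolutionary loop. The abstract does not specify the evolutionary operator or the fitness signal, so everything here is an assumption: the Gaussian mutation, the elitist selection, the hypothetical `fitness` callable (which could stand in for a learned critic's value estimate), the toy objective `toy_q`, and all names and hyperparameters. It is not the authors' EAS-TD3 implementation.

```python
import random
from typing import Callable, List, Optional, Sequence


def evolve_action(
    initial_action: Sequence[float],
    fitness: Callable[[Sequence[float]], float],
    low: float = -1.0,
    high: float = 1.0,
    population_size: int = 16,
    generations: int = 10,
    sigma: float = 0.1,
    elite_fraction: float = 0.25,
    seed: Optional[int] = None,
) -> List[float]:
    """Search near `initial_action` for an action with higher fitness.

    Hypothetical helper: Gaussian mutation plus elitist selection is an
    assumed operator, not the one used in the paper.
    """
    rng = random.Random(seed)

    def clip(x: float) -> float:
        # Keep every action component inside the assumed bounds [low, high].
        return max(low, min(high, x))

    def mutate(action: Sequence[float]) -> List[float]:
        # Perturb each component with zero-mean Gaussian noise, then clip.
        return [clip(x + rng.gauss(0.0, sigma)) for x in action]

    base = [clip(x) for x in initial_action]
    # Seed the population with the original action and mutated copies of it.
    population = [base] + [mutate(base) for _ in range(population_size - 1)]
    n_elite = max(1, int(population_size * elite_fraction))

    for _ in range(generations):
        # Rank by fitness, keep the elites unchanged, refill with mutated elites.
        population.sort(key=fitness, reverse=True)
        elites = population[:n_elite]
        children = [
            mutate(rng.choice(elites)) for _ in range(population_size - n_elite)
        ]
        population = elites + children

    return max(population, key=fitness)


def toy_q(action: Sequence[float]) -> float:
    # Toy stand-in for a value estimate: peaks at 0.5 in every dimension.
    return -sum((x - 0.5) ** 2 for x in action)


policy_action = [0.0, 0.0]  # pretend this came from a policy network
evolved_action = evolve_action(policy_action, toy_q, seed=0)
```

Because the elites are carried over unchanged, the returned action is never worse under `fitness` than the clipped starting action. Note that the search runs over a vector with one entry per action dimension, which is typically far smaller than a network's parameter vector.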

Acknowledgments

This work was supported in part by Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103), in part by Ji Hua Laboratory (No. X190011TB190), in part by the Shanghai Engineering Research Center of AI and Robotics, and in part by the Engineering Research Center of AI and Robotics, Ministry of Education, China.

Author information

Authors and Affiliations

Academy for Engineering and Technology, Fudan University, Shanghai, China
Yan Ma, Tianxing Liu, Bingsheng Wei, Yi Liu, Kang Xu & Wei Li
Ji Hua Laboratory, Foshan, China
Wei Li

Corresponding author

Correspondence to Wei Li.

Editor information

Editors and Affiliations

Indian Institute of Technology Indore, Indore, India
Mohammad Tanveer
Indian Institute of Information Technology - Allahabad, Prayagraj, India
Sonali Agarwal
Kobe University, Kobe, Japan
Seiichi Ozawa
Indian Institute of Technology Patna, Patna, India
Asif Ekbal
University of Innsbruck, Innsbruck, Austria
Adam Jatowt

Cite this paper

Ma, Y., Liu, T., Wei, B., Liu, Y., Xu, K., Li, W. (2023). Evolutionary Action Selection for Gradient-Based Policy Learning. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_49

DOI: https://doi.org/10.1007/978-3-031-30111-7_49
Published: 13 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30110-0
Online ISBN: 978-3-031-30111-7
eBook Packages: Computer Science, Computer Science (R0)
