Evolutionary Action Selection for Gradient-Based Policy Learning

  • Conference paper
Neural Information Processing (ICONIP 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13625))

Abstract

Evolutionary Algorithms (EAs) and Deep Reinforcement Learning (DRL) have recently been integrated to combine the strengths of both methods for better exploration and exploitation. The evolutionary part of these hybrid methods maintains a population of policy networks. However, existing methods focus on optimizing the parameters of the policy network, a search space that is usually high-dimensional and difficult for EAs. In this paper, we shift the target of evolution from the high-dimensional parameter space to the low-dimensional action space. We propose Evolutionary Action Selection-Twin Delayed Deep Deterministic Policy Gradient (EAS-TD3), a novel hybrid method of EA and DRL. In EAS, we focus on optimizing the action chosen by the policy network and use an evolutionary algorithm to obtain high-quality actions that promote policy learning. We conduct several experiments on challenging continuous control tasks. The results show that EAS-TD3 outperforms other state-of-the-art methods.
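To make the idea of evolving actions rather than network parameters concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which evolutionary algorithm or fitness function EAS uses, so the sketch assumes a simple elitist scheme (Gaussian mutation plus truncation selection) and a critic-like scoring function as fitness. The names `evolve_action`, `q_value`, and `toy_q` are hypothetical, as are all default parameter values. How the evolved action is then fed back into policy learning is left out.

```python
import random
from typing import Callable, List, Sequence

Action = List[float]


def evolve_action(
    q_value: Callable[[Sequence[float]], float],
    policy_action: Sequence[float],
    low: float = -1.0,
    high: float = 1.0,
    pop_size: int = 16,
    generations: int = 10,
    sigma: float = 0.1,
    seed: int = 0,
) -> Action:
    """Search the action space around the policy's action for a higher-scoring one.

    Assumption: q_value plays the role of a critic, scoring an action for a
    fixed state. The search is elitist, so the result never scores lower than
    the (clipped) policy action.
    """
    rng = random.Random(seed)

    def clip(action: Sequence[float]) -> Action:
        # Keep every action dimension inside the valid range.
        return [min(high, max(low, x)) for x in action]

    def mutate(action: Sequence[float]) -> Action:
        # Gaussian perturbation in the low-dimensional action space.
        return clip([x + rng.gauss(0.0, sigma) for x in action])

    best = clip(policy_action)
    best_fit = q_value(best)
    parents = [best]

    for _ in range(generations):
        population = [mutate(rng.choice(parents)) for _ in range(pop_size)]
        scored = sorted(population, key=q_value, reverse=True)
        top_fit = q_value(scored[0])
        if top_fit > best_fit:
            best, best_fit = scored[0], top_fit
        # Truncation selection: the top quarter, plus the best action so far.
        parents = scored[: max(1, pop_size // 4)] + [best]

    return best


def toy_q(action: Sequence[float]) -> float:
    # Hypothetical stand-in for a learned critic: peaks at (0.5, -0.3).
    target = (0.5, -0.3)
    return -sum((a - t) ** 2 for a, t in zip(action, target))


if __name__ == "__main__":
    start = [0.0, 0.0]
    evolved = evolve_action(toy_q, start)
    print("policy action score:", toy_q(start))
    print("evolved action score:", toy_q(evolved))
```

The search space here has only as many dimensions as the action vector, which is the contrast the abstract draws with evolving the full set of policy network parameters.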

Acknowledgments

This work was supported in part by Shanghai Municipal Science and Technology Major Project (No. 2021SHZDZX0103), in part by Ji Hua Laboratory (No. X190011TB190), in part by the Shanghai Engineering Research Center of AI and Robotics, and in part by the Engineering Research Center of AI and Robotics, Ministry of Education, China.

Corresponding author

Correspondence to Wei Li.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Ma, Y., Liu, T., Wei, B., Liu, Y., Xu, K., Li, W. (2023). Evolutionary Action Selection for Gradient-Based Policy Learning. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Lecture Notes in Computer Science, vol 13625. Springer, Cham. https://doi.org/10.1007/978-3-031-30111-7_49

  • DOI: https://doi.org/10.1007/978-3-031-30111-7_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-30110-0

  • Online ISBN: 978-3-031-30111-7

  • eBook Packages: Computer Science (R0)
