research-article

Evolutionary reinforcement learning for sparse rewards

Authors:

Francesco Belardinelli,

Borja González León

GECCO '21: Proceedings of the Genetic and Evolutionary Computation Conference Companion

Pages 1508 - 1512

https://doi.org/10.1145/3449726.3463142

Published: 08 July 2021

Abstract

Temporal logic (TL) is an expressive way of specifying complex goals in reinforcement learning (RL), which facilitates the design of reward functions. However, combining the two techniques tends to produce sparse rewards, which can hinder learning. Evolutionary algorithms (EAs) hold promise in tackling this problem by encouraging policy diversity through exploration in parameter space. In this paper, we present GEATL, the first hybrid on-policy evolutionary algorithm that combines the advantages of gradient learning in deep RL with the exploration ability of EAs in order to solve the sparse-reward problem arising from TL specifications. We test our approach in a delayed-reward scenario. Unlike previous baselines combining RL and TL, we show that GEATL is able to tackle complex TL specifications even in sparse-reward settings.
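To illustrate why TL specifications tend to yield sparse rewards, the following is a minimal sketch and not the GEATL implementation. It assumes a simple ordered task ("eventually reach a, and afterwards eventually reach b") checked over a finite trace of labels; the function names `sequence_progress` and `sparse_reward` are hypothetical and do not come from the paper.

```python
from typing import Iterable, List


def sequence_progress(trace: Iterable[str], subgoals: List[str]) -> int:
    """Count how many subgoals were reached, in order, along the trace."""
    idx = 0
    for label in trace:
        # Advance only when the next pending subgoal is observed.
        if idx < len(subgoals) and label == subgoals[idx]:
            idx += 1
    return idx


def sparse_reward(trace: Iterable[str], subgoals: List[str]) -> float:
    """Return 1.0 only if the whole ordered specification is satisfied.

    Partial progress earns nothing, which is what makes the signal sparse.
    """
    return 1.0 if sequence_progress(trace, subgoals) == len(subgoals) else 0.0


if __name__ == "__main__":
    spec = ["a", "b"]  # assumed toy specification: a first, then b
    print(sparse_reward(["x", "a", "x", "b"], spec))  # satisfied: 1.0
    print(sparse_reward(["b", "a"], spec))            # wrong order: 0.0
```

Under this assumed reward, a trace that reaches only the first subgoal is indistinguishable from one that reaches none, so a gradient-based learner gets no signal until the full specification is met. This is the kind of setting where exploration in parameter space is expected to help.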




Index Terms

Evolutionary reinforcement learning for sparse rewards
1. Computing methodologies
  1. Artificial intelligence
    1. Planning and scheduling
      1. Planning with abstraction and generalization


Published In

GECCO '21: Proceedings of the Genetic and Evolutionary Computation Conference Companion

July 2021

2047 pages

ISBN:9781450383516

DOI:10.1145/3449726

Editor: Francisco Chicano (University of Malaga)

General Chair: Krzysztof Krawiec (Poznan University of Technology)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGEVO: ACM Special Interest Group on Genetic and Evolutionary Computation

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

GECCO '21: Genetic and Evolutionary Computation Conference

July 10 - 14, 2021

Lille, France

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%
