Coordinate ascent MORE with adaptive entropy control for population-based regret minimization

Authors: Maximilian Hüttenrauch, Gerhard Neumann

GECCO '21: Proceedings of the Genetic and Evolutionary Computation Conference Companion

Pages 1493 - 1497

https://doi.org/10.1145/3449726.3463183

Published: 08 July 2021

Abstract

Model-based Relative Entropy Policy Search (MORE) is a population-based stochastic search algorithm with two desirable properties: a well-defined policy search objective (it optimizes the expected return) and exact, closed-form, information-theoretic update rules. This contrasts with existing population-based methods such as CMA-ES, which are often referred to as evolution strategies. While these methods work very well in practice, their updates of the search distribution are often based on heuristics, and they do not optimize the expected return of the population. Instead, they implicitly optimize the return of elite samples, which may yield a poor expected return and unreliable or risky solutions. We show that the MORE algorithm can be improved with distinct updates of the mean and the covariance of the search distribution based on coordinate ascent, which considerably improves convergence speed while retaining the exact closed-form updates. In this way, we match the performance of the elite samples of CMA-ES while also achieving considerably better performance of the sample average. We evaluate our new algorithm on simulated robotic tasks and compare it to the state-of-the-art CMA-ES.
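The difference between the sample average and the elite samples can be made concrete with a minimal Python sketch. This is not the authors' method and implements neither MORE nor CMA-ES. It only draws one population from a Gaussian search distribution and compares the two statistics. The return function `sphere_return`, the axis-aligned Gaussian, the elite fraction of 25%, and all numeric values are assumptions chosen for illustration.

```python
import random
import statistics


def sphere_return(x):
    """Hypothetical return: negative squared distance to the origin."""
    return -sum(xi * xi for xi in x)


def sample_population(mean, std, n_samples, rng):
    """Draw n_samples points from an axis-aligned Gaussian search distribution."""
    return [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(n_samples)]


def population_and_elite_return(returns, elite_frac=0.25):
    """Return (average over all samples, average over the best elite_frac)."""
    ranked = sorted(returns, reverse=True)  # best return first
    n_elite = max(1, int(len(ranked) * elite_frac))
    return statistics.fmean(ranked), statistics.fmean(ranked[:n_elite])


rng = random.Random(0)  # fixed seed so the run is repeatable
population = sample_population(mean=[1.0, -1.0], std=[0.5, 0.5], n_samples=200, rng=rng)
returns = [sphere_return(x) for x in population]
avg_all, avg_elite = population_and_elite_return(returns)
print(f"sample average: {avg_all:.3f}, elite average: {avg_elite:.3f}")
```

The elite average can never fall below the sample average, so a method that is judged only on its elite samples may hide a poor expected return over the whole population. That gap is the quantity the abstract refers to when it contrasts the two.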

Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Robotics
2. Computing methodologies
  1. Artificial intelligence
    1. Search methodologies
      1. Continuous space search
  2. Machine learning
    1. Machine learning algorithms

Published In

GECCO '21: Proceedings of the Genetic and Evolutionary Computation Conference Companion

July 2021

2047 pages

ISBN: 9781450383516

DOI: 10.1145/3449726

Editor: Francisco Chicano (University of Malaga)

General Chair: Krzysztof Krawiec (Poznan University of Technology)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

GECCO '21

Sponsor:

SIGEVO

GECCO '21: Genetic and Evolutionary Computation Conference

July 10 - 14, 2021

Lille, France

Acceptance Rates

Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

