DOI: 10.1145/3321707.3321817

Deep neuroevolution of recurrent and discrete world models

Published: 13 July 2019

Abstract

Neural architectures inspired by the human cognitive system, such as the recently introduced world models, have been shown to outperform traditional deep reinforcement learning (RL) methods in a variety of domains. Instead of the relatively simple architectures employed in most RL experiments, world models rely on multiple neural components responsible for visual information processing, memory, and decision-making. However, so far the components of these models have had to be trained separately, each with its own specialized training method. This paper demonstrates the surprising finding that models with precisely the same parts can instead be trained efficiently end-to-end through a genetic algorithm (GA), reaching performance comparable to the original world model by solving a challenging car-racing task. An analysis of the evolved visual and memory systems indicates that they learn representations similar in effect to those of the system trained through gradient descent. Additionally, in contrast to gradient-descent methods, which struggle with discrete variables, GAs work directly with such representations, opening up opportunities for classical planning in latent space. This paper adds further evidence for the effectiveness of deep neuroevolution on tasks that require the intricate orchestration of multiple components in complex heterogeneous architectures.
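To make the training setup concrete, below is a minimal sketch (Python/NumPy) of a simple genetic algorithm of the kind the abstract describes, in the spirit of Such et al. (2017): truncation selection with elitism and Gaussian mutation over the flattened parameter vector of the whole agent. The evaluate() function is a hypothetical stub standing in for a rollout of the full world-model agent (visual encoder, recurrent memory, and controller) in CarRacing-v0; the population size, mutation strength, and elite count are illustrative assumptions, not the paper's exact hyperparameters.

    import numpy as np

    def evaluate(params):
        """Hypothetical stub: decode `params` into the weights of the
        visual encoder, recurrent memory, and controller, roll the agent
        out in CarRacing-v0, and return the average episode reward."""
        raise NotImplementedError

    def simple_ga(num_params, pop_size=64, elite=8, sigma=0.01,
                  generations=100, seed=0):
        rng = np.random.default_rng(seed)
        # Initial population: small random parameter vectors.
        population = [rng.normal(0.0, sigma, num_params)
                      for _ in range(pop_size)]
        best = population[0]
        for gen in range(generations):
            fitness = np.array([evaluate(p) for p in population])
            order = np.argsort(fitness)[::-1]  # indices sorted best-first
            elites = [population[i] for i in order[:elite]]
            best = elites[0]
            # Elitism: carry the champion over unchanged; fill the rest
            # of the next generation with Gaussian-mutated copies of
            # randomly chosen elites.
            population = [best] + [
                elites[rng.integers(elite)]
                + rng.normal(0.0, sigma, num_params)
                for _ in range(pop_size - 1)
            ]
        return best

Because mutation only perturbs a flat genome and selection only compares scalar fitness values, the same loop applies unchanged when part of the genome encodes discrete latent variables, which is why a GA can train the discrete world-model variant that gradient descent struggles with.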


    Published In

    GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference
    July 2019
    1545 pages
    ISBN:9781450361118
    DOI:10.1145/3321707
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article

    Conference

    GECCO '19: Genetic and Evolutionary Computation Conference
    July 13 - 17, 2019
    Prague, Czech Republic

    Acceptance Rates

    Overall Acceptance Rate 1,669 of 4,410 submissions, 38%


    Cited By

    • Evolutionary Reinforcement Learning: A Survey. Intelligent Computing (2023). DOI: 10.34133/icomputing.0025. Online publication date: 10-May-2023.
    • Improving the Performance of Autonomous Driving through Deep Reinforcement Learning. Sustainability 15, 18 (2023), 13799. DOI: 10.3390/su151813799. Online publication date: 15-Sep-2023.
    • Generative Adversarial Neuroevolution for Control Behaviour Imitation. In Proceedings of the Companion Conference on Genetic and Evolutionary Computation (2023), 663-666. DOI: 10.1145/3583133.3590731. Online publication date: 15-Jul-2023.
    • Morphology Choice Affects the Evolution of Affordance Detection in Robots. In Proceedings of the Genetic and Evolutionary Computation Conference (2023), 211-219. DOI: 10.1145/3583131.3590505. Online publication date: 15-Jul-2023.
    • Adaptive Neuroevolution With Genetic Operator Control and Two-Way Complexity Variation. IEEE Transactions on Artificial Intelligence 4, 6 (2023), 1627-1641. DOI: 10.1109/TAI.2022.3214181. Online publication date: Dec-2023.
    • Evolutionary Echo State Network. Applied Soft Computing 144, C (2023). DOI: 10.1016/j.asoc.2023.110463. Online publication date: 1-Sep-2023.
    • Hybrid self-attention NEAT: a novel evolutionary self-attention approach to improve the NEAT algorithm in high dimensional inputs. Evolving Systems 15, 2 (2023), 489-503. DOI: 10.1007/s12530-023-09510-3. Online publication date: 12-Jun-2023.
    • Generating collective behavior of a robotic swarm using an attention agent with deep neuroevolution. Artificial Life and Robotics 28, 4 (2023), 669-679. DOI: 10.1007/s10015-023-00902-x. Online publication date: 5-Oct-2023.
    • Neuroevolution of recurrent architectures on control tasks. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (2022), 651-654. DOI: 10.1145/3520304.3529052. Online publication date: 9-Jul-2022.
    • EvoJAX. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (2022), 308-311. DOI: 10.1145/3520304.3528770. Online publication date: 9-Jul-2022.
