DOI: 10.1145/3321707.3321817

Deep neuroevolution of recurrent and discrete world models

Published: 13 July 2019

Abstract

Neural architectures inspired by the human cognitive system, such as the recently introduced world models, have been shown to outperform traditional deep reinforcement learning (RL) methods in a variety of domains. Instead of the relatively simple architectures employed in most RL experiments, world models rely on multiple neural components responsible for visual information processing, memory, and decision-making. However, so far the components of these models have had to be trained separately, each with its own specialized training method. This paper demonstrates the surprising finding that models with precisely the same parts can instead be trained efficiently end-to-end through a genetic algorithm (GA), reaching performance comparable to the original world model by solving a challenging car-racing task. An analysis of the evolved visual and memory systems indicates that they learn representations similar in effect to those of the system trained through gradient descent. Additionally, in contrast to gradient-descent methods, which struggle with discrete variables, GAs work directly with such representations, opening up opportunities for classical planning in latent space. This paper adds further evidence for the effectiveness of deep neuroevolution on tasks that require the intricate orchestration of multiple components in complex heterogeneous architectures.
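To make the training setup concrete, below is a minimal sketch (Python/NumPy) of a simple genetic algorithm of the kind the abstract describes, in the spirit of Such et al. (2017): truncation selection with elitism and Gaussian mutation over the flattened parameter vector of the whole agent. The evaluate() function is a hypothetical stub standing in for a rollout of the full world-model agent (visual encoder, recurrent memory, and controller) in CarRacing-v0; the population size, mutation strength, and elite count are illustrative assumptions, not the paper's exact hyperparameters.

    import numpy as np

    def evaluate(params):
        """Hypothetical stub: decode `params` into the weights of the
        visual encoder, recurrent memory, and controller, roll the agent
        out in CarRacing-v0, and return the average episode reward."""
        raise NotImplementedError

    def simple_ga(num_params, pop_size=64, elite=8, sigma=0.01,
                  generations=100, seed=0):
        rng = np.random.default_rng(seed)
        # Initial population: small random parameter vectors.
        population = [rng.normal(0.0, sigma, num_params)
                      for _ in range(pop_size)]
        best = population[0]
        for gen in range(generations):
            fitness = np.array([evaluate(p) for p in population])
            order = np.argsort(fitness)[::-1]  # indices sorted best-first
            elites = [population[i] for i in order[:elite]]
            best = elites[0]
            # Elitism: carry the champion over unchanged; fill the rest
            # of the next generation with Gaussian-mutated copies of
            # randomly chosen elites.
            population = [best] + [
                elites[rng.integers(elite)]
                + rng.normal(0.0, sigma, num_params)
                for _ in range(pop_size - 1)
            ]
        return best

Because mutation only perturbs a flat genome and selection only compares scalar fitness values, the same loop applies unchanged when part of the genome encodes discrete latent variables, which is why a GA can train the discrete world-model variant that gradient descent struggles with.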


    Published In

    GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference
    July 2019
    1545 pages
    ISBN:9781450361118
    DOI:10.1145/3321707
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Research-article

    Conference

    GECCO '19: Genetic and Evolutionary Computation Conference
    July 13 - 17, 2019
    Prague, Czech Republic

    Acceptance Rates

    Overall Acceptance Rate 1,669 of 4,410 submissions, 38%


    Cited By

    • Evolutionary Reinforcement Learning: A Survey. Intelligent Computing (2023). DOI: 10.34133/icomputing.0025. Online publication date: 10-May-2023.
    • Improving the Performance of Autonomous Driving through Deep Reinforcement Learning. Sustainability 15, 18 (2023), 13799. DOI: 10.3390/su151813799. Online publication date: 15-Sep-2023.
    • Generative Adversarial Neuroevolution for Control Behaviour Imitation. In Proceedings of the Companion Conference on Genetic and Evolutionary Computation (2023), 663-666. DOI: 10.1145/3583133.3590731. Online publication date: 15-Jul-2023.
    • Morphology Choice Affects the Evolution of Affordance Detection in Robots. In Proceedings of the Genetic and Evolutionary Computation Conference (2023), 211-219. DOI: 10.1145/3583131.3590505. Online publication date: 15-Jul-2023.
    • Adaptive Neuroevolution With Genetic Operator Control and Two-Way Complexity Variation. IEEE Transactions on Artificial Intelligence 4, 6 (2023), 1627-1641. DOI: 10.1109/TAI.2022.3214181. Online publication date: Dec-2023.
    • Evolutionary Echo State Network. Applied Soft Computing 144, C (2023). DOI: 10.1016/j.asoc.2023.110463. Online publication date: 1-Sep-2023.
    • Hybrid self-attention NEAT: a novel evolutionary self-attention approach to improve the NEAT algorithm in high dimensional inputs. Evolving Systems 15, 2 (2023), 489-503. DOI: 10.1007/s12530-023-09510-3. Online publication date: 12-Jun-2023.
    • Generating collective behavior of a robotic swarm using an attention agent with deep neuroevolution. Artificial Life and Robotics 28, 4 (2023), 669-679. DOI: 10.1007/s10015-023-00902-x. Online publication date: 5-Oct-2023.
    • Neuroevolution of recurrent architectures on control tasks. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (2022), 651-654. DOI: 10.1145/3520304.3529052. Online publication date: 9-Jul-2022.
    • EvoJAX. In Proceedings of the Genetic and Evolutionary Computation Conference Companion (2022), 308-311. DOI: 10.1145/3520304.3528770. Online publication date: 9-Jul-2022.
