DOI: 10.1145/3321707.3321799

POET: open-ended coevolution of environments and their optimized solutions

Published: 13 July 2019
Abstract

How can progress in machine learning and reinforcement learning be automated to generate its own never-ending curriculum of challenges without human intervention? The recent emergence of quality diversity (QD) algorithms offers a glimpse of the potential for such continual open-ended invention. For example, novelty search showcases the benefits of explicit novelty pressure, MAP-Elites and Innovation Engines highlight the advantage of explicit elitism within niches in an otherwise divergent process, and minimal criterion coevolution (MCC) reveals that problems and solutions can coevolve divergently. The Paired Open-Ended Trailblazer (POET) algorithm introduced in this paper combines these principles to produce a practical approach to generating an endless progression of diverse and increasingly challenging environments while at the same time explicitly optimizing their solutions. An intriguing implication is the opportunity to transfer solutions among environments, reflecting the view that innovation is a circuitous and unpredictable process. POET is tested in a 2-D obstacle-course domain, where it generates diverse and sophisticated behaviors that create and solve a wide range of environmental challenges, many of which cannot be solved by direct optimization or by a direct-path curriculum-building control algorithm. We hope that POET will inspire a new push towards open-ended discovery across many domains.
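To make the loop the abstract describes more concrete, the sketch below shows a minimal POET-style iteration. This is not the authors' implementation: every name in it (Environment, Agent, evaluate, es_step, the MIN_SCORE/MAX_SCORE band) is a hypothetical stand-in, and the real system pairs parameterized 2-D obstacle courses with neural-network agents rather than these toy scalars.

```python
# Hypothetical sketch of a POET-style loop; all names are illustrative stand-ins.
import random

class Environment:
    """Stand-in for a parameterized 2-D obstacle course."""
    def __init__(self, difficulty=0.0):
        self.difficulty = difficulty

    def mutate(self):
        # Perturb environment parameters to propose a new challenge.
        return Environment(self.difficulty + random.uniform(0.0, 0.2))

class Agent:
    """Stand-in for a policy optimized within one environment."""
    def __init__(self, skill=0.0):
        self.skill = skill

    def copy(self):
        return Agent(self.skill)

def evaluate(agent, env):
    # Stand-in for a full rollout; higher scores are better.
    return agent.skill - env.difficulty

def es_step(agent, env):
    # Stand-in for one evolution-strategies optimization step.
    return Agent(agent.skill + 0.05)

MIN_SCORE, MAX_SCORE = -0.1, 0.5  # minimal-criterion band: not trivial, not unsolvable

pairs = [(Environment(), Agent())]
for iteration in range(100):
    # 1. Periodically propose child environments, admitting only those that
    #    some existing agent finds neither trivial nor impossible.
    if iteration % 10 == 0:
        for env, _ in list(pairs):
            child = env.mutate()
            best = max(pairs, key=lambda p: evaluate(p[1], child))[1]
            if MIN_SCORE < evaluate(best, child) < MAX_SCORE:
                pairs.append((child, best.copy()))
    # 2. Optimize every agent against its paired environment.
    pairs = [(env, es_step(agent, env)) for env, agent in pairs]
    # 3. Transfer: replace an agent whenever another pair's agent already
    #    scores higher in this agent's environment.
    for i, (env, agent) in enumerate(pairs):
        champion = max(pairs, key=lambda p: evaluate(p[1], env))[1]
        if evaluate(champion, env) > evaluate(agent, env):
            pairs[i] = (env, champion.copy())
```

The three phases mirror the principles the abstract combines: a minimal-criterion filter admits new environments that are neither trivial nor unsolvable (as in MCC), each pair is optimized like an elite within its own niche (as in MAP-Elites), and cross-pair transfer lets a solution found in one environment unlock progress in another.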

    Supplementary Material

    PDF File (p142-wang_suppl.pdf)
    Supplemental material.




      Information

      Published In

      GECCO '19: Proceedings of the Genetic and Evolutionary Computation Conference
      July 2019
      1545 pages
ISBN: 9781450361118
DOI: 10.1145/3321707
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 13 July 2019


      Author Tags

      1. artificial life
      2. coevolution
      3. evolution strategies
      4. novelty search
      5. open-ended evolution

      Qualifiers

      • Research-article

      Conference

      GECCO '19
      Sponsor:
      GECCO '19: Genetic and Evolutionary Computation Conference
      July 13 - 17, 2019
      Prague, Czech Republic

      Acceptance Rates

      Overall Acceptance Rate 1,669 of 4,410 submissions, 38%


Article Metrics

• Downloads (Last 12 months): 89
• Downloads (Last 6 weeks): 14
      Reflects downloads up to 26 Jul 2024

Cited By
• (2024) Evolving Generalist Controllers to Handle a Wide Range of Morphological Variations. Proceedings of the Genetic and Evolutionary Computation Conference, 1137-1145. DOI: 10.1145/3638529.3654116. Online publication date: 14-Jul-2024.
• (2024) Large Language Model-based Test Case Generation for GP Agents. Proceedings of the Genetic and Evolutionary Computation Conference, 914-923. DOI: 10.1145/3638529.3654056. Online publication date: 14-Jul-2024.
• (2024) Open-endedness in synthetic biology: A route to continual innovation for biological design. Science Advances 10:3. DOI: 10.1126/sciadv.adi3621. Online publication date: 19-Jan-2024.
• (2024) Automated quantum software engineering. Automated Software Engineering 31:1. DOI: 10.1007/s10515-024-00436-x. Online publication date: 12-Apr-2024.
• (2024) A co-evolutionary algorithm with adaptive penalty function for constrained optimization. Soft Computing. DOI: 10.1007/s00500-024-09896-5. Online publication date: 26-Jul-2024.
• (2023) Iteratively learn diverse strategies with state distance information. Proceedings of the 37th International Conference on Neural Information Processing Systems, 22841-22869. DOI: 10.5555/3666122.3667113. Online publication date: 10-Dec-2023.
• (2023) Evolutionary Reinforcement Learning: A Survey. Intelligent Computing 2. DOI: 10.34133/icomputing.0025. Online publication date: 10-May-2023.
• (2023) Evolution of Neural Networks. Proceedings of the Companion Conference on Genetic and Evolutionary Computation, 1008-1025. DOI: 10.1145/3583133.3595036. Online publication date: 15-Jul-2023.
• (2023) Understanding the Synergies between Quality-Diversity and Deep Reinforcement Learning. Proceedings of the Genetic and Evolutionary Computation Conference, 1212-1220. DOI: 10.1145/3583131.3590388. Online publication date: 15-Jul-2023.
• (2023) Transfer Dynamics in Emergent Evolutionary Curricula. IEEE Transactions on Games 15:2, 157-170. DOI: 10.1109/TG.2022.3151025. Online publication date: Jul-2023.
