10.5555/3306127.3331813

The Body is Not a Given: Joint Agent Policy Learning and Morphology Evolution

Published: 08 May 2019

Abstract

Reinforcement learning (RL) has proven to be a powerful paradigm for deriving complex behaviors from simple reward signals in a wide range of environments. When applying RL to continuous control agents in simulated physics environments, the body is usually considered to be part of the environment. However, during evolution the physical bodies of biological organisms co-evolve with their controlling brains, exploring a much larger space of actuator/controller configurations. Put differently, intelligence does not reside only in the agent's mind, but also in the design of its body. We propose a method for uncovering strong agents, each consisting of a well-matched body and policy, by combining RL with an evolutionary procedure. Given the resulting agent, we also propose an approach for identifying the body changes that contributed most to the agent's performance. We use the Shapley value from cooperative game theory to find the fair contribution of individual components, taking into account synergies between components. We evaluate our methods in an environment similar to the recently proposed Robo-Sumo task, where agents in a software physics simulator compete to tip over their opponent or push them out of the arena. Our results show that the proposed methods are indeed capable of generating strong agents, significantly outperforming baselines that focus on optimizing the agent policy alone. A video is available at: https://youtu.be/CHlecRim9PI
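The Shapley value mentioned in the abstract is the standard solution concept from cooperative game theory: it assigns each player its average marginal contribution over all orderings of the players. In this setting the "players" would be the individual body changes, and the coalition value v(S) the performance of an agent with only the changes in S applied (this framing follows the abstract; the paper's exact value function is not reproduced here):

\[
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)
\]

Since this sum is exponential in the number of components, implementations typically estimate it by sampling permutations. The sketch below is a minimal, self-contained Python illustration of the two ideas in the abstract: an evolutionary outer loop over morphologies with policy training folded into fitness evaluation, followed by Monte Carlo Shapley attribution of the evolved body changes. It is not the authors' implementation; the toy fitness function, the vector encoding of the body, and all constants are hypothetical stand-ins.

```python
# Illustrative sketch only (not the paper's code): evolve body parameters
# with policy training folded into fitness evaluation, then estimate
# per-component Shapley values by Monte Carlo permutation sampling.
import random

random.seed(0)

N_COMPONENTS = 4   # number of mutable body components (assumed)
POP_SIZE = 8       # population size (assumed)
GENERATIONS = 20   # number of evolution steps (assumed)


def train_policy(morphology):
    """Stand-in for the inner RL loop (e.g. a policy-gradient method run
    against opponents in a physics simulator). Returns a scalar skill
    score so the sketch is runnable."""
    return sum(morphology) + random.gauss(0.0, 0.1)


def fitness(morphology):
    """Score a body by training a policy for it and evaluating matches."""
    return train_policy(morphology)


def mutate(morphology):
    """Perturb one randomly chosen body component."""
    child = list(morphology)
    i = random.randrange(len(child))
    child[i] += random.gauss(0.0, 0.3)
    return child


# Evolutionary outer loop over morphologies.
population = [[random.random() for _ in range(N_COMPONENTS)]
              for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:POP_SIZE // 2]  # truncation selection
    population = parents + [mutate(random.choice(parents))
                            for _ in range(POP_SIZE - len(parents))]

best = max(population, key=fitness)


def value(applied, baseline, evolved):
    """Coalition value v(S): fitness of the baseline body with only the
    evolved component changes in `applied` switched on."""
    body = [evolved[i] if i in applied else baseline[i]
            for i in range(len(baseline))]
    return fitness(body)


def shapley(baseline, evolved, samples=200):
    """Estimate each component's Shapley value by sampling permutations
    and averaging marginal contributions."""
    n = len(baseline)
    phi = [0.0] * n
    for _ in range(samples):
        coalition = set()
        v_prev = value(coalition, baseline, evolved)
        for i in random.sample(range(n), n):  # a random player ordering
            coalition.add(i)
            v_next = value(coalition, baseline, evolved)
            phi[i] += (v_next - v_prev) / samples
            v_prev = v_next
    return phi


start_body = [0.5] * N_COMPONENTS  # hypothetical pre-evolution body
print("estimated Shapley contributions:", shapley(start_body, best))
```

Because the toy fitness is stochastic, the Shapley estimates here average over both permutation sampling and evaluation noise; in a real setting each coalition would be evaluated over many matches (or via rating systems such as Elo) before taking marginal contributions.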


Cited By

  • Evaluating Strategic Structures in Multi-Agent Inverse Reinforcement Learning. Journal of Artificial Intelligence Research, Vol. 71 (2021), 925-951. https://doi.org/10.1613/jair.1.12594. Online publication date: 18 Aug 2021.


Published In

AAMAS '19: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems
May 2019
2518 pages
ISBN: 978-1-4503-6309-9

Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC


Author Tags

  1. evolutionary computation
  2. reinforcement learning

Qualifiers

  • Research-article

Conference

AAMAS '19

Acceptance Rates

AAMAS '19 Paper Acceptance Rate: 193 of 793 submissions, 24%
Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%
