10.5555/3306127.3331813

The Body is Not a Given: Joint Agent Policy Learning and Morphology Evolution

Published: 08 May 2019

Abstract

Reinforcement learning (RL) has proven to be a powerful paradigm for deriving complex behaviors from simple reward signals in a wide range of environments. When applying RL to continuous control agents in simulated physics environments, the body is usually considered to be part of the environment. However, during evolution the physical bodies of biological organisms co-evolve with their controlling brains, exploring a much larger space of actuator/controller configurations. Put differently, intelligence does not reside only in the agent's mind, but also in the design of its body. We propose a method for uncovering strong agents, each consisting of a well-matched body and policy, by combining RL with an evolutionary procedure. Given the resulting agent, we also propose an approach for identifying the body changes that contributed most to the agent's performance. We use the Shapley value from cooperative game theory to find the fair contribution of individual components, taking into account synergies between components. We evaluate our methods in an environment similar to the recently proposed Robo-Sumo task, where agents in a software physics simulator compete to tip over their opponent or push them out of the arena. Our results show that the proposed methods are indeed capable of generating strong agents, significantly outperforming baselines that focus on optimizing the agent policy alone. A video is available at: https://youtu.be/CHlecRim9PI
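The Shapley value mentioned in the abstract is the standard solution concept from cooperative game theory: it assigns each player its average marginal contribution over all orderings of the players. In this setting the "players" would be the individual body changes, and the coalition value v(S) the performance of an agent with only the changes in S applied (this framing follows the abstract; the paper's exact value function is not reproduced here):

\[
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(v(S \cup \{i\}) - v(S)\bigr)
\]

Since this sum is exponential in the number of components, implementations typically estimate it by sampling permutations. The sketch below is a minimal, self-contained Python illustration of the two ideas in the abstract: an evolutionary outer loop over morphologies with policy training folded into fitness evaluation, followed by Monte Carlo Shapley attribution of the evolved body changes. It is not the authors' implementation; the toy fitness function, the vector encoding of the body, and all constants are hypothetical stand-ins.

```python
# Illustrative sketch only (not the paper's code): evolve body parameters
# with policy training folded into fitness evaluation, then estimate
# per-component Shapley values by Monte Carlo permutation sampling.
import random

random.seed(0)

N_COMPONENTS = 4   # number of mutable body components (assumed)
POP_SIZE = 8       # population size (assumed)
GENERATIONS = 20   # number of evolution steps (assumed)


def train_policy(morphology):
    """Stand-in for the inner RL loop (e.g. a policy-gradient method run
    against opponents in a physics simulator). Returns a scalar skill
    score so the sketch is runnable."""
    return sum(morphology) + random.gauss(0.0, 0.1)


def fitness(morphology):
    """Score a body by training a policy for it and evaluating matches."""
    return train_policy(morphology)


def mutate(morphology):
    """Perturb one randomly chosen body component."""
    child = list(morphology)
    i = random.randrange(len(child))
    child[i] += random.gauss(0.0, 0.3)
    return child


# Evolutionary outer loop over morphologies.
population = [[random.random() for _ in range(N_COMPONENTS)]
              for _ in range(POP_SIZE)]
for _ in range(GENERATIONS):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:POP_SIZE // 2]  # truncation selection
    population = parents + [mutate(random.choice(parents))
                            for _ in range(POP_SIZE - len(parents))]

best = max(population, key=fitness)


def value(applied, baseline, evolved):
    """Coalition value v(S): fitness of the baseline body with only the
    evolved component changes in `applied` switched on."""
    body = [evolved[i] if i in applied else baseline[i]
            for i in range(len(baseline))]
    return fitness(body)


def shapley(baseline, evolved, samples=200):
    """Estimate each component's Shapley value by sampling permutations
    and averaging marginal contributions."""
    n = len(baseline)
    phi = [0.0] * n
    for _ in range(samples):
        coalition = set()
        v_prev = value(coalition, baseline, evolved)
        for i in random.sample(range(n), n):  # a random player ordering
            coalition.add(i)
            v_next = value(coalition, baseline, evolved)
            phi[i] += (v_next - v_prev) / samples
            v_prev = v_next
    return phi


start_body = [0.5] * N_COMPONENTS  # hypothetical pre-evolution body
print("estimated Shapley contributions:", shapley(start_body, best))
```

Because the toy fitness is stochastic, the Shapley estimates here average over both permutation sampling and evaluation noise; in a real setting each coalition would be evaluated over many matches (or via rating systems such as Elo) before taking marginal contributions.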


Cited By

  • Evaluating Strategic Structures in Multi-Agent Inverse Reinforcement Learning. Journal of Artificial Intelligence Research, Vol. 71 (2021), 925-951. https://doi.org/10.1613/jair.1.12594. Online publication date: 18 Aug 2021.


Published In

AAMAS '19: Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems
May 2019
2518 pages
ISBN: 978-1-4503-6309-9

Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC


Author Tags

  1. evolutionary computation
  2. reinforcement learning

Qualifiers

  • Research-article

Conference

AAMAS '19

Acceptance Rates

AAMAS '19 Paper Acceptance Rate: 193 of 793 submissions, 24%
Overall Acceptance Rate: 1,155 of 5,036 submissions, 23%
