10.5555/3666122.3668856

Regularity as intrinsic reward for free play

Published: 10 December 2023

Abstract

We propose regularity as a novel reward signal for intrinsically-motivated reinforcement learning. Taking inspiration from child development, we postulate that striving for structure and order helps guide exploration towards a subspace of tasks that are not favored by naive uncertainty-based intrinsic rewards. Our generalized formulation of Regularity as Intrinsic Reward (RaIR) allows us to operationalize it within model-based reinforcement learning. In a synthetic environment, we showcase the plethora of structured patterns that can emerge from pursuing this regularity objective. We also demonstrate the strength of our method in a multi-object robotic manipulation environment. We incorporate RaIR into free play and use it to complement the model's epistemic uncertainty as an intrinsic reward. Doing so, we witness the autonomous construction of towers and other regular structures during free play, which leads to a substantial improvement in zero-shot downstream task performance on assembly tasks. Code and videos are available at https://sites.google.com/view/rair-project.
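
As a rough illustration only (a minimal sketch, not the authors' released implementation; see the project page above for that), one way to score regularity over object relations is to take the negative Shannon entropy of binned pairwise object distances, so that states with many repeated relations, such as aligned or evenly spaced objects, score higher. The relational abstraction, the binning, and the names rair_reward, num_bins, and bin_width below are illustrative assumptions.

    import numpy as np

    def rair_reward(object_positions, num_bins=16, bin_width=0.05):
        """Illustrative regularity score: negative entropy of binned pairwise
        object distances. More repeated relations -> lower entropy -> higher score.
        object_positions: array of shape (n_objects, dim); bin settings are arbitrary."""
        positions = np.asarray(object_positions, dtype=float)
        n = len(positions)
        if n < 2:
            return 0.0  # no relations to evaluate
        # Collect all pairwise distances between objects (the "relations").
        dists = np.array([np.linalg.norm(positions[i] - positions[j])
                          for i in range(n) for j in range(i + 1, n)])
        # Discretize the relations and form their empirical distribution.
        bins = np.clip((dists / bin_width).astype(int), 0, num_bins - 1)
        counts = np.bincount(bins, minlength=num_bins)
        probs = counts[counts > 0] / counts.sum()
        # Regularity as negative Shannon entropy of the relation distribution.
        entropy = -(probs * np.log(probs)).sum()
        return -float(entropy)

In a free-play loop, such a score could be combined with an epistemic-uncertainty bonus when ranking imagined rollouts under a learned world model, though the exact relational features, weighting, and planner used in the paper may differ.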

Published In

NIPS '23: Proceedings of the 37th International Conference on Neural Information Processing Systems
December 2023
80772 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited
