LPGD: A General Framework for Backpropagation through Embedded Optimization Layers

Authors: Anselm Paulus, Georg Martius, Vít Musil

Abstract: Embedding parameterized optimization problems as layers into machine learning architectures serves as a powerful inductive bias. Training such architectures with stochastic gradient descent requires care, as degenerate derivatives of the embedded optimization problem often render the gradients uninformative. We propose Lagrangian Proximal Gradient Descent (LPGD), a flexible framework for training architectures with embedded optimization layers that seamlessly integrates into automatic differentiation libraries. LPGD efficiently computes meaningful replacements of the degenerate optimization layer derivatives by re-running the forward solver oracle on a perturbed input. LPGD captures various previously proposed methods as special cases, while fostering deep links to traditional optimization methods. We theoretically analyze our method and demonstrate on historical and synthetic data that LPGD converges faster than gradient descent even in a differentiable setup.
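
A minimal sketch of the core idea (re-running the forward solver on a perturbed input to get a useful replacement gradient), assuming a solver that picks the cost-minimizing point from a small finite set. The update rule below is the blackbox-differentiation rule of Vlastelica et al. (2020), used here as a stand-in; it is not the general LPGD update from the paper, and the function names are hypothetical.

```python
def solver(w, candidates):
    """Forward oracle: the candidate y minimizing the linear cost <w, y>.

    The output is piecewise constant in w, so the true derivative dy/dw is zero
    almost everywhere, which is the degenerate case the abstract describes.
    """
    return min(candidates, key=lambda y: sum(wi * yi for wi, yi in zip(w, y)))


def perturbed_solver_grad(w, candidates, grad_y, lam):
    """Replacement for dL/dw from one extra solver call on a perturbed input.

    Stand-in rule (blackbox differentiation, not LPGD itself):
        dL/dw ~ -(y(w) - y(w + lam * dL/dy)) / lam
    """
    y = solver(w, candidates)
    w_perturbed = [wi + lam * gi for wi, gi in zip(w, grad_y)]
    y_perturbed = solver(w_perturbed, candidates)
    return [-(a - b) / lam for a, b in zip(y, y_perturbed)]
```

With candidates (1, 0) and (0, 1), costs w = [1.0, 2.0] and incoming gradient dL/dy = [2.0, 0.0], a perturbation strength lam = 1.0 flips the perturbed solve to the other candidate and yields [-1.0, 1.0]; a descent step on w then moves the solver toward the lower-loss solution. With lam = 0.1 the perturbed solve does not change and the replacement gradient is zero again, which shows why the perturbation strength matters.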

Submitted 8 July, 2024; originally announced July 2024.

Comments: ICML 2024 conference paper

arXiv:2405.18917

Causal Action Influence Aware Counterfactual Data Augmentation

Authors: Núria Armengol Urpí, Marco Bagatella, Marin Vlastelica, Georg Martius

Abstract: Offline data are both valuable and practical resources for teaching robots complex behaviors. Ideally, learning agents should not be constrained by the scarcity of available demonstrations, but rather generalize beyond the training distribution. However, the complexity of real-world scenarios typically requires huge amounts of data to prevent neural network policies from picking up on spurious correlations and learning non-causal relationships. We propose CAIAC, a data augmentation method that can create feasible synthetic transitions from a fixed dataset without having access to online environment interactions. By utilizing principled methods for quantifying causal influence, we are able to perform counterfactual reasoning by swapping action-unaffected parts of the state-space between independent trajectories in the dataset. We empirically show that this leads to a substantial increase in robustness of offline learning algorithms against distributional shift.
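
A hypothetical sketch of the swapping step only, assuming a per-component flag saying which state components the action influences is already available (the paper derives this from a causal-influence measure, which is not reproduced here). Components the action does not influence are copied from an independent transition in both the state and the next state; the action and the action-influenced components are kept.

```python
def influence_mask(scores, threshold):
    """Hypothetical: flag a state component as action-influenced if its score exceeds threshold."""
    return [s > threshold for s in scores]


def counterfactual_swap(transition_a, transition_b, influenced):
    """Synthetic (state, action, next_state) assembled from two real transitions.

    Components flagged as action-influenced come from transition_a; all other
    components are taken from transition_b, in both the state and the next state.
    Assumption: the swapped-in components are also unaffected by transition_b's
    action; a real implementation would need to check this, which is omitted here.
    """
    s_a, action_a, next_a = transition_a
    s_b, _, next_b = transition_b
    state = [x if keep else y for x, y, keep in zip(s_a, s_b, influenced)]
    next_state = [x if keep else y for x, y, keep in zip(next_a, next_b, influenced)]
    return state, action_a, next_state
```

Swapping whole entities (for example, all coordinates of an object the robot did not touch) rather than single coordinates would be the natural grouping in a manipulation setting; here each list entry simply stands for one such component.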

Submitted 29 May, 2024; originally announced May 2024.

Comments: Accepted in 41st International Conference on Machine Learning (ICML 2024)

arXiv:2404.11735

Learning with 3D rotations, a hitchhiker's guide to SO(3)

Authors: A. René Geist, Jonas Frey, Mikel Zobro, Anna Levina, Georg Martius

Abstract: Many settings in machine learning require the selection of a rotation representation. However, choosing a suitable representation from the many available options is challenging. This paper acts as a survey and guide through rotation representations. We walk through their properties that harm or benefit deep learning with gradient-based optimization. By consolidating insights from rotation-based learning, we provide a comprehensive overview of learning functions with rotation representations. We provide guidance on selecting representations based on whether rotations are in the model's input or output and whether the data primarily comprises small angles.
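
The abstract does not list specific representations. As a hedged illustration of two properties such a guide weighs, the sketch below shows (i) the double cover of unit quaternions, where q and -q encode the same rotation, and (ii) the continuous 6D representation of Zhou et al. (2019), which maps two unconstrained 3-vectors to a rotation matrix by Gram-Schmidt. Which representation the paper recommends in which setting is not stated here.

```python
import math


def quat_to_matrix(w, x, y, z):
    """Rotation matrix of the quaternion (w, x, y, z), normalized first.

    Every entry is quadratic in the components, so q and -q give the same matrix.
    """
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def _normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def rotation_from_6d(a1, a2):
    """Map two non-parallel 3-vectors to a rotation matrix via Gram-Schmidt."""
    b1 = _normalize(a1)
    d = sum(p * q for p, q in zip(b1, a2))
    b2 = _normalize([q - d * p for p, q in zip(b1, a2)])  # remove the b1 component
    b3 = [  # b3 = b1 x b2 completes a right-handed frame, so det = +1
        b1[1] * b2[2] - b1[2] * b2[1],
        b1[2] * b2[0] - b1[0] * b2[2],
        b1[0] * b2[1] - b1[1] * b2[0],
    ]
    return [[b1[i], b2[i], b3[i]] for i in range(3)]  # b1, b2, b3 as columns
```

A network can output the six numbers directly and every (non-degenerate) output maps to a valid rotation, whereas quaternion outputs need a sign convention (for example w >= 0) or a loss that treats q and -q as equal.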

Submitted 19 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: Published at ICML 2024

arXiv:2404.07110

Wild Visual Navigation: Fast Traversability Learning via Pre-Trained Models and Online Self-Supervision

Authors: Matías Mattamala, Jonas Frey, Piotr Libera, Nived Chebrolu, Georg Martius, Cesar Cadena, Marco Hutter, Maurice Fallon

Abstract: Natural environments such as forests and grasslands are challenging for robotic navigation because of the false perception of rigid obstacles from high grass, twigs, or bushes. In this work, we present Wild Visual Navigation (WVN), an online self-supervised learning system for visual traversability estimation. The system is able to continuously adapt from a short human demonstration in the field, using only onboard sensing and computing. One of the key ideas to achieve this is the use of high-dimensional features from pre-trained self-supervised models, which implicitly encode semantic information that massively simplifies the learning task. Further, an online supervision-generation scheme enables concurrent training and inference of the learned model in the wild. We demonstrate our approach through diverse real-world deployments in forests, parks, and grasslands. Our system is able to bootstrap the traversable terrain segmentation in less than 5 min of in-field training time, enabling the robot to navigate in complex, previously unseen outdoor terrains. Code: https://bit.ly/498b0CV - Project page: https://bit.ly/3M6nMHH

Submitted 10 April, 2024; originally announced April 2024.

Comments: Extended version of arXiv:2305.08510

arXiv:2402.05371

Learning to Control Emulated Muscles in Real Robots: Towards Exploiting Bio-Inspired Actuator Morphology

Authors: Pierre Schumacher, Lorenz Krause, Jan Schneider, Dieter Büchler, Georg Martius, Daniel Haeufle

Abstract: Recent studies have demonstrated the immense potential of exploiting muscle actuator morphology for natural and robust movement, albeit only in simulation. A validation on real robotic hardware is still missing. In this study, we emulate muscle actuator properties on hardware in real-time, taking advantage of modern and affordable electric motors. We demonstrate that our setup can emulate a simplified muscle model on a real robot while being controlled by a learned policy. We improve upon an existing muscle model by deriving a damping rule that ensures that the model is not only performant and stable but also tuneable for the real hardware. Our policies are trained by reinforcement learning entirely in simulation, where we show that previously reported benefits of muscles extend to the case of quadruped locomotion and hopping: the learned policies are more robust and exhibit more regular gaits. Finally, we confirm that the learned policies can be executed on real hardware and show that sim-to-real transfer with real-time emulated muscles on a quadruped robot is possible. These results show that artificial muscles can be highly beneficial actuators for future generations of robust legged robots.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.03913

Machine learning stochastic differential equations for the evolution of order parameters of classical many-body systems in and out of equilibrium

Authors: Francesco Carnazza, Federico Carollo, Sabine Andergassen, Georg Martius, Miriam Klopotek, Igor Lesanovsky

Abstract: We develop a machine learning algorithm to infer the emergent stochastic equation governing the evolution of an order parameter of a many-body system. We train our neural network to independently learn the directed force acting on the order parameter as well as an effective diffusive noise. We illustrate our approach using the classical Ising model endowed with Glauber dynamics and the contact process as test cases. For both models, which represent paradigmatic equilibrium and nonequilibrium scenarios, the directed force and noise can be efficiently inferred. The directed force term of the Ising model allows us to reconstruct an effective potential for the order parameter which develops the characteristic double-well shape below the critical temperature. Despite its genuine nonequilibrium nature, such an effective potential can also be obtained for the contact process, and its shape signals a phase transition into an absorbing state. Also, in contrast to the equilibrium Ising model, the presence of an absorbing state renders the noise term dependent on the value of the order parameter itself.
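
The paper infers the drift ("directed force") and diffusion ("noise") of the order parameter with a neural network. As a simpler, hedged stand-in, the sketch below (i) generates a magnetization trajectory from a 2D Ising model with Glauber single-spin-flip dynamics and (ii) estimates drift and diffusion per magnetization bin from conditional moments of the increments, a classical Kramers-Moyal-style estimator that is not the paper's method. Lattice size, temperatures, binning, and the diffusion convention (no factor 1/2) are illustrative assumptions.

```python
import math
import random


def glauber_magnetization(L, T, sweeps, seed=0):
    """Magnetization per spin after each sweep of a 2D Ising model (J = 1, no field).

    Periodic L x L lattice, started fully ordered, Glauber single-spin flips.
    """
    rng = random.Random(seed)
    spins = [[1] * L for _ in range(L)]
    n = L * L
    trajectory = []
    for _ in range(sweeps):
        for _ in range(n):  # one sweep = n attempted flips
            i, j = rng.randrange(L), rng.randrange(L)
            h = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                 + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
            delta_e = 2 * spins[i][j] * h  # energy change if spin (i, j) flips
            if rng.random() < 1.0 / (1.0 + math.exp(delta_e / T)):  # Glauber rate
                spins[i][j] = -spins[i][j]
        trajectory.append(sum(map(sum, spins)) / n)
    return trajectory


def kramers_moyal(series, dt, bins, lo, hi):
    """Per-bin (center, drift, diffusion): E[dm | m] / dt and E[dm^2 | m] / dt."""
    width = (hi - lo) / bins
    acc = [[0, 0.0, 0.0] for _ in range(bins)]  # count, sum of dm, sum of dm^2
    for m, m_next in zip(series, series[1:]):
        k = math.floor((m - lo) / width)
        if 0 <= k < bins:  # values outside [lo, hi) are ignored
            dm = m_next - m
            acc[k][0] += 1
            acc[k][1] += dm
            acc[k][2] += dm * dm
    return [
        (lo + (k + 0.5) * width, s1 / (c * dt), s2 / (c * dt))
        for k, (c, s1, s2) in enumerate(acc)
        if c > 0
    ]
```

For example, `kramers_moyal(glauber_magnetization(16, 2.0, 2000), dt=1.0, bins=20, lo=-1.05, hi=1.05)` measures time in sweeps. Below the critical temperature (about 2.269 for J = 1, k_B = 1) one would expect the estimated drift to point toward the ordered states, in line with the double-well potential described in the abstract; this sketch does not verify that.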

Submitted 4 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: 11 pages, 6 figures, 1 table

arXiv:2312.11091

doi 10.1609/aaai.v38i11.29139

Colored Noise in PPO: Improved Exploration and Performance through Correlated Action Sampling

Authors: Jakob Hollenstein, Georg Martius, Justus Piater

Abstract: Proximal Policy Optimization (PPO), a popular on-policy deep reinforcement learning method, employs a stochastic policy for exploration. In this paper, we propose a colored noise-based stochastic policy variant of PPO. Previous research highlighted the importance of temporal correlation in action noise for effective exploration in off-policy reinforcement learning. Building on this, we investigate whether correlated noise can also enhance exploration in on-policy methods like PPO. We discovered that correlated noise for action selection improves learning performance and outperforms the currently popular uncorrelated white noise approach in on-policy methods. Unlike off-policy learning, where pink noise was found to be highly effective, we found that a colored noise, intermediate between white and pink, performed best for on-policy learning in PPO. We examined the impact of varying the amount of data collected for each update by modifying the number of parallel simulation environments for data collection and observed that with a larger number of parallel environments, more strongly correlated noise is beneficial. Due to the significant impact and ease of implementation, we recommend switching to correlated noise as the default noise source in PPO.
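
A sketch of how temporally correlated action noise can be generated, assuming spectral synthesis with a power spectrum proportional to 1/f^beta (beta = 0 white, beta = 1 pink, beta = 2 red). The abstract only says an exponent between white and pink worked best; the generator, the dependency-free O(n^2) synthesis, and the function names are illustrative assumptions, not the paper's implementation.

```python
import math
import random


def colored_noise(beta, n, seed=0):
    """Zero-mean, unit-variance sequence whose power spectrum falls off as 1/f**beta.

    Spectral synthesis: Gaussian Fourier coefficients scaled by f**(-beta / 2),
    summed directly (O(n^2)) instead of with an FFT to stay dependency-free.
    """
    rng = random.Random(seed)
    x = [0.0] * n
    for k in range(1, n // 2 + 1):  # skip k = 0, the constant (DC) component
        scale = k ** (-beta / 2.0)  # amplitude ~ sqrt(power) ~ f**(-beta/2)
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        omega = 2.0 * math.pi * k / n
        for t in range(n):
            x[t] += scale * (a * math.cos(omega * t) + b * math.sin(omega * t))
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / std for v in x]


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation, a simple measure of temporal correlation."""
    mean = sum(x) / len(x)
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(len(x) - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den
```

In a PPO rollout, one such sequence per action dimension and environment could replace the i.i.d. standard normal draws in a_t = mu(s_t) + sigma(s_t) * eps_t, with beta set to an intermediate value such as 0.5; the exact exponent used in the paper is not given in the abstract.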

Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Journal ref: (2024) Proceedings of the AAAI Conference on Artificial Intelligence, 38(11), 12466-12472

arXiv:2312.01473

Regularity as Intrinsic Reward for Free Play

Authors: Cansu Sancaktar, Justus Piater, Georg Martius

Abstract: We propose regularity as a novel reward signal for intrinsically-motivated reinforcement learning. Taking inspiration from child development, we postulate that striving for structure and order helps guide exploration towards a subspace of tasks that are not favored by naive uncertainty-based intrinsic rewards. Our generalized formulation of Regularity as Intrinsic Reward (RaIR) allows us to operationalize it within model-based reinforcement learning. In a synthetic environment, we showcase the plethora of structured patterns that can emerge from pursuing this regularity objective. We also demonstrate the strength of our method in a multi-object robotic manipulation environment. We incorporate RaIR into free play and use it to complement the model's epistemic uncertainty as an intrinsic reward. Doing so, we witness the autonomous construction of towers and other regular structures during free play, which leads to a substantial improvement in zero-shot downstream task performance on assembly tasks.
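
The abstract does not give the RaIR formula. One formulation in this spirit, assumed here for illustration, scores a scene by the negative Shannon entropy of its discretized pairwise relations: scenes where the same relative offsets recur (rows, towers, grids) score higher than scattered ones. The function name and the choice of relation (rounded position differences) are hypothetical.

```python
import math
from collections import Counter


def regularity(positions, resolution=1.0):
    """Hypothetical regularity score: negative entropy of discretized pairwise offsets.

    positions: list of coordinate tuples, one per object.
    Returns a value <= 0; higher means more repeated relations (more regular).
    """
    relations = Counter()
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i != j:  # ordered pairs, so each offset appears with both signs
                offset = tuple(round((b - a) / resolution) for a, b in zip(p, q))
                relations[offset] += 1
    total = sum(relations.values())
    return sum((c / total) * math.log(c / total) for c in relations.values())
```

For three evenly spaced objects in a row (or stacked into a tower) the six ordered offsets take only four distinct values, giving about -1.33, while three scattered objects with six distinct offsets give -ln 6, about -1.79.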

Submitted 3 December, 2023; originally announced December 2023.

Comments: NeurIPS 2023 camera-ready version. Project webpage at http://sites.google.com/view/rair-project

arXiv:2311.16996

Goal-conditioned Offline Planning from Curious Exploration

Authors: Marco Bagatella, Georg Martius

Abstract: Curiosity has established itself as a powerful exploration strategy in deep reinforcement learning. Notably, leveraging expected future novelty as intrinsic motivation has been shown to efficiently generate exploratory trajectories, as well as a robust dynamics model. We consider the challenge of extracting goal-conditioned behavior from the products of such unsupervised exploration techniques, without any additional environment interaction. We find that conventional goal-conditioned reinforcement learning approaches for extracting a value function and policy fall short in this difficult offline setting. By analyzing the geometry of optimal goal-conditioned value functions, we relate this issue to a specific class of estimation artifacts in learned values. In order to mitigate their occurrence, we propose to combine model-based planning over learned value landscapes with a graph-based value aggregation scheme. We show how this combination can correct both local and global artifacts, obtaining significant improvements in zero-shot goal-reaching performance across diverse simulated environments.

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.04358

doi 10.1021/acs.jctc.3c01238

Machine learning of a density functional for anisotropic patchy particles

Authors: Alessandro Simon, Jens Weimar, Georg Martius, Martin Oettel

Abstract: Anisotropic patchy particles have become an archetypical statistical model system for associating fluids. Here we formulate an approach to the Kern-Frenkel model via classical density functional theory to describe the positionally and orientationally resolved equilibrium density distributions in flat wall geometries. The density functional is split into a reference part for the orientationally averaged density and an orientational part in mean-field approximation. To bring the orientational part into a kernel form suitable for machine learning techniques, an expansion into orientational invariants and the proper incorporation of single-particle symmetries are formulated. The mean-field kernel is constructed via machine learning on the basis of hard wall simulation data. Results are compared to the well-known random-phase approximation, which strongly underestimates the orientational correlations close to the wall. Successes and shortcomings of the mean-field treatment of the orientational part are highlighted and perspectives are given for attaining a full density functional via machine learning.

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.04056

Multi-View Causal Representation Learning with Partial Observability

Authors: Dingling Yao, Danru Xu, Sébastien Lachapelle, Sara Magliacane, Perouz Taslakian, Georg Martius, Julius von Kügelgen, Francesco Locatello

Abstract: We present a unified framework for studying the identifiability of representations learned from simultaneously observed views, such as different data modalities. We allow a partially observed setting in which each view constitutes a nonlinear mixture of a subset of underlying latent variables, which can be causally related. We prove that the information shared across all subsets of any number of views can be learned up to a smooth bijection using contrastive learning and a single encoder per view. We also provide graphical criteria indicating which latent variables can be identified through a simple set of rules, which we refer to as identifiability algebra. Our general framework and theoretical results unify and extend several previous works on multi-view nonlinear ICA, disentanglement, and causal representation learning. We experimentally validate our claims on numerical, image, and multi-modal data sets. Further, we demonstrate that the performance of prior methods is recovered in different special cases of our setup. Overall, we find that access to multiple partial views enables us to identify a more fine-grained representation, under the generally milder assumption of partial observability.

Submitted 8 March, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

Comments: 28 pages, 10 figures, 11 tables

arXiv:2310.02440

Learning Diverse Skills for Local Navigation under Multi-constraint Optimality

Authors: Jin Cheng, Marin Vlastelica, Pavel Kolev, Chenhao Li, Georg Martius

Abstract: Despite many successful applications of data-driven control in robotics, extracting meaningful diverse behaviors remains a challenge. Typically, task performance needs to be compromised in order to achieve diversity. In many scenarios, task requirements are specified as a multitude of reward terms, each requiring a different trade-off. In this work, we take a constrained optimization viewpoint on the quality-diversity trade-off and show that we can obtain diverse policies while imposing constraints on their value functions which are defined through distinct rewards. In line with previous work, further control of the diversity level can be achieved through an attract-repel reward term motivated by the Van der Waals force. We demonstrate the effectiveness of our method on a local navigation task where a quadruped robot needs to reach the target within a finite horizon. Finally, our trained policies transfer well to the real 12-DoF quadruped robot, Solo12, and exhibit diverse agile behaviors with successful obstacle traversal.

Submitted 3 October, 2023; originally announced October 2023.

Comments: 7 pages, 6 figures, in submission to ICRA 2024

arXiv:2309.12927

Emergent mechanisms for long timescales depend on training curriculum and affect performance in memory tasks

Authors: Sina Khajehabdollahi, Roxana Zeraati, Emmanouil Giannakakis, Tim Jakob Schäfer, Georg Martius, Anna Levina

Abstract: Recurrent neural networks (RNNs) in the brain and in silico excel at solving tasks with intricate temporal dependencies. Long timescales required for solving such tasks can arise from properties of individual neurons (single-neuron timescale, $\tau$, e.g., membrane time constant in biological neurons) or recurrent interactions among them (network-mediated timescale). However, the contribution of each mechanism for optimally solving memory-dependent tasks remains poorly understood. Here, we train RNNs to solve $N$-parity and $N$-delayed match-to-sample tasks with increasing memory requirements controlled by $N$ by simultaneously optimizing recurrent weights and $\tau$s. We find that for both tasks RNNs develop longer timescales with increasing $N$, but depending on the learning objective, they use different mechanisms. Two distinct curricula define learning objectives: sequential learning of a single-$N$ (single-head) or simultaneous learning of multiple $N$s (multi-head). Single-head networks increase their $\tau$ with $N$ and are able to solve tasks for large $N$, but they suffer from catastrophic forgetting. However, multi-head networks, which are explicitly required to hold multiple concurrent memories, keep $\tau$ constant and develop longer timescales through recurrent connectivity. Moreover, we show that the multi-head curriculum increases training speed and network stability to ablations and perturbations, and allows RNNs to generalize better to tasks beyond their training regime. This curriculum also significantly improves training GRUs and LSTMs for large-$N$ tasks. Our results suggest that adapting timescales to task requirements via recurrent interactions allows learning more complex objectives and improves the RNN's performance.
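
A hedged sketch of two ingredients named in the abstract: an N-parity data stream (assumed definition: the target at step t is the parity of the last N binary inputs) and a leaky rate-RNN update in which each neuron's timescale tau_i is an explicit parameter that could be trained alongside W and U. The discretization and all names are assumptions, not the paper's exact setup.

```python
import math
import random


def n_parity_targets(xs, n):
    """Parity of the last n binary inputs, for every step t >= n - 1."""
    return [sum(xs[t - n + 1 : t + 1]) % 2 for t in range(n - 1, len(xs))]


def n_parity_dataset(n, length, seed=0):
    """Random binary input stream of the given length and its n-parity targets."""
    rng = random.Random(seed)
    xs = [rng.randint(0, 1) for _ in range(length)]
    return xs, n_parity_targets(xs, n)


def leaky_rnn_step(h, x, W, U, tau):
    """One step of a leaky rate RNN with per-neuron timescales tau_i >= 1:
    h_i <- (1 - 1/tau_i) * h_i + (1/tau_i) * tanh(sum_j W_ij h_j + sum_k U_ik x_k).
    """
    new_h = []
    for i in range(len(h)):
        drive = sum(W[i][j] * h[j] for j in range(len(h)))
        drive += sum(U[i][k] * x[k] for k in range(len(x)))
        alpha = 1.0 / tau[i]  # larger tau -> slower decay -> longer single-neuron memory
        new_h.append((1.0 - alpha) * h[i] + alpha * math.tanh(drive))
    return new_h
```

With zero weights, a neuron with tau = 4 keeps 75% of its state per step while one with tau = 1 forgets it immediately, which is the single-neuron route to long timescales; the network-mediated route instead relies on the recurrent weights W.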

Submitted 22 September, 2023; originally announced September 2023.

Journal ref: The Twelfth International Conference on Learning Representations (2024)

arXiv:2309.05582

Mind the Uncertainty: Risk-Aware and Actively Exploring Model-Based Reinforcement Learning

Authors: Marin Vlastelica, Sebastian Blaes, Cristina Pineri, Georg Martius

Abstract: We introduce a simple but effective method for managing risk in model-based reinforcement learning with trajectory sampling that involves probabilistic safety constraints and balancing of optimism in the face of epistemic uncertainty and pessimism in the face of aleatoric uncertainty of an ensemble of stochastic neural networks. Various experiments indicate that the separation of uncertainties is essential to performing well with data-driven MPC approaches in uncertain and safety-critical control environments.
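
A minimal sketch of separating the two uncertainty types for an ensemble of Gaussian-output networks via the law of total variance: the aleatoric part is the average of the members' predicted variances and the epistemic part is the variance of their predicted means. The risk-aware score (an optimism bonus for epistemic, a pessimism penalty for aleatoric uncertainty) is an illustrative assumption rather than the paper's objective, and it omits the probabilistic safety constraints.

```python
import math


def decompose_uncertainty(means, variances):
    """Mean prediction plus aleatoric and epistemic variance of an ensemble.

    means[k], variances[k]: Gaussian prediction of member k for one scalar output.
    aleatoric: average predicted variance (noise the members attribute to the data)
    epistemic: variance of the members' means (disagreement between members)
    Their sum is the total predictive variance of the equally weighted mixture.
    """
    k = len(means)
    mu = sum(means) / k
    aleatoric = sum(variances) / k
    epistemic = sum((m - mu) ** 2 for m in means) / k
    return mu, aleatoric, epistemic


def risk_aware_score(means, variances, optimism=1.0, pessimism=1.0):
    """Hypothetical score: reward epistemic spread, penalize aleatoric spread."""
    mu, aleatoric, epistemic = decompose_uncertainty(means, variances)
    return mu + optimism * math.sqrt(epistemic) - pessimism * math.sqrt(aleatoric)
```

In a sampling-based MPC loop, each candidate action sequence would be rolled out through the ensemble and ranked by such a score, so that unexplored but learnable regions are favored while intrinsically noisy ones are avoided.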

Submitted 11 September, 2023; originally announced September 2023.

arXiv:2309.02976

Natural and Robust Walking using Reinforcement Learning without Demonstrations in High-Dimensional Musculoskeletal Models

Authors: Pierre Schumacher, Thomas Geijtenbeek, Vittorio Caggiano, Vikash Kumar, Syn Schmitt, Georg Martius, Daniel F. B. Haeufle

Abstract: Humans excel at robust bipedal walking in complex natural environments. In each step, they adequately tune the interaction of biomechanical muscle dynamics and neuronal signals to be robust against uncertainties in ground conditions. However, it is still not fully understood how the nervous system resolves the musculoskeletal redundancy to solve the multi-objective control problem considering stability, robustness, and energy efficiency. In computer simulations, energy minimization has been shown to be a successful optimization target, reproducing natural walking with trajectory optimization or reflex-based control methods. However, these methods focus on one particular motion at a time and the resulting controllers are limited when compensating for perturbations. In robotics, reinforcement learning (RL) methods recently achieved highly stable (and efficient) locomotion on quadruped systems, but the generation of human-like walking with bipedal biomechanical models has required extensive use of expert data sets. This strong reliance on demonstrations often results in brittle policies and limits the application to new behaviors, especially considering the potential variety of movements for high-dimensional musculoskeletal models in 3D. Achieving natural locomotion with RL without sacrificing its incredible robustness might pave the way for a novel approach to studying human walking in complex natural environments. Videos: https://sites.google.com/view/naturalwalkingrl

Submitted 7 September, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

arXiv:2308.07741

Real Robot Challenge 2022: Learning Dexterous Manipulation from Offline Data in the Real World

Authors: Nico Gürtler, Felix Widmaier, Cansu Sancaktar, Sebastian Blaes, Pavel Kolev, Stefan Bauer, Manuel Wüthrich, Markus Wulfmeier, Martin Riedmiller, Arthur Allshire, Qiang Wang, Robert McCarthy, Hangyeol Kim, Jongchan Baek, Wookyong Kwon, Shanliang Qian, Yasunori Toshimitsu, Mike Yan Michelis, Amirhossein Kazemipour, Arman Raayatsanati, Hehui Zheng, Barnabas Gavin Cangan, Bernhard Schölkopf, Georg Martius

Abstract: Experimentation on real robots is demanding in terms of time and costs. For this reason, a large part of the reinforcement learning (RL) community uses simulators to develop and benchmark algorithms. However, insights gained in simulation do not necessarily translate to real robots, in particular for tasks involving complex interactions with the environment. The Real Robot Challenge 2022 therefore served as a bridge between the RL and robotics communities by allowing participants to experiment remotely with a real robot as easily as in simulation. In recent years, offline reinforcement learning has matured into a promising paradigm for learning from pre-collected datasets, alleviating the reliance on expensive online interactions. We therefore asked the participants to learn two dexterous manipulation tasks involving pushing, grasping, and in-hand orientation from provided real-robot datasets. Extensive software documentation and an initial stage based on a simulation of the real set-up made the competition particularly accessible. By giving each team plenty of access budget to evaluate their offline-learned policies on a cluster of seven identical real TriFinger platforms, we organized an exciting competition for machine learners and roboticists alike. In this work we state the rules of the competition, present the methods used by the winning teams and compare their results with a benchmark of state-of-the-art offline RL algorithms on the challenge datasets.

Submitted 24 November, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

Comments: Typo in author list fixed

arXiv:2307.15690

Benchmarking Offline Reinforcement Learning on Real-Robot Hardware

Authors: Nico Gürtler, Sebastian Blaes, Pavel Kolev, Felix Widmaier, Manuel Wüthrich, Stefan Bauer, Bernhard Schölkopf, Georg Martius

Abstract: Learning policies from previously recorded data is a promising direction for real-world robotics tasks, as online learning is often infeasible. Dexterous manipulation in particular remains an open problem in its general form. The combination of offline reinforcement learning with large diverse datasets, however, has the potential to lead to a breakthrough in this challenging domain analogously to the rapid progress made in supervised learning in recent years. To coordinate the efforts of the research community toward tackling this problem, we propose a benchmark including: i) a large collection of data for offline learning from a dexterous manipulation platform on two tasks, obtained with capable RL agents trained in simulation; ii) the option to execute learned policies on a real-world robotic system and a simulation for efficient debugging. We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.

Submitted 28 July, 2023; originally announced July 2023.

Comments: The Eleventh International Conference on Learning Representations (ICLR 2023). Datasets available at https://github.com/rr-learning/trifinger_rl_datasets

arXiv:2307.11373 [pdf, other]

Offline Diversity Maximization Under Imitation Constraints

Authors: Marin Vlastelica, Jin Cheng, Georg Martius, Pavel Kolev

Abstract: There has been significant recent progress in the area of unsupervised skill discovery, utilizing various information-theoretic objectives as measures of diversity. Despite these advances, challenges remain: current methods require significant online interaction, fail to leverage vast amounts of available task-agnostic data and typically lack a quantitative measure of skill utility. We address these challenges by proposing a principled offline algorithm for unsupervised skill discovery that, in addition to maximizing diversity, ensures that each learned skill imitates state-only expert demonstrations to a certain degree. Our main analytical contribution is to connect Fenchel duality, reinforcement learning, and unsupervised skill discovery to maximize a mutual information objective subject to KL-divergence state occupancy constraints. Furthermore, we demonstrate the effectiveness of our method on the standard offline benchmark D4RL and on a custom offline dataset collected from a 12-DoF quadruped robot for which the policies trained in simulation transfer well to the real robotic system.

Submitted 21 June, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

Comments: RLC 2024

arXiv:2306.16922 [pdf, other]

The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks

Authors: Aaron Spieler, Nasim Rahaman, Georg Martius, Bernhard Schölkopf, Anna Levina

Abstract: Biological cortical neurons are remarkably sophisticated computational devices, temporally integrating their vast synaptic input over an intricate dendritic tree, subject to complex, nonlinearly interacting internal biological processes. A recent study proposed to characterize this complexity by fitting accurate surrogate models to replicate the input-output relationship of a detailed biophysical cortical pyramidal neuron model and discovered it needed temporal convolutional networks (TCN) with millions of parameters. Requiring this many parameters, however, could stem from a misalignment between the inductive biases of the TCN and the cortical neuron's computations. In light of this, and to explore the computational implications of leaky memory units and nonlinear dendritic processing, we introduce the Expressive Leaky Memory (ELM) neuron model, a biologically inspired phenomenological model of a cortical neuron. Remarkably, by exploiting such slowly decaying memory-like hidden states and two-layered nonlinear integration of synaptic input, our ELM neuron can accurately match the aforementioned input-output relationship with under ten thousand trainable parameters. To further assess the computational ramifications of our neuron design, we evaluate it on various tasks with demanding temporal structures, including the Long Range Arena (LRA) datasets, as well as a novel neuromorphic dataset based on the Spiking Heidelberg Digits dataset (SHD-Adding).
Leveraging a larger number of memory units with sufficiently long timescales, and correspondingly sophisticated synaptic integration, the ELM neuron displays substantial long-range processing capabilities, reliably outperforming the classic Transformer or Chrono-LSTM architectures on LRA, and even solving the Pathfinder-X task with over 70% accuracy (16k context length).

Submitted 17 March, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: 25 pages, 14 figures, 13 tables, additional experiments and clarifications, accepted to ICLR 2024
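The "slowly decaying memory-like hidden states" in the ELM abstract can be pictured as a bank of leaky integrators, each with its own timescale. The sketch below is a generic illustration under that assumption, not the ELM equations from the paper: the names `leaky_memory_step`, `run_memory`, and `w_in`, and the `tanh` drive standing in for synaptic integration, are hypothetical choices made for this example.

```python
import numpy as np

def leaky_memory_step(m_prev, drive, tau, dt=1.0):
    """One update of a bank of leaky memory units (illustrative, not the ELM model)."""
    # Per-unit decay factor lambda = exp(-dt / tau); a larger tau means slower forgetting.
    lam = np.exp(-dt / tau)
    # Convex combination: keep a fraction lam of the old memory, write (1 - lam) of the new drive.
    return lam * m_prev + (1.0 - lam) * drive

def run_memory(inputs, w_in, tau, dt=1.0):
    """Feed a sequence of input vectors through the memory bank and record every state."""
    m = np.zeros(w_in.shape[0])
    trace = []
    for x in inputs:
        drive = np.tanh(w_in @ x)  # hypothetical nonlinear synaptic integration
        m = leaky_memory_step(m, drive, tau, dt)
        trace.append(m.copy())
    return np.array(trace)

# Two units with short and long timescales receive one impulse, then silence.
w_in = np.array([[1.0], [1.0]])
inputs = [np.array([1.0])] + [np.array([0.0])] * 50
trace = run_memory(inputs, w_in, tau=np.array([2.0, 200.0]))
print(trace[-1])
```

After 50 silent steps the unit with `tau = 200` still holds a trace of the impulse, while the unit with `tau = 2` has essentially forgotten it; this per-unit timescale is the property the abstract points to when it mentions memory units with sufficiently long timescales.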

arXiv:2306.07067 [pdf, other]

Locally adaptive cellular automata for goal-oriented self-organization

Authors: Sina Khajehabdollahi, Emmanouil Giannakakis, Victor Buendia, Georg Martius, Anna Levina

Abstract: The essential ingredient for studying the phenomena of emergence is the ability to generate and manipulate emergent systems that span large scales. Cellular automata are the model class particularly known for their effective scalability but are also typically constrained by fixed local rules. In this paper, we propose a new model class of adaptive cellular automata that allows for the generation of scalable and expressive models. We show how to implement computation-effective adaptation by coupling the update rule of the cellular automaton with itself and the system state in a localized way. To demonstrate the applications of this approach, we implement two different emergent models: a self-organizing Ising model and two types of plastic neural networks, a rate and a spiking model. With the Ising model, we show how coupling local/global temperatures to local/global measurements can tune the model to stay in the vicinity of the critical temperature. With the neural models, we reproduce a classical balanced state in large recurrent neuronal networks with excitatory and inhibitory neurons and various plasticity mechanisms. Our study opens multiple directions for studying collective behavior and emergence.

Submitted 12 June, 2023; originally announced June 2023.
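The self-organizing Ising model in the abstract relies on local temperatures. As a point of reference, the following sketch shows a standard checkerboard Metropolis sweep of a 2D Ising lattice in which the temperature is a per-site field rather than a single global value. It assumes periodic boundaries, even lattice dimensions, and unit coupling; `checkerboard_sweep` is a hypothetical name, and the paper's rule for adapting the temperatures from local measurements is not reproduced here.

```python
import numpy as np

def checkerboard_sweep(spins, temp, rng, coupling=1.0):
    """One Metropolis sweep of a 2D Ising lattice with a per-site temperature field.

    spins: array of +1/-1 with even side lengths (periodic boundaries).
    temp:  array of the same shape holding each site's temperature.
    """
    n, m = spins.shape
    parity = np.add.outer(np.arange(n), np.arange(m)) % 2
    for color in (0, 1):
        # Sum of the four nearest neighbours; sites of one color only neighbour the other color,
        # so all sites of one color can be updated simultaneously.
        nb = (np.roll(spins, 1, axis=0) + np.roll(spins, -1, axis=0)
              + np.roll(spins, 1, axis=1) + np.roll(spins, -1, axis=1))
        # Energy change of flipping each spin for H = -J * sum over neighbour pairs of s_i s_j.
        delta_e = 2.0 * coupling * spins * nb
        # Metropolis acceptance min(1, exp(-dE / T)), written to avoid overflow for dE < 0.
        accept_prob = np.exp(-np.maximum(delta_e, 0.0) / temp)
        flip = (rng.random(spins.shape) < accept_prob) & (parity == color)
        spins = np.where(flip, -spins, spins)
    return spins

rng = np.random.default_rng(0)
lattice = np.ones((32, 32))
hot = np.full((32, 32), 5.0)  # well above the 2D critical temperature of about 2.27
for _ in range(200):
    lattice = checkerboard_sweep(lattice, hot, rng)
print(lattice.mean())
```

Making `temp` depend on measurements of the lattice between sweeps would turn this fixed-rule update into an adaptive one; the specific coupling the paper uses to stay near criticality is described there, not here.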

arXiv:2306.04829 [pdf, other]

Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities

Authors: Andrii Zadaianchuk, Maximilian Seitzer, Georg Martius

Abstract: Unsupervised video-based object-centric learning is a promising avenue to learn structured representations from large, unlabeled video collections, but previous approaches have only managed to scale to real-world datasets in restricted domains. Recently, it was shown that the reconstruction of pre-trained self-supervised features leads to object-centric representations on unconstrained real-world image datasets. Building on this approach, we propose a novel way to use such pre-trained features in the form of a temporal feature similarity loss. This loss encodes semantic and temporal correlations between image patches and is a natural way to introduce a motion bias for object discovery. We demonstrate that this loss leads to state-of-the-art performance on the challenging synthetic MOVi datasets. When used in combination with the feature reconstruction loss, our model is the first object-centric video model that scales to unconstrained video datasets such as YouTube-VIS.

Submitted 8 December, 2023; v1 submitted 7 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023. Website and code available at https://martius-lab.github.io/videosaur
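To make the idea of a temporal feature similarity target concrete, here is a minimal sketch, assuming patch features from a frozen self-supervised encoder are given as arrays of shape (patches, dims) for frames t and t+k. It turns cosine similarities between patches into a row-wise softmax distribution and scores a prediction against it with cross-entropy. The temperature value, the function names, and this exact form of the loss are assumptions for illustration; the paper's formulation may differ in detail.

```python
import numpy as np

def _log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def similarity_targets(feats_t, feats_tk, temperature=0.1):
    """Row i: distribution over patches at frame t+k that resemble patch i at frame t."""
    a = feats_t / np.linalg.norm(feats_t, axis=1, keepdims=True)
    b = feats_tk / np.linalg.norm(feats_tk, axis=1, keepdims=True)
    cosine = a @ b.T  # (patches_t, patches_tk) cosine similarities
    return np.exp(_log_softmax(cosine / temperature))

def similarity_loss(pred_logits, targets):
    """Mean cross-entropy between predicted logits and the similarity targets."""
    return -(targets * _log_softmax(pred_logits)).sum(axis=1).mean()

# Toy case: four patches with mutually orthogonal features that stay unchanged across frames.
feats = np.eye(4)
targets = similarity_targets(feats, feats)
print(targets.round(3))
```

In this toy case each patch puts almost all of its target mass on itself in the next frame; with real features, patches whose appearance persists across frames concentrate their mass on each other, which is one way to see how such a target can carry temporal information.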

arXiv:2306.03935 [pdf, other]

Inferring interpretable dynamical generators of local quantum observables from projective measurements through machine learning

Authors: Giovanni Cemin, Francesco Carnazza, Sabine Andergassen, Georg Martius, Federico Carollo, Igor Lesanovsky

Abstract: To characterize the dynamical behavior of many-body quantum systems, one is usually interested in the evolution of so-called order parameters rather than in characterizing the full quantum state. In many situations, these quantities coincide with the expectation value of local observables, such as the magnetization or the particle density. In experiments, however, these expectation values can only be obtained with a finite degree of accuracy due to the effects of projection noise. Here, we utilize a machine-learning approach to infer the dynamical generator governing the evolution of local observables in a many-body system from noisy data. To benchmark our method, we consider a variant of the quantum Ising model and generate synthetic experimental data, containing the results of $N$ projective measurements at $M$ sampling points in time, using the time-evolving block-decimation algorithm. As we show, across a wide range of parameters the dynamical generator of local observables can be approximated by a Markovian quantum master equation. Our method is not only useful for extracting effective dynamical generators from many-body systems, but may also be applied for inferring decoherence mechanisms of quantum simulation and computing platforms.

Submitted 20 February, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: 7+4 pages, 3+5 figures
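The projection noise mentioned in the abstract can be quantified with a short calculation. For an observable with outcomes +1 and -1 (for example a single-site Pauli-Z measurement) and expectation value m, one shot has variance 1 - m^2, so averaging N independent shots gives a standard error of sqrt((1 - m^2) / N). The sketch below checks this numerically; it is a generic shot-noise illustration, not the paper's TEBD-based data generation, and the function names are hypothetical.

```python
import numpy as np

def estimate_expectation(p_up, n_shots, rng):
    """Average of n_shots projective +/-1 measurements; +1 occurs with probability p_up."""
    outcomes = np.where(rng.random(n_shots) < p_up, 1.0, -1.0)
    return outcomes.mean()

def projection_noise_std(expectation, n_shots):
    """Standard error of the shot average: single-shot variance 1 - m^2, divided by N."""
    return np.sqrt((1.0 - expectation ** 2) / n_shots)

rng = np.random.default_rng(0)
p_up, n_shots = 0.8, 1000
m_true = 2 * p_up - 1  # expectation value of the +/-1 observable
estimates = np.array([estimate_expectation(p_up, n_shots, rng) for _ in range(2000)])
print(estimates.mean(), estimates.std(), projection_noise_std(m_true, n_shots))
```

With N = 1000 shots and m = 0.6 the predicted standard error is about 0.025, which illustrates why the number of projective measurements $N$ per sampling point limits how precisely the evolution of an order parameter can be resolved.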

arXiv:2306.03655 [pdf, other]

Online Learning under Adversarial Nonlinear Constraints

Authors: Pavel Kolev, Georg Martius, Michael Muehlebach

Abstract: In many applications, learning systems are required to process continuous non-stationary data streams. We study this problem in an online learning framework and propose an algorithm that can deal with adversarial time-varying and nonlinear constraints. As we show in our work, the algorithm called Constraint Violation Velocity Projection (CVV-Pro) achieves $\sqrt{T}$ regret and converges to the feasible set at a rate of $1/\sqrt{T}$, despite the fact that the feasible set is slowly time-varying and a priori unknown to the learner. CVV-Pro only relies on local sparse linear approximations of the feasible set and therefore avoids optimizing over the entire set at each iteration, which is in sharp contrast to projected gradients or Frank-Wolfe methods. We also empirically evaluate our algorithm on two-player games, where the players are subjected to a shared constraint.

Submitted 13 October, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

Comments: NeurIPS 2023
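For readers less familiar with the rates quoted in the CVV-Pro abstract, the following shows what a $\sqrt{T}$ regret bound implies, using the standard definition of static regret against a fixed comparator $x^\star$ for losses $f_t$. This definition is assumed here for illustration; the paper's exact comparator and constraint-violation measure for time-varying feasible sets may be defined differently.

```latex
R_T = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x^\star) = O\!\left(\sqrt{T}\right)
\quad\Longrightarrow\quad
\frac{R_T}{T} = O\!\left(\frac{1}{\sqrt{T}}\right) \xrightarrow[T \to \infty]{} 0
```

For example, at $T = 10^4$ rounds a bound of order $\sqrt{T} = 100$ corresponds to an average per-round regret of order $10^{-2}$, the same order as the $1/\sqrt{T}$ rate at which the iterates approach the feasible set.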

arXiv:2305.13341 [pdf, other]

Discovering Causal Relations and Equations from Data

Authors: Gustau Camps-Valls, Andreas Gerhardus, Urmi Ninad, Gherardo Varando, Georg Martius, Emili Balaguer-Ballester, Ricardo Vinuesa, Emiliano Diaz, Laure Zanna, Jakob Runge

Abstract: Physics is a field of science that has traditionally used the scientific method to answer questions about why natural phenomena occur and to make testable models that explain the phenomena. Discovering equations, laws and principles that are invariant, robust and causal explanations of the world has been fundamental in physical sciences throughout the centuries. Discoveries emerge from observing the world and, when possible, performing interventional studies in the system under study. With the advent of big data and the use of data-driven methods, causal and equation discovery fields have grown and made progress in computer science, physics, statistics, philosophy, and many applied fields. All these domains are intertwined and can be used to discover causal relations, physical laws, and equations from observational data. This paper reviews the concepts, methods, and relevant works on causal and equation discovery in the broad field of Physics and outlines the most important challenges and promising future lines of research. We also provide a taxonomy for observational causal and equation discovery, point out connections, and showcase a complete set of case studies in Earth and climate sciences, fluid dynamics and mechanics, and the neurosciences. This review demonstrates that discovering fundamental laws and causal relations by observing natural phenomena is being revolutionised with the efficient exploitation of observational data, modern machine learning algorithms and the interaction with domain knowledge.
Exciting times are ahead with many challenges and opportunities to improve our understanding of complex systems.

Submitted 21 May, 2023; originally announced May 2023.

Comments: 137 pages

arXiv:2304.10990 [pdf, other]

Minsight: A Fingertip-Sized Vision-Based Tactile Sensor for Robotic Manipulation

Authors: Iris Andrussow, Huanbo Sun, Katherine J. Kuchenbecker, Georg Martius

Abstract: Intelligent interaction with the physical world requires perceptual abilities beyond vision and hearing; vibrant tactile sensing is essential for autonomous robots to dexterously manipulate unfamiliar objects or safely contact humans. Therefore, robotic manipulators need high-resolution touch sensors that are compact, robust, inexpensive, and efficient. The soft vision-based haptic sensor presented herein is a miniaturized and optimized version of the previously published sensor Insight. Minsight has the size and shape of a human fingertip and uses machine learning methods to output high-resolution maps of 3D contact force vectors at 60 Hz. Experiments confirm its excellent sensing performance, with a mean absolute force error of 0.07 N and contact location error of 0.6 mm across its surface area. Minsight's utility is shown in two robotic tasks on a 3-DoF manipulator. First, closed-loop force control enables the robot to track the movements of a human finger based only on tactile data. Second, the informative value of the sensor output is shown by detecting whether a hard lump is embedded within a soft elastomer with an accuracy of 98%. These findings indicate that Minsight can give robots the detailed fingertip touch sensing needed for dexterous manipulation and physical human-robot interaction.

Submitted 21 April, 2023; originally announced April 2023.

arXiv:2304.04664 [pdf, other]

Inductive biases in deep learning models for weather prediction

Authors: Jannik Thuemmel, Matthias Karlbauer, Sebastian Otte, Christiane Zarfl, Georg Martius, Nicole Ludwig, Thomas Scholten, Ulrich Friedrich, Volker Wulfmeyer, Bedartha Goswami, Martin V. Butz

Abstract: Deep learning has gained immense popularity in the Earth sciences as it enables us to formulate purely data-driven models of complex Earth system processes. Deep learning-based weather prediction (DLWP) models have made significant progress in the last few years, achieving forecast skills comparable to established numerical weather prediction models at comparatively lower computational cost. In order to train accurate, reliable, and tractable DLWP models with several millions of parameters, the model design needs to incorporate suitable inductive biases that encode structural assumptions about the data and the modelled processes. When chosen appropriately, these biases enable faster learning and better generalisation to unseen data. Although inductive biases play a crucial role in successful DLWP models, they are often not stated explicitly and their contribution to model performance remains unclear. Here, we review and analyse the inductive biases of state-of-the-art DLWP models with respect to five key design elements: data selection, learning objective, loss function, architecture, and optimisation method. We identify the most important inductive biases and highlight potential avenues towards more efficient and probabilistic DLWP models.

Submitted 30 April, 2024; v1 submitted 6 April, 2023; originally announced April 2023.

arXiv:2303.16195 [pdf, other]

doi 10.1162/artl_a_00383

When to be critical? Performance and evolvability in different regimes of neural Ising agents

Authors: Sina Khajehabdollahi, Jan Prosi, Emmanouil Giannakakis, Georg Martius, Anna Levina

Abstract: It has long been hypothesized that operating close to the critical state is beneficial for natural and artificial systems and for their evolution. We put this hypothesis to test in a system of evolving foraging agents controlled by neural networks that can adapt agents' dynamical regime throughout evolution. Surprisingly, we find that all populations that discover solutions evolve to be subcritical. By a resilience analysis, we find that there are still benefits of starting the evolution in the critical regime. Namely, initially critical agents maintain their fitness level under environmental changes (for example, in the lifespan) and degrade gracefully when their genome is perturbed. At the same time, initially subcritical agents, even when evolved to the same fitness, are often inadequate to withstand the changes in the lifespan and degrade catastrophically with genetic perturbations. Furthermore, we find the optimal distance to criticality depends on the task complexity. To test this, we introduce a hard and a simple task: for the hard task, agents evolve closer to criticality, whereas more subcritical solutions are found for the simple task. We verify that our results are independent of the selected evolutionary mechanisms by testing them on two principally different approaches: a genetic algorithm and an evolutionary strategy.
In summary, our study suggests that although optimal behaviour in the simple task is obtained in a subcritical regime, initializing near criticality is important to be efficient at finding optimal solutions for new tasks of unknown complexity.

Submitted 24 November, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2103.12184

Journal ref: Artificial Life (2022) 28 (4): 458-478

arXiv:2303.09628 [pdf, other]

Efficient Learning of High Level Plans from Play

Authors: Núria Armengol Urpí, Marco Bagatella, Otmar Hilliges, Georg Martius, Stelian Coros

Abstract: Real-world robotic manipulation tasks remain an elusive challenge, since they involve both fine-grained environment interaction and the ability to plan for long-horizon goals. Although deep reinforcement learning (RL) methods have shown encouraging results when planning end-to-end in high-dimensional environments, they remain fundamentally limited by poor sample efficiency due to inefficient exploration, and by the complexity of credit assignment over long horizons. In this work, we present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL to achieve long-horizon complex manipulation tasks. We leverage task-agnostic play data to learn a discrete behavioral prior over object-centric primitives, modeling their feasibility given the current context. We then design a high-level goal-conditioned policy which (1) uses primitives as building blocks to scaffold complex long-horizon tasks and (2) leverages the behavioral prior to accelerate learning. We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks and learns policies that can be easily transferred to physical hardware.

Submitted 16 March, 2023; originally announced March 2023.

Comments: Accepted to the International Conference on Robotics and Automation 2023

arXiv:2209.07899 [pdf, other]

Versatile Skill Control via Self-supervised Adversarial Imitation of Unlabeled Mixed Motions

Authors: Chenhao Li, Sebastian Blaes, Pavel Kolev, Marin Vlastelica, Jonas Frey, Georg Martius

Abstract: Learning diverse skills is one of the main challenges in robotics. To this end, imitation learning approaches have achieved impressive results. These methods require explicitly labeled datasets or assume consistent skill execution to enable learning and active control of individual behaviors, which limits their applicability. In this work, we propose a cooperative adversarial method for obtaining single versatile policies with controllable skill sets from unlabeled datasets containing diverse state transition patterns by maximizing their discriminability. Moreover, we show that by utilizing unsupervised skill discovery in the generative adversarial imitation learning framework, novel and useful skills emerge with successful task fulfillment. Finally, the obtained versatile policies are tested on an agile quadruped robot called Solo 8 and present faithful replications of diverse skills encoded in the demonstrations.

Submitted 11 February, 2023; v1 submitted 16 September, 2022; originally announced September 2022.

arXiv:2207.03952 [pdf, other]

Learning with Muscles: Benefits for Data-Efficiency and Robustness in Anthropomorphic Tasks

Authors: Isabell Wochner, Pierre Schumacher, Georg Martius, Dieter Büchler, Syn Schmitt, Daniel F. B. Haeufle

Abstract: Humans are able to outperform robots in terms of robustness, versatility, and learning of new tasks in a wide variety of movements. We hypothesize that highly nonlinear muscle dynamics play a large role in providing inherent stability, which is favorable to learning. While recent advances have been made in applying modern learning techniques to muscle-actuated systems both in simulation and in robotics, so far, no detailed analysis has been performed to show the benefits of muscles when learning from scratch. Our study closes this gap and showcases the potential of muscle actuators for core robotics challenges in terms of data-efficiency, hyperparameter sensitivity, and robustness.

Submitted 16 January, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

arXiv:2206.11693 [pdf, other]

Learning Agile Skills via Adversarial Imitation of Rough Partial Demonstrations

Authors: Chenhao Li, Marin Vlastelica, Sebastian Blaes, Jonas Frey, Felix Grimminger, Georg Martius

Abstract: Learning agile skills is one of the main challenges in robotics. To this end, reinforcement learning approaches have achieved impressive results. These methods require explicit task information in terms of a reward function or an expert that can be queried in simulation to provide a target control output, which limits their applicability. In this work, we propose a generative adversarial method for inferring reward functions from partial and potentially physically incompatible demonstrations for successful skill acquisition where reference or expert demonstrations are not easily accessible. Moreover, we show that by using a Wasserstein GAN formulation and transitions from demonstrations with rough and partial information as input, we are able to extract policies that are robust and capable of imitating demonstrated behaviors. Finally, the obtained skills such as a backflip are tested on an agile quadruped robot called Solo 8 and present faithful replication of hand-held human demonstrations.

Submitted 21 November, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

arXiv:2206.11403 [pdf, other]

Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation

Authors: Cansu Sancaktar, Sebastian Blaes, Georg Martius

Abstract: It has been a long-standing dream to design artificial agents that explore their environment efficiently via intrinsic motivation, similar to how children perform curious free play. Despite recent advances in intrinsically motivated reinforcement learning (RL), sample-efficient exploration in object manipulation scenarios remains a significant challenge as most of the relevant information lies in the sparse agent-object and object-object interactions. In this paper, we propose to use structured world models to incorporate relational inductive biases in the control loop to achieve sample-efficient and interaction-rich exploration in compositional multi-object environments. By planning for future novelty inside structured world models, our method generates free-play behavior that starts to interact with objects early on and develops more complex behavior over time. Instead of using models only to compute intrinsic rewards, as commonly done, our method showcases that the self-reinforcing cycle between good models and good exploration also opens up another avenue: zero-shot generalization to downstream tasks via model-based planning. After the entirely intrinsic task-agnostic exploration phase, our method solves challenging downstream tasks such as stacking, flipping, pick & place, and throwing, generalizing to unseen numbers and arrangements of objects without any additional training.

Submitted 26 November, 2022; v1 submitted 22 June, 2022; originally announced June 2022.

Comments: NeurIPS 2022 camera-ready version

arXiv:2206.02416 [pdf, other]

Embrace the Gap: VAEs Perform Independent Mechanism Analysis

Authors: Patrik Reizinger, Luigi Gresele, Jack Brady, Julius von Kügelgen, Dominik Zietlow, Bernhard Schölkopf, Georg Martius, Wieland Brendel, Michel Besserve

Abstract: Variational autoencoders (VAEs) are a popular framework for modeling complex data distributions; they can be efficiently trained via variational inference by maximizing the evidence lower bound (ELBO), at the expense of a gap to the exact (log-)marginal likelihood. While VAEs are commonly used for representation learning, it is unclear why ELBO maximization would yield useful representations, since unregularized maximum likelihood estimation cannot invert the data-generating process. Yet, VAEs often succeed at this task. We seek to elucidate this apparent paradox by studying nonlinear VAEs in the limit of near-deterministic decoders. We first prove that, in this regime, the optimal encoder approximately inverts the decoder (a commonly used but unproven conjecture), which we refer to as self-consistency. Leveraging self-consistency, we show that the ELBO converges to a regularized log-likelihood. This allows VAEs to perform what has recently been termed independent mechanism analysis (IMA): it adds an inductive bias towards decoders with column-orthogonal Jacobians, which helps recover the true latent factors. The gap between ELBO and log-likelihood is therefore welcome, since it bears unanticipated benefits for nonlinear representation learning. In experiments on synthetic and image data, we show that VAEs uncover the true latent factors when the data-generating process satisfies the IMA assumption.

Submitted 27 January, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

Comments: NeurIPS 2022 final version
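The IMA inductive bias mentioned above favors decoders whose Jacobian has mutually orthogonal columns. A minimal NumPy sketch of one standard way to quantify deviation from this property is the local IMA contrast $\sum_i \log\lVert J_{:,i}\rVert - \log\lvert\det J\rvert$ from the independent mechanism analysis literature. This measure is assumed here, and the helper name `ima_contrast` is hypothetical, not taken from the paper's code. By Hadamard's inequality the contrast is non-negative, and it vanishes when the columns are orthogonal.

```python
import numpy as np

def ima_contrast(jacobian: np.ndarray) -> float:
    """Local IMA contrast of a square, invertible Jacobian (hypothetical helper).

    sum_i log ||J[:, i]|| - log |det J|; >= 0 by Hadamard's inequality,
    and == 0 iff the columns of J are mutually orthogonal.
    """
    col_norms = np.linalg.norm(jacobian, axis=0)
    _, logabsdet = np.linalg.slogdet(jacobian)
    return float(np.sum(np.log(col_norms)) - logabsdet)

# Rotation times a diagonal scaling: orthogonal (differently scaled) columns
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
J_orth = R @ np.diag([2.0, 0.5])
# A shear: columns are not orthogonal
J_skew = np.array([[1.0, 1.0], [0.0, 1.0]])
print(ima_contrast(J_orth), ima_contrast(J_skew))
```

The contrast ignores column scaling, so it penalizes only the non-orthogonality that IMA regards as a violation of independent mechanisms.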

arXiv:2206.02042 [pdf, other]

Developing hierarchical anticipations via neural network-based event segmentation

Authors: Christian Gumbsch, Maurits Adam, Birgit Elsner, Georg Martius, Martin V. Butz

Abstract: Humans can make predictions on various time scales and hierarchical levels. The learning of event encodings seems to play a crucial role in this. In this work, we model the development of hierarchical predictions via autonomously learned latent event codes. We present a hierarchical recurrent neural network architecture whose inductive learning biases foster the development of sparsely changing latent states that compress sensorimotor sequences. A higher-level network learns to predict the situations in which the latent states tend to change. Using a simulated robotic manipulator, we demonstrate that the system (i) learns latent states that accurately reflect the event structure of the data, (ii) develops meaningful temporal abstract predictions on the higher level, and (iii) generates goal-anticipatory behavior similar to gaze behavior found in eye-tracking studies with infants. The architecture offers a step towards the autonomous learning of compressed hierarchical encodings of gathered experiences and the exploitation of these encodings to generate adaptive behavior.

Submitted 28 August, 2022; v1 submitted 4 June, 2022; originally announced June 2022.

Comments: accepted at ICDL 2022

arXiv:2206.00484 [pdf, other]

DEP-RL: Embodied Exploration for Reinforcement Learning in Overactuated and Musculoskeletal Systems

Authors: Pierre Schumacher, Daniel Häufle, Dieter Büchler, Syn Schmitt, Georg Martius

Abstract: Muscle-actuated organisms are capable of learning an unparalleled diversity of dexterous movements despite their vast number of muscles. Reinforcement learning (RL) on large musculoskeletal models, however, has not been able to show similar performance. We conjecture that ineffective exploration in large overactuated action spaces is a key problem. This is supported by the finding that common exploration noise strategies are inadequate in synthetic examples of overactuated systems. We identify differential extrinsic plasticity (DEP), a method from the domain of self-organization, as being able to induce state-space covering exploration within seconds of interaction. By integrating DEP into RL, we achieve fast learning of reaching and locomotion in musculoskeletal systems, outperforming current approaches in all considered tasks in sample efficiency and robustness.

Submitted 27 April, 2023; v1 submitted 30 May, 2022; originally announced June 2022.

arXiv:2205.15213 [pdf, other]

Backpropagation through Combinatorial Algorithms: Identity with Projection Works

Authors: Subham Sekhar Sahoo, Anselm Paulus, Marin Vlastelica, Vít Musil, Volodymyr Kuleshov, Georg Martius

Abstract: Embedding discrete solvers as differentiable layers has given modern deep learning architectures combinatorial expressivity and discrete reasoning capabilities. The derivative of these solvers is zero or undefined; therefore, a meaningful replacement is crucial for effective gradient-based learning. Prior works rely on smoothing the solver with input perturbations, relaxing the solver to continuous problems, or interpolating the loss landscape with techniques that typically require additional solver calls, introduce extra hyper-parameters, or compromise performance. We propose a principled approach to exploit the geometry of the discrete solution space to treat the solver as a negative identity on the backward pass and further provide a theoretical justification. Our experiments demonstrate that such a straightforward hyper-parameter-free approach is able to compete with previous more complex methods in numerous experiments such as backpropagation through discrete samplers, deep graph matching, and image retrieval. Furthermore, we replace the previously proposed problem-specific and label-dependent margin with a generic regularization procedure that prevents cost collapse and increases robustness.

Submitted 17 March, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: ICLR 2023 conference paper. The first two authors contributed equally
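The central idea of treating the solver as a negative identity on the backward pass can be illustrated with a toy solver. This is a minimal NumPy sketch under our own assumptions. The "solver" picks the k cheapest items, which minimizes $\langle w, y\rangle$ over k-hot vectors $y$. The loss is a squared error to a target selection, and the update uses $-\partial L/\partial y$ as the gradient with respect to the costs $w$. The function names are hypothetical, and the projection and regularization steps described in the paper are omitted.

```python
import numpy as np

def solver(w: np.ndarray, k: int) -> np.ndarray:
    """Toy combinatorial solver: argmin_y <w, y> over k-hot indicator vectors."""
    y = np.zeros_like(w)
    y[np.argsort(w, kind="stable")[:k]] = 1.0  # stable sort gives deterministic tie-breaking
    return y

def identity_backward(grad_y: np.ndarray) -> np.ndarray:
    """Replace the zero/undefined solver Jacobian by a negative identity."""
    return -grad_y

w = np.array([0.0, 1.0, 2.0, 3.0])         # costs, initially favoring items 0 and 1
y_target = np.array([0.0, 0.0, 1.0, 1.0])  # desired selection: items 2 and 3
lr = 1.0
for step in range(10):
    y = solver(w, k=2)
    grad_y = y - y_target                  # dL/dy for L = 0.5 * ||y - y_target||^2
    w -= lr * identity_backward(grad_y)    # gradient step on the costs
print(solver(w, k=2))  # -> [0. 0. 1. 1.]
```

The negative sign reflects that raising an item's cost makes a minimizing solver less likely to select it. A wrongly selected item (positive $\partial L/\partial y$) therefore has its cost increased.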

arXiv:2203.09168 [pdf, other]

On the Pitfalls of Heteroscedastic Uncertainty Estimation with Probabilistic Neural Networks

Authors: Maximilian Seitzer, Arash Tavakoli, Dimitrije Antic, Georg Martius

Abstract: Capturing aleatoric uncertainty is a critical part of many machine learning systems. In deep learning, a common approach to this end is to train a neural network to estimate the parameters of a heteroscedastic Gaussian distribution by maximizing the logarithm of the likelihood function under the observed data. In this work, we examine this approach and identify potential hazards associated with the use of log-likelihood in conjunction with gradient-based optimizers. First, we present a synthetic example illustrating how this approach can lead to very poor but stable parameter estimates. Second, we identify the culprit to be the log-likelihood loss, along with certain conditions that exacerbate the issue. Third, we present an alternative formulation, termed $β$-NLL, in which each data point's contribution to the loss is weighted by the $β$-exponentiated variance estimate. We show that using an appropriate $β$ largely mitigates the issue in our illustrative example. Fourth, we evaluate this approach on a range of domains and tasks and show that it achieves considerable improvements and is more robust to the choice of hyperparameters, both in predictive RMSE and log-likelihood criteria.

Submitted 1 April, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

Comments: ICLR 2022 camera-ready version. Code available at http://github.com/martius-lab/beta-nll
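The $β$-NLL weighting described above can be written in a few lines. This minimal NumPy sketch assumes a Gaussian likelihood with a predicted mean and variance. In an autodiff framework the weight $\sigma^{2β}$ must be treated as a constant (stop-gradient), which plain NumPy cannot express, so the sketch only evaluates the loss value. The function name `beta_nll` is our own; the authors' code is in the repository linked above.

```python
import numpy as np

def beta_nll(mean, var, target, beta=0.5):
    """Per-sample Gaussian NLL weighted by var**beta, averaged over the batch.

    When differentiating, var**beta must be a constant (stop-gradient), so the
    weight rescales each point's gradient without itself being optimized.
    beta = 0 recovers the standard NLL. With beta = 1 the gradient w.r.t. the
    mean becomes (mean - target), i.e. that of 0.5 * squared error, because
    d/d(mean) of the NLL is (mean - target) / var.
    """
    mean, var, target = map(np.asarray, (mean, var, target))
    nll = 0.5 * ((target - mean) ** 2 / var + np.log(var))  # constant 0.5*log(2*pi) dropped
    weight = var ** beta
    return float(np.mean(weight * nll))

print(beta_nll([0.0, 1.0], [1.0, 4.0], [0.5, 0.0], beta=0.0))
print(beta_nll([0.0, 1.0], [1.0, 4.0], [0.5, 0.0], beta=0.5))
```

Points with large predicted variance receive a larger weight than in the plain NLL, which counteracts the down-weighting of their mean gradients by $1/\sigma^2$.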

arXiv:2201.11599 [pdf, other]

doi 10.1088/1367-2630/ac7df6

Inferring Markovian quantum master equations of few-body observables in interacting spin chains

Authors: Francesco Carnazza, Federico Carollo, Dominik Zietlow, Sabine Andergassen, Georg Martius, Igor Lesanovsky

Abstract: Full information about a many-body quantum system is usually out of reach because the number of parameters needed to encode its state grows exponentially with the size of the system. Nonetheless, in order to understand the complex phenomenology that can be observed in these systems, it is often sufficient to consider dynamical or stationary properties of local observables or, at most, of few-body correlation functions. These quantities are typically studied by singling out a specific subsystem of interest and regarding the remainder of the many-body system as an effective bath. In the simplest scenario, the subsystem dynamics, which is in fact an open quantum dynamics, can be approximated through Markovian quantum master equations. Here, we formulate the problem of finding the generator of the subsystem dynamics as a variational problem, which we solve using the standard toolbox of machine learning for optimization. This dynamical or "Lindblad" generator provides the relevant dynamical parameters for the subsystem of interest. Importantly, the algorithm we develop is constructed such that the learned generator implements a physically consistent open quantum time-evolution. We exploit this to learn the generator of the dynamics of a subsystem of a many-body system subject to a unitary quantum dynamics. We explore the capability of our method to recover the time-evolution of a two-body subsystem and exploit the physical consistency of the generator to make predictions on the stationary state of the subsystem dynamics.

Submitted 25 July, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

Comments: 24 pages, 4 figures

Journal ref: New J. Phys. 24 073033 (2022)
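One standard way to ensure that a learned generator yields a physically consistent (completely positive and trace-preserving) evolution is to write it in Lindblad form, $\dot\rho = -i[H,\rho] + \sum_{ij} K_{ij}\big(F_i\rho F_j^\dagger - \tfrac12\{F_j^\dagger F_i,\rho\}\big)$. Here $H$ is a Hermitian Hamiltonian and $K = AA^\dagger$ is a positive semidefinite Kossakowski matrix. This parametrization is assumed here and is not necessarily the one used in the paper. The NumPy sketch below applies it to a single qubit and integrates the master equation with fixed-step RK4. All names are hypothetical; in a learning setup, the entries of $H$ and $A$ would be the trainable parameters.

```python
import numpy as np

# Traceless orthonormal operator basis for one qubit: Pauli matrices / sqrt(2)
PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
BASIS = [p / np.sqrt(2) for p in PAULIS]

def lindblad_rhs(rho, H, K):
    """d(rho)/dt = -i[H, rho] + sum_ij K_ij (F_i rho F_j^+ - 1/2 {F_j^+ F_i, rho})."""
    drho = -1j * (H @ rho - rho @ H)
    for i, Fi in enumerate(BASIS):
        for j, Fj in enumerate(BASIS):
            Fj_dag = Fj.conj().T
            drho = drho + K[i, j] * (
                Fi @ rho @ Fj_dag - 0.5 * (Fj_dag @ Fi @ rho + rho @ Fj_dag @ Fi)
            )
    return drho

def evolve(rho0, H, A, t, steps=1000):
    """Integrate the master equation with RK4; K = A A^+ is PSD by construction."""
    K = A @ A.conj().T
    dt = t / steps
    rho = rho0.astype(complex)
    for _ in range(steps):
        k1 = lindblad_rhs(rho, H, K)
        k2 = lindblad_rhs(rho + 0.5 * dt * k1, H, K)
        k3 = lindblad_rhs(rho + 0.5 * dt * k2, H, K)
        k4 = lindblad_rhs(rho + dt * k3, H, K)
        rho = rho + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return rho

rng = np.random.default_rng(0)
M = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
H = 0.5 * (M + M.conj().T)                              # Hermitian Hamiltonian
A = 0.3 * (rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3)))
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)        # pure state |0><0|
rho_t = evolve(rho0, H, A, t=5.0)
print(np.trace(rho_t).real, np.linalg.eigvalsh(rho_t))
```

Because any $H$ and $A$ give a valid generator, unconstrained gradient-based optimization over these parameters cannot leave the set of physical dynamics.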

arXiv:2112.03100 [pdf, other]

Hierarchical Reinforcement Learning with Timed Subgoals

Authors: Nico Gürtler, Dieter Büchler, Georg Martius

Abstract: Hierarchical reinforcement learning (HRL) holds great potential for sample-efficient learning on challenging long-horizon tasks. In particular, letting a higher level assign subgoals to a lower level has been shown to enable fast learning on difficult problems. However, such subgoal-based methods have been designed with static reinforcement learning environments in mind and consequently struggle with dynamic elements beyond the immediate control of the agent, even though such elements are ubiquitous in real-world problems. In this paper, we introduce Hierarchical reinforcement learning with Timed Subgoals (HiTS), an HRL algorithm that enables the agent to adapt its timing to a dynamic environment by not only specifying what goal state is to be reached but also when. We discuss how communicating with a lower level in terms of such timed subgoals results in a more stable learning problem for the higher level. Our experiments on a range of standard benchmarks and three new challenging dynamic reinforcement learning environments show that our method is capable of sample-efficient learning where an existing state-of-the-art subgoal-based HRL method fails to learn stable solutions.

Submitted 6 December, 2021; originally announced December 2021.

Comments: Published at NeurIPS 2021. Code available at https://github.com/martius-lab/HiTS

arXiv:2111.05934 [pdf, other]

A soft thumb-sized vision-based sensor with accurate all-round force perception

Authors: Huanbo Sun, Katherine J. Kuchenbecker, Georg Martius

Abstract: Vision-based haptic sensors have emerged as a promising approach to robotic touch due to affordable high-resolution cameras and successful computer-vision techniques. However, their physical design and the information they provide do not yet meet the requirements of real applications. We present a robust, soft, low-cost, vision-based, thumb-sized 3D haptic sensor named Insight: it continually provides a directional force-distribution map over its entire conical sensing surface. Constructed around an internal monocular camera, the sensor has only a single layer of elastomer over-molded on a stiff frame to guarantee sensitivity, robustness, and soft contact. Furthermore, Insight is the first system to combine photometric stereo and structured light using a collimator to detect the 3D deformation of its easily replaceable flexible outer shell. The force information is inferred by a deep neural network that maps images to the spatial distribution of 3D contact force (normal and shear). Insight has an overall spatial resolution of 0.4 mm, force magnitude accuracy around 0.03 N, and force direction accuracy around 5 degrees over a range of 0.03–2 N for numerous distinct contacts with varying contact area. The presented hardware and software design concepts can be transferred to a wide variety of robot parts.

Submitted 10 November, 2021; originally announced November 2021.

Comments: 1 table, 5 figures, 24 pages for the main manuscript. 5 tables, 12 figures, 27 pages for the supplementary material. 8 supplementary videos

arXiv:2110.15949 [pdf, other]

Sparsely Changing Latent States for Prediction and Planning in Partially Observable Domains

Authors: Christian Gumbsch, Martin V. Butz, Georg Martius

Abstract: A common approach to prediction and planning in partially observable domains is to use recurrent neural networks (RNNs), which ideally develop and maintain a latent memory about hidden, task-relevant factors. We hypothesize that many of these hidden factors in the physical world are constant over time, changing only sparsely. To study this hypothesis, we propose Gated $L_0$ Regularized Dynamics (GateL0RD), a novel recurrent architecture that incorporates the inductive bias to maintain stable, sparsely changing latent states. The bias is implemented by means of a novel internal gating function and a penalty on the $L_0$ norm of latent state changes. We demonstrate that GateL0RD can compete with or outperform state-of-the-art RNNs in a variety of partially observable prediction and control tasks. GateL0RD tends to encode the underlying generative factors of the environment, ignores spurious temporal dependencies, and generalizes better, improving sample efficiency and overall performance in model-based planning and reinforcement learning tasks. Moreover, we show that the developing latent states can be easily interpreted, which is a step towards better explainability in RNNs.

Submitted 13 January, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

Comments: Accepted at NeurIPS 2021
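The interplay of a gate and an $L_0$ penalty on latent state changes can be sketched as follows. This simplified NumPy illustration rests on our own assumptions and is loosely modeled on the description above. A gate $\Lambda(s) = \max(0, \tanh(s))$ is exactly zero for non-positive inputs, so the latent state is copied unchanged, and the $L_0$ penalty counts how many latent dimensions change per step. The actual architecture computes gate inputs and candidate states with learned networks and needs a gradient estimator for the non-differentiable count; both are omitted here, and all names are hypothetical.

```python
import numpy as np

def gate(s):
    """Gate that is exactly 0 for s <= 0 and in (0, 1) otherwise."""
    return np.maximum(0.0, np.tanh(s))

def gated_update(h_prev, candidate, s):
    """Interpolate towards the candidate only where the gate is open."""
    g = gate(s)
    return g * candidate + (1.0 - g) * h_prev, g

def l0_penalty(gates):
    """Number of open gates (latent dimensions that change), averaged over time."""
    return float(np.mean(np.sum(gates > 0, axis=-1)))

h = np.zeros(3)
gates = []
# Hypothetical gate inputs: only dimension 0 changes at t=0, nothing changes at t=1
steps = [(np.array([1.0, -2.0, -0.5]), np.ones(3)),
         (np.array([-1.0, -1.0, -1.0]), 2.0 * np.ones(3))]
for s, candidate in steps:
    h, g = gated_update(h, candidate, s)
    gates.append(g)
print(h, l0_penalty(np.array(gates)))
```

Since closed gates leave the state bit-for-bit unchanged, minimizing the penalty directly encourages latent states that change only sparsely.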

arXiv:2110.06149 [pdf, other]

Planning from Pixels in Environments with Combinatorially Hard Search Spaces

Authors: Marco Bagatella, Mirek Olšák, Michal Rolínek, Georg Martius

Abstract: The ability to form complex plans based on raw visual input is a litmus test for current capabilities of artificial intelligence, as it requires a seamless combination of visual processing and abstract algorithmic execution, two traditionally separate areas of computer science. A recent surge of interest in this field brought advances that yield good performance in tasks ranging from arcade games to continuous control; these methods, however, do not come without significant issues, such as limited generalization capabilities and difficulties when dealing with combinatorially hard planning instances. Our contribution is two-fold: (i) we present a method that learns to represent its environment as a latent graph and leverages state reidentification to reduce the complexity of finding a good policy from exponential to linear; and (ii) we introduce a set of lightweight environments with an underlying discrete combinatorial structure in which planning is challenging even for humans. Moreover, we show that our method achieves strong empirical generalization to variations in the environment, even across highly disadvantaged regimes, such as "one-shot" planning, or in an offline RL paradigm which only provides low-quality trajectories.

Submitted 18 March, 2022; v1 submitted 12 October, 2021; originally announced October 2021.
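The benefit of state reidentification can be illustrated with a toy example. If observations that correspond to the same underlying state are mapped to the same graph node, planning becomes a search over distinct states rather than over action sequences. The sketch below is our own illustration, with a hypothetical `reidentify` function standing in for a learned model. It runs breadth-first search on a small grid world, where each state is expanded at most once. Naive tree search over action sequences of depth $d$ could instead expand up to $4^d$ nodes.

```python
from collections import deque

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def reidentify(observation):
    """Hypothetical stand-in for a learned model mapping observations to state ids.
    Here an observation is already a grid position, so it is its own id."""
    return observation

def plan(start, goal, walls, size):
    """BFS over reidentified states; returns a shortest action sequence or None."""
    parents = {reidentify(start): None}
    queue = deque([start])
    while queue:
        pos = queue.popleft()
        if reidentify(pos) == reidentify(goal):
            actions, node = [], reidentify(pos)
            while parents[node] is not None:
                node, action = parents[node]
                actions.append(action)
            return actions[::-1]
        for action, (dr, dc) in MOVES.items():
            nxt = (pos[0] + dr, pos[1] + dc)
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in walls:
                continue
            nid = reidentify(nxt)
            if nid not in parents:  # each distinct state is expanded at most once
                parents[nid] = (reidentify(pos), action)
                queue.append(nxt)
    return None

print(plan((0, 0), (2, 2), walls={(1, 1)}, size=3))  # -> ['down', 'down', 'right', 'right']
```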

arXiv:2109.04150 [pdf, other]

Self-supervised Reinforcement Learning with Independently Controllable Subgoals

Authors: Andrii Zadaianchuk, Georg Martius, Fanny Yang

Abstract: To successfully tackle challenging manipulation tasks, autonomous agents must learn a diverse set of skills and how to combine them. Recently, self-supervised agents that set their own abstract goals by exploiting the discovered structure in the environment were shown to perform well on many different tasks. In particular, some of them were applied to learn basic manipulation skills in compositional multi-object environments. However, these methods learn skills without taking the dependencies between objects into account. Thus, the learned skills are difficult to combine in realistic environments. We propose a novel self-supervised agent that estimates relations between environment components and uses them to independently control different parts of the environment state. In addition, the estimated relations between objects can be used to decompose a complex goal into a compatible sequence of subgoals. We show that, by using this framework, an agent can efficiently and automatically learn manipulation tasks in multi-object environments with different relations between objects.

Submitted 30 January, 2022; v1 submitted 9 September, 2021; originally announced September 2021.

arXiv:2106.03443 [pdf, other]

Causal Influence Detection for Improving Efficiency in Reinforcement Learning

Authors: Maximilian Seitzer, Bernhard Schölkopf, Georg Martius

Abstract: Many reinforcement learning (RL) environments consist of independent entities that interact sparsely. In such environments, RL agents have only limited influence over other entities in any particular situation. Our idea in this work is that learning can be efficiently guided by knowing when and what the agent can influence with its actions. To achieve this, we introduce a measure of situation-dependent causal influence based on conditional mutual information and show that it can reliably detect states of influence. We then propose several ways to integrate this measure into RL algorithms to improve exploration and off-policy learning. All modified algorithms show strong increases in data efficiency on robotic manipulation tasks.

Submitted 2 December, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

Comments: NeurIPS 2021 camera-ready version. Code available at http://github.com/martius-lab/cid-in-rl
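A conditional-mutual-information measure of influence in a given state $s$ can be written as $I(S'; A \mid S = s) = \mathbb{E}_{a}\,\mathbb{E}_{s' \sim p(s' \mid s, a)}\big[\log p(s' \mid s, a) - \log p(s' \mid s)\big]$. Below is a minimal NumPy sketch of a Monte Carlo estimate. For illustration, we assume a 1-D Gaussian transition model with a fixed standard deviation and a known mean function, and we approximate the marginal $p(s' \mid s)$ by averaging the model over a set of sampled actions. `mean_fn` and the toy dynamics are hypothetical; the authors' implementation is in the repository linked above.

```python
import numpy as np

def gaussian_logpdf(x, mu, std):
    return -0.5 * ((x - mu) / std) ** 2 - np.log(std * np.sqrt(2 * np.pi))

def causal_influence(mean_fn, s, actions, std=0.1, n_samples=256, seed=0):
    """Monte Carlo estimate of I(S'; A | S=s) for a 1-D Gaussian transition model."""
    rng = np.random.default_rng(seed)
    means = np.array([mean_fn(s, a) for a in actions])              # (K,)
    total = 0.0
    for mu_k in means:
        x = rng.normal(mu_k, std, size=n_samples)                    # s' ~ p(s'|s,a_k)
        log_cond = gaussian_logpdf(x, mu_k, std)
        # log of the mixture (1/K) sum_j p(s'|s,a_j), computed with the max trick
        log_comp = gaussian_logpdf(x[:, None], means[None, :], std)  # (N, K)
        m = log_comp.max(axis=1, keepdims=True)
        log_marg = (m + np.log(np.mean(np.exp(log_comp - m), axis=1, keepdims=True)))[:, 0]
        total += np.mean(log_cond - log_marg)
    return total / len(means)

def mean_fn(s, a):
    """Hypothetical dynamics: actions move the object only when it is within reach."""
    return s + (a if abs(s) < 0.5 else 0.0)

actions = np.linspace(-1.0, 1.0, 16)
print(causal_influence(mean_fn, 0.0, actions), causal_influence(mean_fn, 2.0, actions))
```

Out of reach, every action predicts the same next state, so the estimate is zero; within reach, it is positive and bounded above by $\log K$ for $K$ sampled actions.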

arXiv:2105.11914 [pdf, other]

Theory and Design of Super-resolution Haptic Skins

Authors: Huanbo Sun, Georg Martius

Abstract: Haptic feedback is important to make robots more dexterous and effective in unstructured environments. High-resolution haptic sensors are still not widely available, and their application is often bound by the resolution-robustness dilemma. A route towards high-resolution and robust skin embeds a few sensor units (taxels) into a flexible surface material and uses signal processing to achieve sensing with super-resolution accuracy. We propose a theory for geometric super-resolution to guide the development of haptic sensors of this kind and link it to machine learning techniques for signal processing. This theory is based on sensor isolines and allows us to predict force sensitivity and accuracy in contact position and force magnitude as a spatial quantity. We evaluate the influence of different factors, such as elastic properties of the material, structure design, and transduction methods, using finite element simulations and by implementing real sensors. We empirically determine sensor isolines and validate the theory in two custom-built sensors with barometric units for 1D and 2D measurement surfaces. Using machine learning methods for the inference of contact information, our sensors obtain an unparalleled average super-resolution factor of over 100 and 1200, respectively. Our theory can guide future haptic sensor designs and inform various design choices.

Submitted 24 August, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

arXiv:2105.08635 [pdf, other]

doi 10.1109/SSCI44817.2019.9002779

Assessing aesthetics of generated abstract images using correlation structure

Authors: Sina Khajehabdollahi, Georg Martius, Anna Levina

Abstract: Can we generate abstract aesthetic images without bias from natural or human-selected image corpora? Are aesthetic images singled out by their correlation functions? In this paper, we address these and related questions. We generate images using compositional pattern-producing networks with random weights and varying architecture. We demonstrate that even with randomly selected weights, the correlation functions remain largely determined by the network architecture. In a controlled experiment, human subjects picked aesthetic images out of a large dataset of all generated images. Statistical analysis reveals that the correlation function is indeed different for aesthetic images.

Submitted 18 May, 2021; originally announced May 2021.

Journal ref: 2019 IEEE Symposium Series on Computational Intelligence (SSCI), 306-313
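The image-generation and analysis pipeline can be sketched in a few lines. This NumPy illustration rests on our own assumptions: input features $(x, y, r)$, tanh hidden layers, standard-normal random weights, a sigmoid output, and a radially averaged autocorrelation computed via FFT (which treats the image as periodic). It is not the paper's exact setup. It only shows how a CPPN with random weights yields an image whose correlation function can then be compared across architectures.

```python
import numpy as np

def random_cppn_image(size=64, hidden=(16, 16), seed=0):
    """Evaluate a CPPN with random Gaussian weights on a (size x size) pixel grid."""
    rng = np.random.default_rng(seed)
    coords = np.linspace(-1.0, 1.0, size)
    x, y = np.meshgrid(coords, coords)
    r = np.sqrt(x ** 2 + y ** 2)
    h = np.stack([x, y, r], axis=-1).reshape(-1, 3)       # one row of features per pixel
    for width in hidden:
        W = rng.normal(0.0, 1.0, size=(h.shape[1], width))
        h = np.tanh(h @ W)
    w_out = rng.normal(0.0, 1.0, size=(h.shape[1], 1))
    return (1.0 / (1.0 + np.exp(-(h @ w_out)))).reshape(size, size)  # grayscale in (0, 1)

def radial_autocorrelation(img):
    """Circular autocorrelation (Wiener-Khinchin via FFT), averaged over rings of equal lag."""
    z = img - img.mean()
    corr = np.fft.ifft2(np.abs(np.fft.fft2(z)) ** 2).real
    corr = np.fft.fftshift(corr / corr.flat[0])           # lag 0 normalized to 1, then centered
    n = img.shape[0]
    yy, xx = np.indices(corr.shape)
    dist = np.hypot(yy - n // 2, xx - n // 2).astype(int)
    return np.bincount(dist.ravel(), weights=corr.ravel()) / np.bincount(dist.ravel())

img = random_cppn_image()
profile = radial_autocorrelation(img)
print(profile[:5])
```

Re-running with different seeds but the same `hidden` widths, and comparing against other widths or depths, is one way to probe the claim that the architecture rather than the specific weights shapes the correlation function.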

arXiv:2105.06331 [pdf, other]

Informed Equation Learning

Authors: Matthias Werner, Andrej Junginger, Philipp Hennig, Georg Martius

Abstract: Distilling data into compact and interpretable analytic equations is one of the goals of science. Instead, contemporary supervised machine learning methods mostly produce unstructured and dense maps from input to output. Particularly in deep learning, this property stems from the generic nature of simple standard link functions. To learn equations rather than maps, standard non-linearities can be replaced with structured building blocks of atomic functions. However, without strong priors on sparsity and structure, representational complexity and numerical conditioning limit this direct approach. To scale to realistic settings in science and engineering, we propose an informed equation learning system. It provides a way to incorporate expert knowledge about which equation components are permitted or prohibited, as well as a domain-dependent structured sparsity prior. Our system then utilizes a robust method to learn equations with atomic functions exhibiting singularities, such as logarithm and division. We present several artificial and real-world experiments from the engineering domain in which our system learns interpretable models of high predictive power.

Submitted 13 May, 2021; originally announced May 2021.
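To make the idea of replacing standard non-linearities with building blocks of atomic functions concrete, here is a minimal NumPy sketch of one such layer. It is loosely in the spirit of earlier equation-learner (EQL) networks. A linear map feeds a small set of atomic units (identity, sine, product), and a hypothetical `allowed` mask switches off units that expert knowledge prohibits. The unit set, the mask mechanism, and all names are our assumptions, not the paper's implementation. The robust handling of singular units such as division is not shown.

```python
import numpy as np

def equation_layer(x, W, b, allowed):
    """One layer of atomic-function units.

    x: (batch, d_in); W: (d_in, 4); b: (4,)
    Units: z0 -> identity, z1 -> sin, z2 * z3 -> product.
    allowed: dict of unit name -> bool; prohibited units output 0.
    """
    z = x @ W + b
    units = {
        "id": z[:, 0],
        "sin": np.sin(z[:, 1]),
        "mul": z[:, 2] * z[:, 3],
    }
    outputs = [u if allowed[name] else np.zeros_like(u) for name, u in units.items()]
    return np.stack(outputs, axis=1)                     # (batch, 3)

x = np.array([[1.0, 2.0], [0.5, -1.0]])
W = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])                     # route x0 and x1 to the unit inputs
b = np.zeros(4)
out = equation_layer(x, W, b, allowed={"id": True, "sin": False, "mul": True})
print(out)  # columns: x0, 0 (sin prohibited), x0 * x1
```

A final linear layer over such unit outputs, trained under a sparsity prior, reads off as a closed-form expression, which is what distinguishes this construction from a generic dense network.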

arXiv:2105.02343 [pdf, other]

CombOptNet: Fit the Right NP-Hard Problem by Learning Integer Programming Constraints

Authors: Anselm Paulus, Michal Rolínek, Vít Musil, Brandon Amos, Georg Martius

Abstract: Bridging logical and algorithmic reasoning with modern machine learning techniques is a fundamental challenge with potentially transformative impact. On the algorithmic side, many NP-hard problems can be expressed as integer programs, in which the constraints play the role of their "combinatorial specification." In this work, we aim to integrate integer programming solvers into neural network architectures as layers capable of learning both the cost terms and the constraints. The resulting end-to-end trainable architectures jointly extract features from raw data and solve a suitable (learned) combinatorial problem with state-of-the-art integer programming solvers. We demonstrate the potential of such layers with an extensive performance analysis on synthetic data and with a demonstration on a competitive computer vision keypoint matching benchmark.

Submitted 11 April, 2022; v1 submitted 5 May, 2021; originally announced May 2021.

Comments: ICML 2021 conference paper
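The forward pass of such a layer solves an integer program whose cost vector $c$ and constraints $Ay \le b$ are produced by the network. Below is a minimal brute-force sketch for tiny problems, assuming for illustration that $y$ is restricted to a bounded integer box. The paper instead relies on a state-of-the-art ILP solver and also derives gradients with respect to $A$, $b$, and $c$, which is not shown here. `solve_ilp` is a hypothetical name.

```python
import itertools
import numpy as np

def solve_ilp(c, A, b, lo=0, hi=1):
    """Brute-force argmin_y c.y s.t. A y <= b, y integer in [lo, hi]^n (tiny n only).
    Returns None if no feasible point exists."""
    best, best_val = None, np.inf
    for cand in itertools.product(range(lo, hi + 1), repeat=len(c)):
        y = np.array(cand, dtype=float)
        if np.all(A @ y <= b) and c @ y < best_val:
            best, best_val = y, c @ y
    return best

# Knapsack-like example: maximize value (minimize negative value) under a weight budget
c = -np.array([3.0, 4.0, 5.0])   # negative item values
A = np.array([[2.0, 3.0, 4.0]])  # item weights (one constraint row)
b = np.array([5.0])              # capacity
print(solve_ilp(c, A, b))        # -> [1. 1. 0.]
```

In a trainable layer, $A$ and $b$ would be network outputs, so learning changes which combinatorial problem is being solved, not only its costs.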

arXiv:2103.12184 [pdf, other]

doi 10.1162/isal_a_00412

The dynamical regime and its importance for evolvability, task performance and generalization

Authors: Jan Prosi, Sina Khajehabdollahi, Emmanouil Giannakakis, Georg Martius, Anna Levina

Abstract: It has long been hypothesized that operating close to the critical state is beneficial for natural and artificial systems. We test this hypothesis by evolving foraging agents controlled by neural networks that can change the system's dynamical regime throughout evolution. Surprisingly, we find that all populations, regardless of their initial regime, evolve to be subcritical in simple tasks, and even strongly subcritical populations can reach comparable performance. We hypothesize that the moderately subcritical regime combines the benefits of generalizability and adaptability brought by closeness to criticality with the stability of the dynamics characteristic of subcritical systems. By a resilience analysis, we find that initially critical agents maintain their fitness level even under environmental changes and degrade slowly with increasing perturbation strength. On the other hand, subcritical agents, originally evolved to the same fitness, were often rendered utterly inadequate and degraded faster. We conclude that although the subcritical regime is preferable for a simple task, the optimal deviation from criticality depends on the task difficulty: for harder tasks, agents evolve closer to criticality. Furthermore, subcritical populations cannot find the path to decrease their distance to criticality. In summary, our study suggests that initializing models near criticality is important to find an optimal and flexible solution.

Submitted 22 March, 2021; originally announced March 2021.

Comments: 8 Pages, 7 Figures, Artificial Life Conference 2021

arXiv:2102.07456 [pdf, other]

Neuro-algorithmic Policies enable Fast Combinatorial Generalization

Authors: Marin Vlastelica, Michal Rolínek, Georg Martius

Abstract: Although model-based and model-free approaches to learning the control of systems have achieved impressive results on standard benchmarks, generalization to task variations is still lacking. Recent results suggest that generalization for standard architectures improves only after obtaining exhaustive amounts of data. We give evidence that generalization capabilities are in many cases bottlenecked by the inability to generalize on the combinatorial aspects of the problem. Furthermore, we show that for a certain subclass of the MDP framework, this can be alleviated by neuro-algorithmic architectures. Many control problems require long-term planning that is hard to solve generically with neural networks alone. We introduce a neuro-algorithmic policy architecture consisting of a neural network and an embedded time-dependent shortest path solver. These policies can be trained end-to-end by blackbox differentiation. We show that this type of architecture generalizes well to unseen variations in the environment already after seeing a few examples.

Submitted 15 February, 2021; originally announced February 2021.

Comments: 15 pages
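Blackbox differentiation obtains a gradient for a piecewise-constant solver by calling it a second time on perturbed costs. Below is a minimal NumPy sketch. It assumes the update rule of the original blackbox-differentiation method, $\nabla_w L \approx -\tfrac{1}{\lambda}\big(y(w) - y(w + \lambda\,\partial L/\partial y)\big)$, and substitutes a simple static right/down grid shortest-path solver for the paper's time-dependent one. Function names and the toy costs are hypothetical.

```python
import numpy as np

def grid_shortest_path(w):
    """Min-cost monotone (right/down) path from top-left to bottom-right.
    Returns a 0/1 indicator matrix of the visited cells."""
    n, m = w.shape
    cost = np.full((n, m), np.inf)
    cost[0, 0] = w[0, 0]
    for i in range(n):
        for j in range(m):
            if i > 0:
                cost[i, j] = min(cost[i, j], cost[i - 1, j] + w[i, j])
            if j > 0:
                cost[i, j] = min(cost[i, j], cost[i, j - 1] + w[i, j])
    y = np.zeros_like(w)
    i, j = n - 1, m - 1
    y[i, j] = 1.0
    while (i, j) != (0, 0):  # backtrack via the cheaper predecessor
        if i > 0 and (j == 0 or cost[i - 1, j] <= cost[i, j - 1]):
            i -= 1
        else:
            j -= 1
        y[i, j] = 1.0
    return y

def blackbox_backward(solver, w, y, grad_y, lam):
    """Gradient of the piecewise-linear interpolation of L(solver(w))."""
    y_lam = solver(w + lam * grad_y)
    return -(y - y_lam) / lam

w = np.array([[1.0, 1.0], [0.0, 1.0]])         # costs: solver goes down, then right
y = grid_shortest_path(w)
y_target = np.array([[1.0, 1.0], [0.0, 1.0]])  # desired: right, then down
grad_y = 1.0 - 2.0 * y_target                  # dL/dy of Hamming loss sum(y*(1-t) + (1-y)*t)
grad_w = blackbox_backward(grid_shortest_path, w, y, grad_y, lam=2.0)
print(grad_w)
```

A gradient step $w \leftarrow w - \eta\,\nabla_w L$ lowers the cost of cell (0, 1) on the desired path and raises the cost of cell (1, 0). The hyperparameter $\lambda$ trades off how informative the gradient is against how faithfully the interpolation follows the true loss.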
