-
Preference-Learning Emitters for Mixed-Initiative Quality-Diversity Algorithms
Authors:
Roberto Gallotta,
Kai Arulkumaran,
L. B. Soros
Abstract:
In mixed-initiative co-creation tasks, wherein a human and a machine jointly create items, it is important to provide multiple relevant suggestions to the designer. Quality-diversity algorithms are commonly used for this purpose, as they can provide diverse suggestions that represent salient areas of the solution space, showcasing designs with high fitness and wide variety. Because generated sugge…
▽ More
In mixed-initiative co-creation tasks, wherein a human and a machine jointly create items, it is important to provide multiple relevant suggestions to the designer. Quality-diversity algorithms are commonly used for this purpose, as they can provide diverse suggestions that represent salient areas of the solution space, showcasing designs with high fitness and wide variety. Because generated suggestions drive the search process, it is important that they provide inspiration, but also stay aligned with the designer's intentions. Additionally, often many interactions with the system are required before the designer is content with a solution. In this work, we tackle these challenges with an interactive constrained MAP-Elites system that leverages emitters to learn the preferences of the designer and then use them in automated steps. By learning preferences, the generated designs remain aligned with the designer's intent, and by applying automatic steps, we generate more solutions per user interaction, giving a larger number of choices to the designer and thereby speeding up the search. We propose a general framework for preference-learning emitters (PLEs) and apply it to a procedural content generation task in the video game Space Engineers. We built an interactive application for our algorithm and performed a user study with players.
△ Less
Submitted 14 April, 2023; v1 submitted 25 October, 2022;
originally announced October 2022.
-
Surrogate Infeasible Fitness Acquirement FI-2Pop for Procedural Content Generation
Authors:
Roberto Gallotta,
Kai Arulkumaran,
L. B. Soros
Abstract:
When generating content for video games using procedural content generation (PCG), the goal is to create functional assets of high quality. Prior work has commonly leveraged the feasible-infeasible two-population (FI-2Pop) constrained optimisation algorithm for PCG, sometimes in combination with the multi-dimensional archive of phenotypic-elites (MAP-Elites) algorithm for finding a set of diverse…
▽ More
When generating content for video games using procedural content generation (PCG), the goal is to create functional assets of high quality. Prior work has commonly leveraged the feasible-infeasible two-population (FI-2Pop) constrained optimisation algorithm for PCG, sometimes in combination with the multi-dimensional archive of phenotypic-elites (MAP-Elites) algorithm for finding a set of diverse solutions. However, the fitness function for the infeasible population only takes into account the number of constraints violated. In this paper we present a variant of FI-2Pop in which a surrogate model is trained to predict the fitness of feasible children from infeasible parents, weighted by the probability of producing feasible children. This drives selection towards higher-fitness, feasible solutions. We demonstrate our method on the task of generating spaceships for Space Engineers, showing improvements over both standard FI-2Pop, and the more recent multi-emitter constrained MAP-Elites algorithm.
△ Less
Submitted 11 May, 2022;
originally announced May 2022.
-
On the link between conscious function and general intelligence in humans and machines
Authors:
Arthur Juliani,
Kai Arulkumaran,
Shuntaro Sasai,
Ryota Kanai
Abstract:
In popular media, there is often a connection drawn between the advent of awareness in artificial agents and those same agents simultaneously achieving human or superhuman level intelligence. In this work, we explore the validity and potential application of this seemingly intuitive link between consciousness and intelligence. We do so by examining the cognitive abilities associated with three con…
▽ More
In popular media, there is often a connection drawn between the advent of awareness in artificial agents and those same agents simultaneously achieving human or superhuman level intelligence. In this work, we explore the validity and potential application of this seemingly intuitive link between consciousness and intelligence. We do so by examining the cognitive abilities associated with three contemporary theories of conscious function: Global Workspace Theory (GWT), Information Generation Theory (IGT), and Attention Schema Theory (AST). We find that all three theories specifically relate conscious function to some aspect of domain-general intelligence in humans. With this insight, we turn to the field of Artificial Intelligence (AI) and find that, while still far from demonstrating general intelligence, many state-of-the-art deep learning methods have begun to incorporate key aspects of each of the three functional theories. Having identified this trend, we use the motivating example of mental time travel in humans to propose ways in which insights from each of the three theories may be combined into a single unified and implementable model. Given that it is made possible by cognitive abilities underlying each of the three functional theories, artificial agents capable of mental time travel would not only possess greater general intelligence than current approaches, but also be more consistent with our current understanding of the functional role of consciousness in humans, thus making it a promising near-term goal for AI research.
△ Less
Submitted 19 July, 2022; v1 submitted 23 March, 2022;
originally announced April 2022.
-
Learning Relative Return Policies With Upside-Down Reinforcement Learning
Authors:
Dylan R. Ashley,
Kai Arulkumaran,
Jürgen Schmidhuber,
Rupesh Kumar Srivastava
Abstract:
Lately, there has been a resurgence of interest in using supervised learning to solve reinforcement learning problems. Recent work in this area has largely focused on learning command-conditioned policies. We investigate the potential of one such method -- upside-down reinforcement learning -- to work with commands that specify a desired relationship between some scalar value and the observed retu…
▽ More
Lately, there has been a resurgence of interest in using supervised learning to solve reinforcement learning problems. Recent work in this area has largely focused on learning command-conditioned policies. We investigate the potential of one such method -- upside-down reinforcement learning -- to work with commands that specify a desired relationship between some scalar value and the observed return. We show that upside-down reinforcement learning can learn to carry out such commands online in a tabular bandit setting and in CartPole with non-linear function approximation. By doing so, we demonstrate the power of this family of methods and open the way for their practical use under more complicated command structures.
△ Less
Submitted 10 May, 2022; v1 submitted 23 February, 2022;
originally announced February 2022.
-
All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL
Authors:
Kai Arulkumaran,
Dylan R. Ashley,
Jürgen Schmidhuber,
Rupesh K. Srivastava
Abstract:
Upside down reinforcement learning (UDRL) flips the conventional use of the return in the objective function in RL upside down, by taking returns as input and predicting actions. UDRL is based purely on supervised learning, and bypasses some prominent issues in RL: bootstrapping, off-policy corrections, and discount factors. While previous work with UDRL demonstrated it in a traditional online RL…
▽ More
Upside down reinforcement learning (UDRL) flips the conventional use of the return in the objective function in RL upside down, by taking returns as input and predicting actions. UDRL is based purely on supervised learning, and bypasses some prominent issues in RL: bootstrapping, off-policy corrections, and discount factors. While previous work with UDRL demonstrated it in a traditional online RL setting, here we show that this single algorithm can also work in the imitation learning and offline RL settings, be extended to the goal-conditioned RL setting, and even the meta-RL setting. With a general agent architecture, a single UDRL agent can learn across all paradigms.
△ Less
Submitted 24 February, 2022;
originally announced February 2022.
-
Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay
Authors:
Tianhong Dai,
Hengyan Liu,
Kai Arulkumaran,
Guangyu Ren,
Anil Anthony Bharath
Abstract:
Hindsight experience replay (HER) is a goal relabelling technique typically used with off-policy deep reinforcement learning algorithms to solve goal-oriented tasks; it is well suited to robotic manipulation tasks that deliver only sparse rewards. In HER, both trajectories and transitions are sampled uniformly for training. However, not all of the agent's experiences contribute equally to training…
▽ More
Hindsight experience replay (HER) is a goal relabelling technique typically used with off-policy deep reinforcement learning algorithms to solve goal-oriented tasks; it is well suited to robotic manipulation tasks that deliver only sparse rewards. In HER, both trajectories and transitions are sampled uniformly for training. However, not all of the agent's experiences contribute equally to training, and so naive uniform sampling may lead to inefficient learning. In this paper, we propose diversity-based trajectory and goal selection with HER (DTGSH). Firstly, trajectories are sampled according to the diversity of the goal states as modelled by determinantal point processes (DPPs). Secondly, transitions with diverse goal states are selected from the trajectories by using k-DPPs. We evaluate DTGSH on five challenging robotic manipulation tasks in simulated robot environments, where we show that our method can learn more quickly and reach higher performance than other state-of-the-art approaches on all tasks.
△ Less
Submitted 8 November, 2021; v1 submitted 17 August, 2021;
originally announced August 2021.
-
A Pragmatic Look at Deep Imitation Learning
Authors:
Kai Arulkumaran,
Dan Ogawa Lillrank
Abstract:
The introduction of the generative adversarial imitation learning (GAIL) algorithm has spurred the development of scalable imitation learning approaches using deep neural networks. Many of the algorithms that followed used a similar procedure, combining on-policy actor-critic algorithms with inverse reinforcement learning. More recently there have been an even larger breadth of approaches, most of…
▽ More
The introduction of the generative adversarial imitation learning (GAIL) algorithm has spurred the development of scalable imitation learning approaches using deep neural networks. Many of the algorithms that followed used a similar procedure, combining on-policy actor-critic algorithms with inverse reinforcement learning. More recently there have been an even larger breadth of approaches, most of which use off-policy algorithms. However, with the breadth of algorithms, everything from datasets to base reinforcement learning algorithms to evaluation settings can vary, making it difficult to fairly compare them. In this work we re-implement 6 different IL algorithms, updating 3 of them to be off-policy, base them on a common off-policy algorithm (SAC), and evaluate them on a widely-used expert trajectory dataset (D4RL) for the most common benchmark (MuJoCo). After giving all algorithms the same hyperparameter optimisation budget, we compare their results for a range of expert trajectories. In summary, GAIL, with all of its improvements, consistently performs well across a range of sample sizes, AdRIL is a simple contender that performs well with one important hyperparameter to tune, and behavioural cloning remains a strong baseline when data is more plentiful.
△ Less
Submitted 19 September, 2023; v1 submitted 4 August, 2021;
originally announced August 2021.
-
Privileged Information Dropout in Reinforcement Learning
Authors:
Pierre-Alexandre Kamienny,
Kai Arulkumaran,
Feryal Behbahani,
Wendelin Boehmer,
Shimon Whiteson
Abstract:
Using privileged information during training can improve the sample efficiency and performance of machine learning systems. This paradigm has been applied to reinforcement learning (RL), primarily in the form of distillation or auxiliary tasks, and less commonly in the form of augmenting the inputs of agents. In this work, we investigate Privileged Information Dropout (\pid) for achieving the latt…
▽ More
Using privileged information during training can improve the sample efficiency and performance of machine learning systems. This paradigm has been applied to reinforcement learning (RL), primarily in the form of distillation or auxiliary tasks, and less commonly in the form of augmenting the inputs of agents. In this work, we investigate Privileged Information Dropout (\pid) for achieving the latter which can be applied equally to value-based and policy-based RL algorithms. Within a simple partially-observed environment, we demonstrate that \pid outperforms alternatives for leveraging privileged information, including distillation and auxiliary tasks, and can successfully utilise different types of privileged information. Finally, we analyse its effect on the learned representations.
△ Less
Submitted 19 May, 2020;
originally announced May 2020.
-
Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation
Authors:
Tianhong Dai,
Kai Arulkumaran,
Tamara Gerbert,
Samyakh Tukra,
Feryal Behbahani,
Anil Anthony Bharath
Abstract:
Deep reinforcement learning has the potential to train robots to perform complex tasks in the real world without requiring accurate models of the robot or its environment. A practical approach is to train agents in simulation, and then transfer them to the real world. One popular method for achieving transferability is to use domain randomisation, which involves randomly perturbing various aspects…
▽ More
Deep reinforcement learning has the potential to train robots to perform complex tasks in the real world without requiring accurate models of the robot or its environment. A practical approach is to train agents in simulation, and then transfer them to the real world. One popular method for achieving transferability is to use domain randomisation, which involves randomly perturbing various aspects of a simulated environment in order to make trained agents robust to the reality gap. However, less work has gone into understanding such agents - which are deployed in the real world - beyond task performance. In this work we examine such agents, through qualitative and quantitative comparisons between agents trained with and without visual domain randomisation. We train agents for Fetch and Jaco robots on a visuomotor control task and evaluate how well they generalise using different testing conditions. Finally, we investigate the internals of the trained agents by using a suite of interpretability techniques. Our results show that the primary outcome of domain randomisation is more robust, entangled representations, accompanied with larger weights with greater spatial structure; moreover, the types of changes are heavily influenced by the task setup and presence of additional proprioceptive inputs. Additionally, we demonstrate that our domain randomised agents require higher sample complexity, can overfit and more heavily rely on recurrent processing. Furthermore, even with an improved saliency method introduced in this work, we show that qualitative studies may not always correspond with quantitative measures, necessitating the combination of inspection tools in order to provide sufficient insights into the behaviour of trained agents.
△ Less
Submitted 17 February, 2020; v1 submitted 17 December, 2019;
originally announced December 2019.
-
Sample-Efficient Reinforcement Learning with Maximum Entropy Mellowmax Episodic Control
Authors:
Marta Sarrico,
Kai Arulkumaran,
Andrea Agostinelli,
Pierre Richemond,
Anil Anthony Bharath
Abstract:
Deep networks have enabled reinforcement learning to scale to more complex and challenging domains, but these methods typically require large quantities of training data. An alternative is to use sample-efficient episodic control methods: neuro-inspired algorithms which use non-/semi-parametric models that predict values based on storing and retrieving previously experienced transitions. One way t…
▽ More
Deep networks have enabled reinforcement learning to scale to more complex and challenging domains, but these methods typically require large quantities of training data. An alternative is to use sample-efficient episodic control methods: neuro-inspired algorithms which use non-/semi-parametric models that predict values based on storing and retrieving previously experienced transitions. One way to further improve the sample efficiency of these approaches is to use more principled exploration strategies. In this work, we therefore propose maximum entropy mellowmax episodic control (MEMEC), which samples actions according to a Boltzmann policy with a state-dependent temperature. We demonstrate that MEMEC outperforms other uncertainty- and softmax-based exploration methods on classic reinforcement learning environments and Atari games, achieving both more rapid learning and higher final rewards.
△ Less
Submitted 21 November, 2019;
originally announced November 2019.
-
Memory-Efficient Episodic Control Reinforcement Learning with Dynamic Online k-means
Authors:
Andrea Agostinelli,
Kai Arulkumaran,
Marta Sarrico,
Pierre Richemond,
Anil Anthony Bharath
Abstract:
Recently, neuro-inspired episodic control (EC) methods have been developed to overcome the data-inefficiency of standard deep reinforcement learning approaches. Using non-/semi-parametric models to estimate the value function, they learn rapidly, retrieving cached values from similar past states. In realistic scenarios, with limited resources and noisy data, maintaining meaningful representations…
▽ More
Recently, neuro-inspired episodic control (EC) methods have been developed to overcome the data-inefficiency of standard deep reinforcement learning approaches. Using non-/semi-parametric models to estimate the value function, they learn rapidly, retrieving cached values from similar past states. In realistic scenarios, with limited resources and noisy data, maintaining meaningful representations in memory is essential to speed up the learning and avoid catastrophic forgetting. Unfortunately, EC methods have a large space and time complexity. We investigate different solutions to these problems based on prioritising and ranking stored states, as well as online clustering techniques. We also propose a new dynamic online k-means algorithm that is both computationally-efficient and yields significantly better performance at smaller memory sizes; we validate this approach on classic reinforcement learning environments and Atari games.
△ Less
Submitted 21 November, 2019;
originally announced November 2019.
-
AlphaStar: An Evolutionary Computation Perspective
Authors:
Kai Arulkumaran,
Antoine Cully,
Julian Togelius
Abstract:
In January 2019, DeepMind revealed AlphaStar to the world-the first artificial intelligence (AI) system to beat a professional player at the game of StarCraft II-representing a milestone in the progress of AI. AlphaStar draws on many areas of AI research, including deep learning, reinforcement learning, game theory, and evolutionary computation (EC). In this paper we analyze AlphaStar primarily th…
▽ More
In January 2019, DeepMind revealed AlphaStar to the world-the first artificial intelligence (AI) system to beat a professional player at the game of StarCraft II-representing a milestone in the progress of AI. AlphaStar draws on many areas of AI research, including deep learning, reinforcement learning, game theory, and evolutionary computation (EC). In this paper we analyze AlphaStar primarily through the lens of EC, presenting a new look at the system and relating it to many concepts in the field. We highlight some of its most interesting aspects-the use of Lamarckian evolution, competitive co-evolution, and quality diversity. In doing so, we hope to provide a bridge between the wider EC community and one of the most significant AI systems developed in recent times.
△ Less
Submitted 14 July, 2019; v1 submitted 5 February, 2019;
originally announced February 2019.
-
Adaptive Neural Trees
Authors:
Ryutaro Tanno,
Kai Arulkumaran,
Daniel C. Alexander,
Antonio Criminisi,
Aditya Nori
Abstract:
Deep neural networks and decision trees operate on largely separate paradigms; typically, the former performs representation learning with pre-specified architectures, while the latter is characterised by learning hierarchies over pre-specified features with data-driven architectures. We unite the two via adaptive neural trees (ANTs) that incorporates representation learning into edges, routing fu…
▽ More
Deep neural networks and decision trees operate on largely separate paradigms; typically, the former performs representation learning with pre-specified architectures, while the latter is characterised by learning hierarchies over pre-specified features with data-driven architectures. We unite the two via adaptive neural trees (ANTs) that incorporates representation learning into edges, routing functions and leaf nodes of a decision tree, along with a backpropagation-based training algorithm that adaptively grows the architecture from primitive modules (e.g., convolutional layers). We demonstrate that, whilst achieving competitive performance on classification and regression datasets, ANTs benefit from (i) lightweight inference via conditional computation, (ii) hierarchical separation of features useful to the task e.g. learning meaningful class associations, such as separating natural vs. man-made objects, and (iii) a mechanism to adapt the architecture to the size and complexity of the training dataset.
△ Less
Submitted 9 June, 2019; v1 submitted 17 July, 2018;
originally announced July 2018.
-
Variational Inference for Data-Efficient Model Learning in POMDPs
Authors:
Sebastian Tschiatschek,
Kai Arulkumaran,
Jan Stühmer,
Katja Hofmann
Abstract:
Partially observable Markov decision processes (POMDPs) are a powerful abstraction for tasks that require decision making under uncertainty, and capture a wide range of real world tasks. Today, effective planning approaches exist that generate effective strategies given black-box models of a POMDP task. Yet, an open question is how to acquire accurate models for complex domains. In this paper we p…
▽ More
Partially observable Markov decision processes (POMDPs) are a powerful abstraction for tasks that require decision making under uncertainty, and capture a wide range of real world tasks. Today, effective planning approaches exist that generate effective strategies given black-box models of a POMDP task. Yet, an open question is how to acquire accurate models for complex domains. In this paper we propose DELIP, an approach to model learning for POMDPs that utilizes amortized structured variational inference. We empirically show that our model leads to effective control strategies when coupled with state-of-the-art planners. Intuitively, model-based approaches should be particularly beneficial in environments with changing reward structures, or where rewards are initially unknown. Our experiments confirm that DELIP is particularly effective in this setting.
△ Less
Submitted 23 May, 2018;
originally announced May 2018.
-
Generative Adversarial Networks: An Overview
Authors:
Antonia Creswell,
Tom White,
Vincent Dumoulin,
Kai Arulkumaran,
Biswa Sengupta,
Anil A Bharath
Abstract:
Generative adversarial networks (GANs) provide a way to learn deep representations without extensively annotated training data. They achieve this through deriving backpropagation signals through a competitive process involving a pair of networks. The representations that can be learned by GANs may be used in a variety of applications, including image synthesis, semantic image editing, style transf…
▽ More
Generative adversarial networks (GANs) provide a way to learn deep representations without extensively annotated training data. They achieve this through deriving backpropagation signals through a competitive process involving a pair of networks. The representations that can be learned by GANs may be used in a variety of applications, including image synthesis, semantic image editing, style transfer, image super-resolution and classification. The aim of this review paper is to provide an overview of GANs for the signal processing community, drawing on familiar analogies and concepts where possible. In addition to identifying different methods for training and constructing GANs, we also point to remaining challenges in their theory and application.
△ Less
Submitted 19 October, 2017;
originally announced October 2017.
-
On denoising autoencoders trained to minimise binary cross-entropy
Authors:
Antonia Creswell,
Kai Arulkumaran,
Anil A. Bharath
Abstract:
Denoising autoencoders (DAEs) are powerful deep learning models used for feature extraction, data generation and network pre-training. DAEs consist of an encoder and decoder which may be trained simultaneously to minimise a loss (function) between an input and the reconstruction of a corrupted version of the input. There are two common loss functions used for training autoencoders, these include t…
▽ More
Denoising autoencoders (DAEs) are powerful deep learning models used for feature extraction, data generation and network pre-training. DAEs consist of an encoder and decoder which may be trained simultaneously to minimise a loss (function) between an input and the reconstruction of a corrupted version of the input. There are two common loss functions used for training autoencoders, these include the mean-squared error (MSE) and the binary cross-entropy (BCE). When training autoencoders on image data a natural choice of loss function is BCE, since pixel values may be normalised to take values in [0,1] and the decoder model may be designed to generate samples that take values in (0,1). We show theoretically that DAEs trained to minimise BCE may be used to take gradient steps in the data space towards regions of high probability under the data-generating distribution. Previously this had only been shown for DAEs trained using MSE. As a consequence of the theory, iterative application of a trained DAE moves a data sample from regions of low probability to regions of higher probability under the data-generating distribution. Firstly, we validate the theory by showing that novel data samples, consistent with the training data, may be synthesised when the initial data samples are random noise. Secondly, we motivate the theory by showing that initial data samples synthesised via other methods may be improved via iterative application of a trained DAE to those initial samples.
△ Less
Submitted 9 October, 2017; v1 submitted 28 August, 2017;
originally announced August 2017.
-
A Brief Survey of Deep Reinforcement Learning
Authors:
Kai Arulkumaran,
Marc Peter Deisenroth,
Miles Brundage,
Anil Anthony Bharath
Abstract:
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are…
▽ More
Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep $Q$-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.
△ Less
Submitted 28 September, 2017; v1 submitted 19 August, 2017;
originally announced August 2017.
-
Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders
Authors:
Nat Dilokthanakul,
Pedro A. M. Mediano,
Marta Garnelo,
Matthew C. H. Lee,
Hugh Salimbeni,
Kai Arulkumaran,
Murray Shanahan
Abstract:
We study a variant of the variational autoencoder model (VAE) with a Gaussian mixture as a prior distribution, with the goal of performing unsupervised clustering through deep generative models. We observe that the known problem of over-regularisation that has been shown to arise in regular VAEs also manifests itself in our model and leads to cluster degeneracy. We show that a heuristic called min…
▽ More
We study a variant of the variational autoencoder model (VAE) with a Gaussian mixture as a prior distribution, with the goal of performing unsupervised clustering through deep generative models. We observe that the known problem of over-regularisation that has been shown to arise in regular VAEs also manifests itself in our model and leads to cluster degeneracy. We show that a heuristic called minimum information constraint that has been shown to mitigate this effect in VAEs can also be applied to improve unsupervised clustering performance with our model. Furthermore we analyse the effect of this heuristic and provide an intuition of the various processes with the help of visualizations. Finally, we demonstrate the performance of our model on synthetic data, MNIST and SVHN, showing that the obtained clusters are distinct, interpretable and result in achieving competitive performance on unsupervised clustering to the state-of-the-art results.
△ Less
Submitted 13 January, 2017; v1 submitted 8 November, 2016;
originally announced November 2016.
-
Improving Sampling from Generative Autoencoders with Markov Chains
Authors:
Antonia Creswell,
Kai Arulkumaran,
Anil Anthony Bharath
Abstract:
We focus on generative autoencoders, such as variational or adversarial autoencoders, which jointly learn a generative model alongside an inference model. Generative autoencoders are those which are trained to softly enforce a prior on the latent distribution learned by the inference model. We call the distribution to which the inference model maps observed samples, the learned latent distribution…
▽ More
We focus on generative autoencoders, such as variational or adversarial autoencoders, which jointly learn a generative model alongside an inference model. Generative autoencoders are those which are trained to softly enforce a prior on the latent distribution learned by the inference model. We call the distribution to which the inference model maps observed samples, the learned latent distribution, which may not be consistent with the prior. We formulate a Markov chain Monte Carlo (MCMC) sampling process, equivalent to iteratively decoding and encoding, which allows us to sample from the learned latent distribution. Since, the generative model learns to map from the learned latent distribution, rather than the prior, we may use MCMC to improve the quality of samples drawn from the generative model, especially when the learned latent distribution is far from the prior. Using MCMC sampling, we are able to reveal previously unseen differences between generative autoencoders trained either with or without a denoising criterion.
△ Less
Submitted 12 January, 2017; v1 submitted 28 October, 2016;
originally announced October 2016.
-
Towards Deep Symbolic Reinforcement Learning
Authors:
Marta Garnelo,
Kai Arulkumaran,
Murray Shanahan
Abstract:
Deep reinforcement learning (DRL) brings the power of deep neural networks to bear on the generic task of trial-and-error learning, and its effectiveness has been convincingly demonstrated on tasks such as Atari video games and the game of Go. However, contemporary DRL systems inherit a number of shortcomings from the current generation of deep learning techniques. For example, they require very l…
▽ More
Deep reinforcement learning (DRL) brings the power of deep neural networks to bear on the generic task of trial-and-error learning, and its effectiveness has been convincingly demonstrated on tasks such as Atari video games and the game of Go. However, contemporary DRL systems inherit a number of shortcomings from the current generation of deep learning techniques. For example, they require very large datasets to work effectively, entailing that they are slow to learn even when such datasets are available. Moreover, they lack the ability to reason on an abstract level, which makes it difficult to implement high-level cognitive functions such as transfer learning, analogical reasoning, and hypothesis-based reasoning. Finally, their operation is largely opaque to humans, rendering them unsuitable for domains in which verifiability is important. In this paper, we propose an end-to-end reinforcement learning architecture comprising a neural back end and a symbolic front end with the potential to overcome each of these shortcomings. As proof-of-concept, we present a preliminary implementation of the architecture and apply it to several variants of a simple video game. We show that the resulting system -- though just a prototype -- learns effectively, and, by acquiring a set of symbolic rules that are easily comprehensible to humans, dramatically outperforms a conventional, fully neural DRL system on a stochastic variant of the game.
△ Less
Submitted 1 October, 2016; v1 submitted 18 September, 2016;
originally announced September 2016.
-
Classifying Options for Deep Reinforcement Learning
Authors:
Kai Arulkumaran,
Nat Dilokthanakul,
Murray Shanahan,
Anil Anthony Bharath
Abstract:
In this paper we combine one method for hierarchical reinforcement learning - the options framework - with deep Q-networks (DQNs) through the use of different "option heads" on the policy network, and a supervisory network for choosing between the different options. We utilise our setup to investigate the effects of architectural constraints in subtasks with positive and negative transfer, across…
▽ More
In this paper we combine one method for hierarchical reinforcement learning - the options framework - with deep Q-networks (DQNs) through the use of different "option heads" on the policy network, and a supervisory network for choosing between the different options. We utilise our setup to investigate the effects of architectural constraints in subtasks with positive and negative transfer, across a range of network capacities. We empirically show that our augmented DQN has lower sample complexity when simultaneously learning subtasks with negative transfer, without degrading performance when learning subtasks with positive transfer.
△ Less
Submitted 19 June, 2017; v1 submitted 27 April, 2016;
originally announced April 2016.