
Showing 1–17 of 17 results for author: Lillicrap, T P

Searching in archive cs.
  1. arXiv:2405.00451  [pdf, other]

    cs.AI cs.LG

    Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

    Authors: Yuxi Xie, Anirudh Goyal, Wenyue Zheng, Min-Yen Kan, Timothy P. Lillicrap, Kenji Kawaguchi, Michael Shieh

    Abstract: We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero. Our work leverages Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level…

    Submitted 17 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 10 pages, 4 figures, 4 tables (24 pages, 9 figures, 9 tables including references and appendices)

  2. arXiv:2106.13031  [pdf, other]

    cs.LG cs.NE q-bio.NC

    Towards Biologically Plausible Convolutional Networks

    Authors: Roman Pogodin, Yash Mehta, Timothy P. Lillicrap, Peter E. Latham

    Abstract: Convolutional networks are ubiquitous in deep learning. They are particularly useful for images, as they reduce the number of parameters, reduce training time, and increase accuracy. However, as a model of the brain they are seriously problematic, since they require weight sharing - something real neurons simply cannot do. Consequently, while neurons in the brain can be locally connected (one of t…

    Submitted 15 January, 2022; v1 submitted 22 June, 2021; originally announced June 2021.

  3. arXiv:2010.15040  [pdf, other]

    stat.ML cs.LG

    Training Generative Adversarial Networks by Solving Ordinary Differential Equations

    Authors: Chongli Qin, Yan Wu, Jost Tobias Springenberg, Andrew Brock, Jeff Donahue, Timothy P. Lillicrap, Pushmeet Kohli

    Abstract: The instability of Generative Adversarial Network (GAN) training has frequently been attributed to gradient descent. Consequently, recent methods have aimed to tailor the models and training procedures to stabilise the discrete updates. In contrast, we study the continuous-time dynamics induced by GAN training. Both theory and toy experiments suggest that these dynamics are in fact surprisingly st…

    Submitted 28 November, 2020; v1 submitted 28 October, 2020; originally announced October 2020.

  4. arXiv:1911.05507  [pdf, other]

    cs.LG stat.ML

    Compressive Transformers for Long-Range Sequence Modelling

    Authors: Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy P. Lillicrap

    Abstract: We present the Compressive Transformer, an attentive sequence model which compresses past memories for long-range sequence learning. We find the Compressive Transformer obtains state-of-the-art language modelling results in the WikiText-103 and Enwik8 benchmarks, achieving 17.1 ppl and 0.97 bpc respectively. We also find it can model high-frequency speech effectively and can be used as a memory me…

    Submitted 13 November, 2019; originally announced November 2019.

    Comments: 19 pages, 6 figures, 10 tables

  5. arXiv:1910.02720  [pdf, other]

    stat.ML cs.LG cs.NE

    Meta-Learning Deep Energy-Based Memory Models

    Authors: Sergey Bartunov, Jack W Rae, Simon Osindero, Timothy P Lillicrap

    Abstract: We study the problem of learning associative memory -- a system which is able to retrieve a remembered pattern based on its distorted or incomplete version. Attractor networks provide a sound model of associative memory: patterns are stored as attractors of the network dynamics and associative retrieval is performed by running the dynamics starting from a query pattern until it converges to an att…

    Submitted 20 April, 2021; v1 submitted 7 October, 2019; originally announced October 2019.

    Comments: ICLR 2020

  6. arXiv:1909.12892  [pdf, other]

    cs.LG cs.AI stat.ML

    Automated curricula through setter-solver interactions

    Authors: Sebastien Racaniere, Andrew K. Lampinen, Adam Santoro, David P. Reichert, Vlad Firoiu, Timothy P. Lillicrap

    Abstract: Reinforcement learning algorithms use correlations between policies and rewards to improve agent performance. But in dynamic or sparsely rewarding environments these correlations are often too small, or rewarding events are too infrequent to make learning feasible. Human education instead relies on curricula--the breakdown of tasks into simpler, static challenges with dense rewards--to build up to…

    Submitted 21 January, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

    Journal ref: International Conference on Learning Representations, 2020

  7. arXiv:1907.06374  [pdf, other]

    cs.LG q-bio.NC stat.ML

    What does it mean to understand a neural network?

    Authors: Timothy P. Lillicrap, Konrad P. Kording

    Abstract: We can define a neural network that can learn to recognize objects in less than 100 lines of code. However, after training, it is characterized by millions of weights that contain the knowledge about many object types across visual scenes. Such networks are thus dramatically easier to understand in terms of the code that makes them than the resulting properties, such as tuning or connections. In a…

    Submitted 15 July, 2019; originally announced July 2019.

    Comments: 9 pages, 2 figures

  8. arXiv:1906.04304  [pdf, other]

    cs.LG cs.DB cs.DS stat.ML

    Meta-Learning Neural Bloom Filters

    Authors: Jack W Rae, Sergey Bartunov, Timothy P Lillicrap

    Abstract: There has been a recent trend in training neural networks to replace data structures that have been crafted by hand, with an aim for faster execution, better accuracy, or greater compression. In this setting, a neural data structure is instantiated by training a network over many epochs of its inputs until convergence. In applications where inputs arrive at high throughput, or are ephemeral, train…

    Submitted 10 June, 2019; originally announced June 2019.

    Comments: International Conference on Machine Learning 2019

  9. arXiv:1812.02216  [pdf, other]

    cs.LG stat.ML

    Composing Entropic Policies using Divergence Correction

    Authors: Jonathan J Hunt, Andre Barreto, Timothy P Lillicrap, Nicolas Heess

    Abstract: Composing previously mastered skills to solve novel tasks promises dramatic improvements in the data efficiency of reinforcement learning. Here, we analyze two recent works composing behaviors represented in the form of action-value functions and show that they perform poorly in some situations. As part of this analysis, we extend an important generalization of policy improvement to the maximum en…

    Submitted 5 July, 2019; v1 submitted 5 December, 2018; originally announced December 2018.

  10. arXiv:1811.11682  [pdf, other]

    cs.LG cs.AI stat.ML

    Experience Replay for Continual Learning

    Authors: David Rolnick, Arun Ahuja, Jonathan Schwarz, Timothy P. Lillicrap, Greg Wayne

    Abstract: Continual learning is the problem of learning new tasks or knowledge while protecting old knowledge and ideally generalizing from old experience to learn new tasks faster. Neural networks trained by stochastic gradient descent often degrade on old tasks when trained successively on new tasks with different data distributions. This phenomenon, referred to as catastrophic forgetting, is considered a…

    Submitted 26 November, 2019; v1 submitted 28 November, 2018; originally announced November 2018.

    Comments: NeurIPS 2019

  11. arXiv:1803.10049  [pdf, other]

    cs.LG stat.ML

    Fast Parametric Learning with Activation Memorization

    Authors: Jack W Rae, Chris Dyer, Peter Dayan, Timothy P Lillicrap

    Abstract: Neural networks trained with backpropagation often struggle to identify classes that have been observed a small number of times. In applications where most class labels are rare, such as language modelling, this can become a performance bottleneck. One potential remedy is to augment the network with a fast-learning non-parametric model which stores recent activations and class labels into an exter…

    Submitted 27 March, 2018; originally announced March 2018.

  12. arXiv:1611.03824  [pdf, other]

    stat.ML cs.LG

    Learning to Learn without Gradient Descent by Gradient Descent

    Authors: Yutian Chen, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Timothy P. Lillicrap, Matt Botvinick, Nando de Freitas

    Abstract: We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter t…

    Submitted 12 June, 2017; v1 submitted 11 November, 2016; originally announced November 2016.

    Comments: Accepted by ICML 2017. Previous version "Learning to Learn for Global Optimization of Black Box Functions" was published in the Deep Reinforcement Learning Workshop, NIPS 2016

  13. arXiv:1610.09027  [pdf, other]

    cs.LG

    Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes

    Authors: Jack W Rae, Jonathan J Hunt, Tim Harley, Ivo Danihelka, Andrew Senior, Greg Wayne, Alex Graves, Timothy P Lillicrap

    Abstract: Neural networks augmented with external memory have the ability to learn algorithmic solutions to complex tasks. These models appear promising for applications such as language modeling and machine translation. However, they scale poorly in both space and time as the amount of memory grows --- limiting their applicability to real-world domains. Here, we present an end-to-end differentiable memory…

    Submitted 27 October, 2016; originally announced October 2016.

    Comments: in 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain

  14. arXiv:1602.01783  [pdf, other]

    cs.LG

    Asynchronous Methods for Deep Reinforcement Learning

    Authors: Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu

    Abstract: We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural n…

    Submitted 16 June, 2016; v1 submitted 4 February, 2016; originally announced February 2016.

    Journal ref: ICML 2016

  15. arXiv:1512.04455  [pdf, other]

    cs.LG

    Memory-based control with recurrent neural networks

    Authors: Nicolas Heess, Jonathan J Hunt, Timothy P Lillicrap, David Silver

    Abstract: Partially observed control problems are a challenging aspect of reinforcement learning. We extend two related, model-free algorithms for continuous control -- deterministic policy gradient and stochastic value gradient -- to solve partially observed domains using recurrent neural networks trained with backpropagation through time. We demonstrate that this approach, coupled with long-short term m…

    Submitted 14 December, 2015; originally announced December 2015.

    Comments: NIPS Deep Reinforcement Learning Workshop 2015

  16. arXiv:1509.02971  [pdf, other]

    cs.LG stat.ML

    Continuous control with deep reinforcement learning

    Authors: Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra

    Abstract: We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic pr…

    Submitted 5 July, 2019; v1 submitted 9 September, 2015; originally announced September 2015.

    Comments: 10 pages + supplementary

  17. arXiv:1411.0247  [pdf, other]

    q-bio.NC cs.NE

    Random feedback weights support learning in deep neural networks

    Authors: Timothy P. Lillicrap, Daniel Cownden, Douglas B. Tweed, Colin J. Akerman

    Abstract: The brain processes information through many layers of neurons. This deep architecture is representationally powerful, but it complicates learning by making it hard to identify the responsible neurons when a mistake is made. In machine learning, the backpropagation algorithm assigns blame to a neuron by computing exactly how it contributed to an error. To do this, it multiplies error signals by ma…

    Submitted 2 November, 2014; originally announced November 2014.

    Comments: 14 pages, 5 figures in main text; 13 pages appendix