Showing 1–50 of 155 results for author: Teh, Y W

  1. arXiv:2406.11905  [pdf, other]

    cs.NE cs.LG

    EvIL: Evolution Strategies for Generalisable Imitation Learning

    Authors: Silvia Sapora, Gokul Swamy, Chris Lu, Yee Whye Teh, Jakob Nicolaus Foerster

    Abstract: Often times in imitation learning (IL), the environment we collect expert demonstrations in and the environment we want to deploy our learned policy in aren't exactly the same (e.g. demonstrations collected in simulation but deployment in the real world). Compared to policy-centric approaches to IL like behavioural cloning, reward-centric approaches like inverse reinforcement learning (IRL) often…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 17 pages, 8 figures, ICML 2024

  2. arXiv:2404.07839  [pdf, other]

    cs.LG cs.AI cs.CL

    RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

    Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, et al. (37 additional authors not shown)

    Abstract: We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned var…

    Submitted 11 April, 2024; originally announced April 2024.

  3. arXiv:2403.08477  [pdf, other]

    cs.CV cs.LG

    Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts

    Authors: Shengzhuang Chen, Jihoon Tack, Yunqiao Yang, Yee Whye Teh, Jonathan Richard Schwarz, Ying Wei

    Abstract: Recent successes suggest that parameter-efficient fine-tuning of foundation models as the state-of-the-art method for transfer learning in vision, replacing the rich literature of alternatives such as meta-learning. In trying to harness the best of both worlds, meta-tuning introduces a subsequent optimization stage of foundation models but has so far only shown limited success and crucially tends…

    Submitted 23 June, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: The Forty-first International Conference on Machine Learning, 2024

  4. arXiv:2403.04317  [pdf, other]

    cs.LG cs.CL

    Online Adaptation of Language Models with a Memory of Amortized Contexts

    Authors: Jihoon Tack, Jaehyung Kim, Eric Mitchell, Jinwoo Shin, Yee Whye Teh, Jonathan Richard Schwarz

    Abstract: Due to the rapid generation and dissemination of information, large language models (LLMs) quickly run out of date despite enormous development costs. Due to this crucial need to keep models updated, online learning has emerged as a critical necessity when utilizing LLMs for real-world applications. However, given the ever-expanding corpus of unseen documents and the large parameter space of moder…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 14 pages

  5. arXiv:2403.01518  [pdf, other]

    cs.CL cs.LG

    Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models

    Authors: Amal Rannen-Triki, Jorg Bornschein, Razvan Pascanu, Marcus Hutter, Andras György, Alexandre Galashov, Yee Whye Teh, Michalis K. Titsias

    Abstract: We consider the problem of online fine tuning the parameters of a language model at test time, also known as dynamic evaluation. While it is generally known that this approach improves the overall predictive performance, especially when considering distributional shift between training and evaluation data, we here emphasize the perspective that online adaptation turns parameters into temporally ch…

    Submitted 3 March, 2024; originally announced March 2024.

  6. arXiv:2402.19427  [pdf, other]

    cs.LG cs.CL

    Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

    Authors: Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George Cristian-Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, Arnaud Doucet, David Budden, Yee Whye Teh, Razvan Pascanu, Nando De Freitas, Caglar Gulcehre

    Abstract: Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with gated linear recurrences, and Griffin, a hybrid model that mixes gated linear recurrences with local attention. Hawk exceeds the reported performance of Mamba on downstream tasks, while Griffin matches the performance of Llama…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 25 pages, 11 figures

  7. arXiv:2402.12527  [pdf, other]

    cs.LG cs.AI

    The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning

    Authors: Anya Sims, Cong Lu, Yee Whye Teh

    Abstract: Offline reinforcement learning aims to enable agents to be trained from pre-collected datasets, however, this comes with the added challenge of estimating the value of behavior not covered in the dataset. Model-based methods offer a solution by allowing agents to collect additional synthetic data via rollouts in a learned dynamics model. The prevailing theoretical understanding is that this can th…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: Code open-sourced at: https://github.com/anyasims/edge-of-reach

  8. arXiv:2402.00809  [pdf, other]

    cs.LG stat.ML

    Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

    Authors: Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, José Miguel Hernández-Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang

    Abstract: In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learni…

    Submitted 2 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  9. arXiv:2312.17210  [pdf, other]

    stat.ML cs.AI cs.LG

    Continual Learning via Sequential Function-Space Variational Inference

    Authors: Tim G. J. Rudner, Freddie Bickford Smith, Qixuan Feng, Yee Whye Teh, Yarin Gal

    Abstract: Sequential Bayesian inference over predictive functions is a natural framework for continual learning from streams of data. However, applying it to neural networks has proved challenging in practice. Addressing the drawbacks of existing techniques, we propose an optimization objective derived by formulating continual learning as sequential function-space variational inference. In contrast to exist…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Published in Proceedings of the 39th International Conference on Machine Learning (ICML 2022)

  10. arXiv:2312.17199  [pdf, other]

    stat.ML cs.AI cs.LG

    Tractable Function-Space Variational Inference in Bayesian Neural Networks

    Authors: Tim G. J. Rudner, Zonghao Chen, Yee Whye Teh, Yarin Gal

    Abstract: Reliable predictive uncertainty estimation plays an important role in enabling the deployment of neural networks to safety-critical settings. A popular approach for estimating the predictive uncertainty of neural networks is to define a prior distribution over the network parameters, infer an approximate posterior distribution, and use it to make stochastic predictions. However, explicit inference…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Published in Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

  11. arXiv:2308.00436  [pdf, other]

    cs.AI cs.CL cs.LG

    SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning

    Authors: Ning Miao, Yee Whye Teh, Tom Rainforth

    Abstract: The recent progress in large language models (LLMs), especially the invention of chain-of-thought prompting, has made it possible to automatically answer questions by stepwise reasoning. However, when faced with more complicated problems that require non-linear thinking, even the strongest LLMs make mistakes. To address this, we explore whether LLMs are able to recognize errors in their own step-b…

    Submitted 5 October, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

  12. arXiv:2307.15073  [pdf, other]

    q-bio.BM cs.LG stat.ML

    Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions

    Authors: Leo Klarner, Tim G. J. Rudner, Michael Reutlinger, Torsten Schindler, Garrett M. Morris, Charlotte Deane, Yee Whye Teh

    Abstract: Accelerating the discovery of novel and more effective therapeutics is an important pharmaceutical problem in which deep learning is playing an increasingly significant role. However, real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift, a setting that poses a challenge to standard deep learning methods.…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Published in the Proceedings of the 40th International Conference on Machine Learning (ICML 2023)

  13. arXiv:2307.05431  [pdf, other]

    stat.ML cs.LG

    Geometric Neural Diffusion Processes

    Authors: Emile Mathieu, Vincent Dutordoir, Michael J. Hutchinson, Valentin De Bortoli, Yee Whye Teh, Richard E. Turner

    Abstract: Denoising diffusion models have proven to be a flexible and effective paradigm for generative modelling. Their recent extension to infinite dimensional Euclidean spaces has allowed for the modelling of stochastic processes. However, many problems in the natural sciences incorporate symmetries and involve data living in non-Euclidean spaces. In this work, we extend the framework of diffusion models…

    Submitted 11 July, 2023; originally announced July 2023.

  14. arXiv:2306.08448  [pdf, other]

    cs.LG cs.AI

    Kalman Filter for Online Classification of Non-Stationary Data

    Authors: Michalis K. Titsias, Alexandre Galashov, Amal Rannen-Triki, Razvan Pascanu, Yee Whye Teh, Jorg Bornschein

    Abstract: In Online Continual Learning (OCL) a learning system receives a stream of data and sequentially performs prediction and training steps. Important challenges in OCL are concerned with automatic adaptation to the particular non-stationary structure of the data, and with quantification of predictive uncertainty. Motivated by these challenges we introduce a probabilistic Bayesian online learning model…

    Submitted 14 June, 2023; originally announced June 2023.

  15. arXiv:2305.15574  [pdf, other]

    stat.ML cs.LG

    Deep Stochastic Processes via Functional Markov Transition Operators

    Authors: Jin Xu, Emilien Dupont, Kaspar Märtens, Tom Rainforth, Yee Whye Teh

    Abstract: We introduce Markov Neural Processes (MNPs), a new class of Stochastic Processes (SPs) which are constructed by stacking sequences of neural parameterised Markov transition operators in function space. We prove that these Markov transition operators can preserve the exchangeability and consistency of SPs. Therefore, the proposed iterative construction adds substantial flexibility and expressivity…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: 18 pages, 5 figures

  16. arXiv:2304.01762  [pdf, other]

    cs.LG cs.AI stat.ML

    Incorporating Unlabelled Data into Bayesian Neural Networks

    Authors: Mrinank Sharma, Tom Rainforth, Yee Whye Teh, Vincent Fortuin

    Abstract: Conventional Bayesian Neural Networks (BNNs) cannot leverage unlabelled data to improve their predictions. To overcome this limitation, we introduce Self-Supervised Bayesian Neural Networks, which use unlabelled data to learn improved prior predictive distributions by maximising an evidence lower bound during an unsupervised pre-training step. With a novel methodology developed to better understan…

    Submitted 19 May, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

  17. arXiv:2303.06614  [pdf, other]

    cs.LG cs.AI stat.ML

    Synthetic Experience Replay

    Authors: Cong Lu, Philip J. Ball, Yee Whye Teh, Jack Parker-Holder

    Abstract: A key theme in the past decade has been that when large neural networks and large datasets combine they can produce remarkable results. In deep reinforcement learning (RL), this paradigm is commonly made possible through experience replay, whereby a dataset of past experiences is used to train a policy or value function. However, unlike in supervised or self-supervised learning, an RL agent has to…

    Submitted 26 October, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

    Comments: Published at NeurIPS, 2023

  18. arXiv:2303.01064  [pdf, other]

    cs.CL

    Adopting the Multi-answer Questioning Task with an Auxiliary Metric for Extreme Multi-label Text Classification Utilizing the Label Hierarchy

    Authors: Li Wang, Ying Wah Teh, Mohammed Ali Al-Garadi

    Abstract: Extreme multi-label text classification utilizes the label hierarchy to partition extreme labels into multiple label groups, turning the task into simple multi-group multi-label classification tasks. Current research encodes labels as a vector with fixed length which needs establish multiple classifiers for different label groups. The problem is how to build only one classifier without sacrificing…

    Submitted 2 March, 2023; originally announced March 2023.

  19. arXiv:2302.10322  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation

    Authors: Bobby He, James Martens, Guodong Zhang, Aleksandar Botev, Andrew Brock, Samuel L Smith, Yee Whye Teh

    Abstract: Skip connections and normalisation layers form two standard architectural components that are ubiquitous for the training of Deep Neural Networks (DNNs), but whose precise roles are poorly understood. Recent approaches such as Deep Kernel Shaping have made progress towards reducing our reliance on them, using insights from wide NN kernel theory to improve signal propagation in vanilla DNNs (which…

    Submitted 20 February, 2023; originally announced February 2023.

    Comments: ICLR 2023

  20. arXiv:2301.09479  [pdf, other]

    stat.ML cs.AI cs.LG

    Modality-Agnostic Variational Compression of Implicit Neural Representations

    Authors: Jonathan Richard Schwarz, Jihoon Tack, Yee Whye Teh, Jaeho Lee, Jinwoo Shin

    Abstract: We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR). Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism. This allows the specialisation of a shared INR network to each data item through subnetwork selecti…

    Submitted 7 April, 2023; v1 submitted 23 January, 2023; originally announced January 2023.

  21. arXiv:2212.13936  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations

    Authors: Tim G. J. Rudner, Cong Lu, Michael A. Osborne, Yarin Gal, Yee Whye Teh

    Abstract: KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral reference policies derived from expert demonstrations can suffer from pathological traini…

    Submitted 28 December, 2022; originally announced December 2022.

    Comments: Published in Advances in Neural Information Processing Systems 34 (NeurIPS 2021)

  22. arXiv:2211.11747  [pdf, other]

    cs.LG cs.CV

    NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research

    Authors: Jorg Bornschein, Alexandre Galashov, Ross Hemsley, Amal Rannen-Triki, Yutian Chen, Arslan Chaudhry, Xu Owen He, Arthur Douillard, Massimo Caccia, Qixuang Feng, Jiajun Shen, Sylvestre-Alvise Rebuffi, Kitty Stacpoole, Diego de las Casas, Will Hawkins, Angeliki Lazaridou, Yee Whye Teh, Andrei A. Rusu, Razvan Pascanu, Marc'Aurelio Ranzato

    Abstract: A shared goal of several machine learning communities like continual learning, meta-learning and transfer learning, is to design algorithms and models that efficiently and robustly adapt to unseen tasks. An even more ambitious goal is to build models that never stop adapting, and that become increasingly more efficient through time by suitably transferring the accrued knowledge. Beyond the study o…

    Submitted 16 May, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

  23. arXiv:2207.03024  [pdf, other]

    stat.ML cs.LG

    Riemannian Diffusion Schrödinger Bridge

    Authors: James Thornton, Michael Hutchinson, Emile Mathieu, Valentin De Bortoli, Yee Whye Teh, Arnaud Doucet

    Abstract: Score-based generative models exhibit state of the art performance on density estimation and generative modeling tasks. These models typically assume that the data geometry is flat, yet recent extensions have been developed to synthesize data living on Riemannian manifolds. Existing methods to accelerate sampling of diffusion models are typically not applicable in the Riemannian setting and Rieman…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: Accepted to Continuous Time Methods for Machine Learning, ICML 2022

  24. arXiv:2206.10011  [pdf, other]

    cs.LG cs.CV stat.ML

    When Does Re-initialization Work?

    Authors: Sheheryar Zaidi, Tudor Berariu, Hyunjik Kim, Jörg Bornschein, Claudia Clopath, Yee Whye Teh, Razvan Pascanu

    Abstract: Re-initializing a neural network during training has been observed to improve generalization in recent works. Yet it is neither widely adopted in deep learning practice nor is it often used in state-of-the-art training protocols. This raises the question of when re-initialization works, and whether it should be used together with regularization techniques such as data augmentation, weight decay an…

    Submitted 2 April, 2023; v1 submitted 20 June, 2022; originally announced June 2022.

    Comments: Published in PMLR Volume 187; spotlight presentation at I Can't Believe It's Not Better Workshop at NeurIPS 2022

  25. arXiv:2206.04779  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Challenges and Opportunities in Offline Reinforcement Learning from Visual Observations

    Authors: Cong Lu, Philip J. Ball, Tim G. J. Rudner, Jack Parker-Holder, Michael A. Osborne, Yee Whye Teh

    Abstract: Offline reinforcement learning has shown great promise in leveraging large pre-collected datasets for policy learning, allowing agents to forgo often-expensive online data collection. However, offline reinforcement learning from visual observations with continuous action spaces remains under-explored, with a limited understanding of the key challenges in this complex domain. In this paper, we esta…

    Submitted 6 July, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: Published at TMLR, 2023

  26. arXiv:2206.04405  [pdf, other]

    stat.ML cs.LG

    Conformal Off-Policy Prediction in Contextual Bandits

    Authors: Muhammad Faaiz Taufiq, Jean-Francois Ton, Rob Cornish, Yee Whye Teh, Arnaud Doucet

    Abstract: Most off-policy evaluation methods for contextual bandits have focused on the expected outcome of a policy, which is estimated via methods that at best provide only asymptotic guarantees. However, in many applications, the expectation may not be the best measure of performance as it does not capture the variability of the outcome. In addition, particularly in safety-critical settings, stronger gua…

    Submitted 26 October, 2022; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  27. arXiv:2206.00133  [pdf, other]

    cs.LG q-bio.BM stat.ML

    Pre-training via Denoising for Molecular Property Prediction

    Authors: Sheheryar Zaidi, Michael Schaarschmidt, James Martens, Hyunjik Kim, Yee Whye Teh, Alvaro Sanchez-Gonzalez, Peter Battaglia, Razvan Pascanu, Jonathan Godwin

    Abstract: Many important problems involving molecular property prediction from 3D structures have limited data, posing a generalization challenge for neural networks. In this paper, we describe a pre-training technique based on denoising that achieves a new state-of-the-art in molecular property prediction by utilizing large datasets of 3D molecular structures at equilibrium to learn meaningful representati…

    Submitted 24 October, 2022; v1 submitted 31 May, 2022; originally announced June 2022.

  28. arXiv:2206.00051  [pdf, other]

    cs.LG

    Learning Instance-Specific Augmentations by Capturing Local Invariances

    Authors: Ning Miao, Tom Rainforth, Emile Mathieu, Yann Dubois, Yee Whye Teh, Adam Foster, Hyunjik Kim

    Abstract: We introduce InstaAug, a method for automatically learning input-specific augmentations from data. Previous methods for learning augmentations have typically assumed independence between the original input and the transformation applied to that input. This can be highly restrictive, as the invariances we hope our augmentation will capture are themselves often highly input dependent. InstaAug inste…

    Submitted 30 May, 2023; v1 submitted 31 May, 2022; originally announced June 2022.

  29. arXiv:2205.08957  [pdf, other]

    stat.ML cs.AI cs.LG

    Meta-Learning Sparse Compression Networks

    Authors: Jonathan Richard Schwarz, Yee Whye Teh

    Abstract: Recent work in Deep Learning has re-imagined the representation of data as functions mapping from a coordinate space to an underlying continuous signal. When such functions are approximated by neural networks this introduces a compelling alternative to the more common multi-dimensional array representation. Recent work on such Implicit Neural Representations (INRs) has shown that - following caref…

    Submitted 8 August, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

    Comments: Published in TMLR (2022)

  30. arXiv:2202.10847  [pdf, other]

    eess.IV cs.CV cs.LG

    UncertaINR: Uncertainty Quantification of End-to-End Implicit Neural Representations for Computed Tomography

    Authors: Francisca Vasconcelos, Bobby He, Nalini Singh, Yee Whye Teh

    Abstract: Implicit neural representations (INRs) have achieved impressive results for scene reconstruction and computer graphics, where their performance has primarily been assessed on reconstruction accuracy. As INRs make their way into other domains, where model predictions inform high-stakes decision-making, uncertainty quantification of INR inference is becoming critical. To that end, we study a Bayesia…

    Submitted 2 May, 2023; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: Published in the Transactions on Machine Learning Research (TMLR) April 2023 [https://openreview.net/forum?id=jdGMBgYvfX]

  31. arXiv:2202.02763  [pdf, other]

    cs.LG math.PR stat.ML

    Riemannian Score-Based Generative Modelling

    Authors: Valentin De Bortoli, Emile Mathieu, Michael Hutchinson, James Thornton, Yee Whye Teh, Arnaud Doucet

    Abstract: Score-based generative models (SGMs) are a powerful class of generative models that exhibit remarkable empirical performance. Score-based generative modelling (SGM) consists of a "noising" stage, whereby a diffusion is used to gradually add Gaussian noise to data, and a generative model, which entails a "denoising" process defined by approximating the time-reversal of the diffusion. Existing S…

    Submitted 22 November, 2022; v1 submitted 6 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022 camera ready

  32. arXiv:2201.12904  [pdf, other]

    cs.LG cs.CV eess.IV stat.ML

    COIN++: Neural Compression Across Modalities

    Authors: Emilien Dupont, Hrushikesh Loya, Milad Alizadeh, Adam Goliński, Yee Whye Teh, Arnaud Doucet

    Abstract: Neural compression algorithms are typically based on autoencoders that require specialized encoder and decoder architectures for different data modalities. In this paper, we propose COIN++, a neural compression framework that seamlessly handles a wide range of data modalities. Our approach is based on converting data to implicit neural representations, i.e. neural functions that map coordinates (s…

    Submitted 8 December, 2022; v1 submitted 30 January, 2022; originally announced January 2022.

    Comments: TMLR camera ready

  33. arXiv:2110.14423  [pdf, other]

    stat.ML cs.LG

    Vector-valued Gaussian Processes on Riemannian Manifolds via Gauge Independent Projected Kernels

    Authors: Michael Hutchinson, Alexander Terenin, Viacheslav Borovitskiy, So Takao, Yee Whye Teh, Marc Peter Deisenroth

    Abstract: Gaussian processes are machine learning models capable of learning unknown functions in a way that represents uncertainty, thereby facilitating construction of optimal decision-making systems. Motivated by a desire to deploy Gaussian processes in novel areas of science, a rapidly-growing line of research has focused on constructively extending these models to handle non-Euclidean domains, includin…

    Submitted 25 November, 2021; v1 submitted 27 October, 2021; originally announced October 2021.

    Journal ref: Advances in Neural Information Processing Systems, 2021

  34. arXiv:2110.00296  [pdf, other]

    stat.ML cs.AI cs.LG

    Powerpropagation: A sparsity inducing weight reparameterisation

    Authors: Jonathan Schwarz, Siddhant M. Jayakumar, Razvan Pascanu, Peter E. Latham, Yee Whye Teh

    Abstract: The training of sparse neural networks is becoming an increasingly important tool for reducing the computational footprint of models at training and evaluation, as well enabling the effective scaling up of models. Whereas much work over the years has been dedicated to specialised pruning techniques, little attention has been paid to the inherent effect of gradient based training on model sparsity.…

    Submitted 6 October, 2021; v1 submitted 1 October, 2021; originally announced October 2021.

    Comments: Accepted at NeurIPS 2021

  35. arXiv:2109.13730  [pdf, other]

    stat.ME stat.AP

    Interoperability of statistical models in pandemic preparedness: principles and reality

    Authors: George Nicholson, Marta Blangiardo, Mark Briers, Peter J. Diggle, Tor Erlend Fjelde, Hong Ge, Robert J. B. Goudie, Radka Jersakova, Ruairidh E. King, Brieuc C. L. Lehmann, Ann-Marie Mallon, Tullia Padellini, Yee Whye Teh, Chris Holmes, Sylvia Richardson

    Abstract: We present "interoperability" as a guiding framework for statistical modelling to assist policy makers asking multiple questions using diverse datasets in the face of an evolving pandemic response. Interoperability provides an important set of principles for future pandemic preparedness, through the joint design and deployment of adaptable systems of statistical models for disease surveillance usi…

    Submitted 28 September, 2021; originally announced September 2021.

    Comments: 26 pages, 10 figures, for associated mpeg file Movie 1 please see https://www.dropbox.com/s/kn9y1v6zvivfla1/Interoperability_of_models_Movie_1.mp4?dl=0

    MSC Class: 62P10

  36. arXiv:2106.13746  [pdf, other]

    stat.ML cs.LG

    On Incorporating Inductive Biases into VAEs

    Authors: Ning Miao, Emile Mathieu, N. Siddharth, Yee Whye Teh, Tom Rainforth

    Abstract: We explain why directly changing the prior can be a surprisingly ineffective mechanism for incorporating inductive biases into VAEs, and introduce a simple and effective alternative approach: Intermediary Latent Space VAEs (InteL-VAEs). InteL-VAEs use an intermediary set of latent variables to control the stochasticity of the encoding process, before mapping these in turn to the latent representati…

    Submitted 14 February, 2022; v1 submitted 25 June, 2021; originally announced June 2021.

  37. arXiv:2106.10052  [pdf, other]

    stat.ML cs.LG

    On Contrastive Representations of Stochastic Processes

    Authors: Emile Mathieu, Adam Foster, Yee Whye Teh

    Abstract: Learning representations of stochastic processes is an emerging problem in machine learning with applications from meta-learning to physical object models to time series. Typical methods rely on exact reconstruction of observations, but this approach breaks down as observations become high-dimensional or noise distributions become complex. To address this, we propose a unifying framework for learn…

    Submitted 29 October, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 Camera ready

  38. arXiv:2106.05886  [pdf, other]

    cs.LG stat.ML

    Group Equivariant Subsampling

    Authors: Jin Xu, Hyunjik Kim, Tom Rainforth, Yee Whye Teh

    Abstract: Subsampling is used in convolutional neural networks (CNNs) in the form of pooling or strided convolutions, to reduce the spatial dimensions of feature maps and to allow the receptive fields to grow exponentially with depth. However, it is known that such subsampling operations are not translation equivariant, unlike convolutions that are translation equivariant. Here, we first introduce translati…

    Submitted 10 June, 2021; originally announced June 2021.

  39. arXiv:2106.03477  [pdf, other]

    stat.ML cs.LG

    BayesIMP: Uncertainty Quantification for Causal Data Fusion

    Authors: Siu Lun Chau, Jean-François Ton, Javier González, Yee Whye Teh, Dino Sejdinovic

    Abstract: While causal models are becoming one of the mainstays of machine learning, the problem of uncertainty quantification in causal inference remains challenging. In this paper, we study the causal data fusion problem, where datasets pertaining to multiple causal graphs are combined to estimate the average treatment effect of a target variable. As data arises from multiple sources and can vary in quali…

    Submitted 7 June, 2021; originally announced June 2021.

    Comments: 10 pages main text, 10 pages supplementary materials

  40. arXiv:2103.03123  [pdf, other]

    eess.IV cs.CV cs.LG

    COIN: COmpression with Implicit Neural representations

    Authors: Emilien Dupont, Adam Goliński, Milad Alizadeh, Yee Whye Teh, Arnaud Doucet

    Abstract: We propose a new simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image. Specifically, to encode an image, we fit it with an MLP which maps pixel locations to RGB values. We then quantize and store the weights of this MLP as a code for the image. To decode the image, we simply evaluate th…

    Submitted 10 April, 2021; v1 submitted 3 March, 2021; originally announced March 2021.

    Comments: Added qualitative comparisons and link to github repo https://github.com/EmilienDupont/coin

  41. arXiv:2102.04776  [pdf, other]

    cs.LG cs.CV stat.ML

    Generative Models as Distributions of Functions

    Authors: Emilien Dupont, Yee Whye Teh, Arnaud Doucet

    Abstract: Generative models are typically trained on grid-like data such as images. As a result, the size of these models usually scales directly with the underlying grid resolution. In this paper, we abandon discretized grids and instead parameterize individual data points by continuous functions. We then build generative models by learning distributions over such functions. By treating data points as func…

    Submitted 17 February, 2022; v1 submitted 9 February, 2021; originally announced February 2021.

    Comments: AISTATS 2022 Oral camera ready. Incorporated reviewer feedback

  42. arXiv:2012.10885  [pdf, other]

    cs.LG stat.ML

    LieTransformer: Equivariant self-attention for Lie Groups

    Authors: Michael Hutchinson, Charline Le Lan, Sheheryar Zaidi, Emilien Dupont, Yee Whye Teh, Hyunjik Kim

    Abstract: Group equivariant neural networks are used as building blocks of group invariant neural networks, which have been shown to improve generalisation performance and data efficiency through principled parameter sharing. Such works have mostly focused on group equivariant convolutions, building on the result that group equivariant linear maps are necessarily convolutions. In this work, we extend the sc…

    Submitted 16 June, 2021; v1 submitted 20 December, 2020; originally announced December 2020.

  43. arXiv:2011.12916  [pdf, other]

    cs.LG stat.ML

    Equivariant Learning of Stochastic Fields: Gaussian Processes and Steerable Conditional Neural Processes

    Authors: Peter Holderrieth, Michael Hutchinson, Yee Whye Teh

    Abstract: Motivated by objects such as electric fields or fluid streams, we study the problem of learning stochastic fields, i.e. stochastic processes whose samples are fields like those occurring in physics and engineering. Considering general transformations such as rotations and reflections, we show that spatial invariance of stochastic fields requires an inference model to be equivariant. Leveraging rec…

    Submitted 17 July, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  44. arXiv:2010.15727  [pdf, other]

    stat.ML cs.LG

    Amortized Probabilistic Detection of Communities in Graphs

    Authors: Yueqi Wang, Yoonho Lee, Pallab Basu, Juho Lee, Yee Whye Teh, Liam Paninski, Ari Pakman

    Abstract: Learning community structures in graphs has broad applications across scientific domains. While graph neural networks (GNNs) have been successful in encoding graph structures, existing GNN-based methods for community detection are limited by requiring knowledge of the number of communities in advance, in addition to lacking a proper probabilistic formulation to handle uncertainty. We propose a sim…

    Submitted 15 June, 2021; v1 submitted 29 October, 2020; originally announced October 2020.

  45. arXiv:2010.14274  [pdf, other]

    cs.AI cs.LG

    Behavior Priors for Efficient Reinforcement Learning

    Authors: Dhruva Tirumala, Alexandre Galashov, Hyeonwoo Noh, Leonard Hasenclever, Razvan Pascanu, Jonathan Schwarz, Guillaume Desjardins, Wojciech Marian Czarnecki, Arun Ahuja, Yee Whye Teh, Nicolas Heess

    Abstract: As we deploy reinforcement learning agents to solve increasingly challenging problems, methods that allow us to inject prior knowledge about the structure of the world and effective solution strategies become increasingly important. In this work we consider how information and architectural constraints can be combined with ideas from the probabilistic modeling literature to learn behavior priors…

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: Submitted to Journal of Machine Learning Research (JMLR)

  46. arXiv:2009.04875  [pdf, other]

    cs.LG cs.AI stat.ML

    Importance Weighted Policy Learning and Adaptation

    Authors: Alexandre Galashov, Jakub Sygnowski, Guillaume Desjardins, Jan Humplik, Leonard Hasenclever, Rae Jeong, Yee Whye Teh, Nicolas Heess

    Abstract: The ability to exploit prior experience to solve novel problems rapidly is a hallmark of biological learning systems and of great practical importance for artificial ones. In the meta reinforcement learning literature much recent work has focused on the problem of optimizing the learning process itself. In this paper we study a complementary approach which is conceptually simple, general, modular…

    Submitted 4 June, 2021; v1 submitted 10 September, 2020; originally announced September 2020.

  47. arXiv:2008.02956  [pdf, other]

    cs.LG stat.ML

    Bootstrapping Neural Processes

    Authors: Juho Lee, Yoonho Lee, Jungtaek Kim, Eunho Yang, Sung Ju Hwang, Yee Whye Teh

    Abstract: Unlike in traditional statistical modeling, for which a user typically hand-specifies a prior, Neural Processes (NPs) implicitly define a broad class of stochastic processes with neural networks. Given a data stream, NP learns a stochastic process that best describes the data. While this "data-driven" way of learning stochastic processes has proven to handle various types of data, NPs still rely…

    Submitted 27 October, 2020; v1 submitted 6 August, 2020; originally announced August 2020.

    Comments: Published in Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS 2020). Code is available at https://github.com/juho-lee/bnp

  48. arXiv:2007.13454  [pdf, other]

    stat.AP cs.LG q-bio.PE q-bio.QM stat.ML

    How Robust are the Estimated Effects of Nonpharmaceutical Interventions against COVID-19?

    Authors: Mrinank Sharma, Sören Mindermann, Jan Markus Brauner, Gavin Leech, Anna B. Stephenson, Tomáš Gavenčiak, Jan Kulveit, Yee Whye Teh, Leonid Chindelevitch, Yarin Gal

    Abstract: To what extent are effectiveness estimates of nonpharmaceutical interventions (NPIs) against COVID-19 influenced by the assumptions our models make? To answer this question, we investigate 2 state-of-the-art NPI effectiveness models and propose 6 variants that make different structural assumptions. In particular, we investigate how well NPI effectiveness estimates generalise to unseen countries, a…

    Submitted 20 December, 2020; v1 submitted 27 July, 2020; originally announced July 2020.

    Journal ref: NeurIPS 2020, Advances in Neural Information Processing Systems 33

  49. arXiv:2007.08243  [pdf, ps, other]

    cs.LG stat.ML

    Lottery Tickets in Linear Models: An Analysis of Iterative Magnitude Pruning

    Authors: Bryn Elesedy, Varun Kanade, Yee Whye Teh

    Abstract: We analyse the pruning procedure behind the lottery ticket hypothesis arXiv:1803.03635v5, iterative magnitude pruning (IMP), when applied to linear models trained by gradient flow. We begin by presenting sufficient conditions on the statistical structure of the features under which IMP prunes those features that have smallest projection onto the data. Following this, we explore IMP as a method for…

    Submitted 5 July, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: Updated for Sparsity in Neural Networks Workshop

    ACM Class: I.5.1

  50. arXiv:2007.05864  [pdf, other]

    stat.ML cs.LG

    Bayesian Deep Ensembles via the Neural Tangent Kernel

    Authors: Bobby He, Balaji Lakshminarayanan, Yee Whye Teh

    Abstract: We explore the link between deep ensembles and Gaussian processes (GPs) through the lens of the Neural Tangent Kernel (NTK): a recent development in understanding the training dynamics of wide neural networks (NNs). Previous work has shown that even in the infinite width limit, when NNs become GPs, there is no GP posterior interpretation to a deep ensemble trained with squared error loss. We intro…

    Submitted 24 October, 2020; v1 submitted 11 July, 2020; originally announced July 2020.