
Showing 1–50 of 98 results for author: Wilson, A G

Searching in archive stat.
  1. arXiv:2407.18158  [pdf, other]

    stat.ML cs.LG

    Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models

    Authors: Sanae Lotfi, Yilun Kuang, Brandon Amos, Micah Goldblum, Marc Finzi, Andrew Gordon Wilson

    Abstract: Large language models (LLMs) with billions of parameters excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds are vacuous for large models at the billion-parameter scale. Moreover, these bounds are obtained through restrictive compression techniques, bounding compressed models that generate low-quality…

    Submitted 25 July, 2024; originally announced July 2024.

  2. arXiv:2406.11463  [pdf, other]

    cs.LG stat.ML

    Just How Flexible are Neural Networks in Practice?

    Authors: Ravid Shwartz-Ziv, Micah Goldblum, Arpit Bansal, C. Bayan Bruss, Yann LeCun, Andrew Gordon Wilson

    Abstract: It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters, underpinning notions of overparameterized and underparameterized models. In practice, however, we only find solutions accessible via our training procedure, including the optimizer and regularizers, limiting flexibility. Moreover, the exact parameterization of the function c…

    Submitted 17 June, 2024; originally announced June 2024.

  3. arXiv:2406.09177  [pdf, other]

    stat.ML cs.LG

    Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency

    Authors: Alan Nawzad Amin, Andrew Gordon Wilson

    Abstract: To make accurate predictions, understand mechanisms, and design interventions in systems of many variables, we wish to learn causal graphs from large-scale data. Unfortunately, the space of all possible causal graphs is enormous, so scalably and accurately searching for the best fit to the data is a challenge. In principle we could substantially decrease the search space, or learn the graph entirely…

    Submitted 18 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: ICML 2024; Code at https://github.com/AlanNawzadAmin/DAT-graph

  4. arXiv:2406.08391  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Large Language Models Must Be Taught to Know What They Don't Know

    Authors: Sanyam Kapoor, Nate Gruver, Manley Roberts, Katherine Collins, Arka Pal, Umang Bhatt, Adrian Weller, Samuel Dooley, Micah Goldblum, Andrew Gordon Wilson

    Abstract: When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibrati…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Code available at: https://github.com/activatedgeek/calibration-tuning

  5. arXiv:2403.09869  [pdf, other]

    stat.ML cs.AI cs.LG stat.ME

    Mind the GAP: Improving Robustness to Subpopulation Shifts with Group-Aware Priors

    Authors: Tim G. J. Rudner, Ya Shi Zhang, Andrew Gordon Wilson, Julia Kempe

    Abstract: Machine learning models often perform poorly under subpopulation shifts in the data distribution. Developing methods that allow machine learning models to better generalize to such shifts is crucial for safe deployment in real-world settings. In this paper, we develop a family of group-aware prior (GAP) distributions over neural network parameters that explicitly favor models that generalize well…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: Published in Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS 2024)

  6. arXiv:2402.00809  [pdf, other]

    cs.LG stat.ML

    Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

    Authors: Theodore Papamarkou, Maria Skoularidou, Konstantina Palla, Laurence Aitchison, Julyan Arbel, David Dunson, Maurizio Filippone, Vincent Fortuin, Philipp Hennig, José Miguel Hernández-Lobato, Aliaksandr Hubin, Alexander Immer, Theofanis Karaletsos, Mohammad Emtiyaz Khan, Agustinus Kristiadi, Yingzhen Li, Stephan Mandt, Christopher Nemeth, Michael A. Osborne, Tim G. J. Rudner, David Rügamer, Yee Whye Teh, Max Welling, Andrew Gordon Wilson, Ruqi Zhang

    Abstract: In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learni…

    Submitted 6 August, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria. PMLR 235, 2024

  7. arXiv:2312.17173  [pdf, other]

    stat.ML cs.LG

    Non-Vacuous Generalization Bounds for Large Language Models

    Authors: Sanae Lotfi, Marc Finzi, Yilun Kuang, Tim G. J. Rudner, Micah Goldblum, Andrew Gordon Wilson

    Abstract: Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply parrot their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. In particular, we d…

    Submitted 17 July, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

    Comments: ICML 2024

  8. arXiv:2312.17162  [pdf, other]

    stat.ML cs.AI cs.LG

    Function-Space Regularization in Neural Networks: A Probabilistic Perspective

    Authors: Tim G. J. Rudner, Sanyam Kapoor, Shikai Qiu, Andrew Gordon Wilson

    Abstract: Parameter-space regularization in neural network optimization is a fundamental tool for improving generalization. However, standard parameter-space regularization methods make it challenging to encode explicit preferences about desired predictive functions into neural network training. In this work, we approach regularization in neural networks from a probabilistic perspective and show that by vie…

    Submitted 28 December, 2023; originally announced December 2023.

    Comments: Published in Proceedings of the 40th International Conference on Machine Learning (ICML 2023)

  9. arXiv:2311.15990  [pdf, other]

    cs.LG stat.ML

    Should We Learn Most Likely Functions or Parameters?

    Authors: Shikai Qiu, Tim G. J. Rudner, Sanyam Kapoor, Andrew Gordon Wilson

    Abstract: Standard regularized training procedures correspond to maximizing a posterior distribution over parameters, known as maximum a posteriori (MAP) estimation. However, model parameters are of interest only insomuch as they combine with the functional form of a model to provide a function that can make good predictions. Moreover, the most likely parameters under the parameter posterior do not generall…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: NeurIPS 2023. Code available at https://github.com/activatedgeek/function-space-map

  10. arXiv:2309.03060  [pdf, other]

    cs.LG math.NA stat.ML

    CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra

    Authors: Andres Potapczynski, Marc Finzi, Geoff Pleiss, Andrew Gordon Wilson

    Abstract: Many areas of machine learning and science involve large linear algebra problems, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation. The matrices involved often have Kronecker, convolutional, block diagonal, sum, or product structure. In this paper, we propose a simple but general framework for large-scale linear algebra problems in machine le…

    Submitted 29 November, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: Code available at https://github.com/wilson-labs/cola. NeurIPS 2023

  11. arXiv:2306.11074  [pdf, other]

    cs.LG stat.ML

    Simple and Fast Group Robustness by Automatic Feature Reweighting

    Authors: Shikai Qiu, Andres Potapczynski, Pavel Izmailov, Andrew Gordon Wilson

    Abstract: A major challenge to out-of-distribution generalization is reliance on spurious features -- patterns that are predictive of the class label in the training data distribution, but not causally related to the target. Standard methods for reducing the reliance on spurious features typically assume that we know what the spurious feature is, which is rarely true in the real world. Methods that attempt…

    Submitted 19 June, 2023; originally announced June 2023.

    Comments: ICML 2023. Code available at https://github.com/AndPotap/afr

    Journal ref: 40th International Conference on Machine Learning 2023

  12. arXiv:2305.20028  [pdf, other]

    cs.LG stat.ML

    A Study of Bayesian Neural Network Surrogates for Bayesian Optimization

    Authors: Yucen Lily Li, Tim G. J. Rudner, Andrew Gordon Wilson

    Abstract: Bayesian optimization is a highly efficient approach to optimizing objective functions which are expensive to query. These objectives are typically represented by Gaussian process (GP) surrogate models which are easy to optimize and support exact inference. While standard GP surrogates have been well-established in Bayesian optimization, Bayesian neural networks (BNNs) have recently become practic…

    Submitted 8 May, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: ICLR 2024. Code available at https://github.com/yucenli/bnn-bo

  13. arXiv:2304.14994  [pdf, other]

    cs.LG math.NA stat.ML

    A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks

    Authors: Marc Finzi, Andres Potapczynski, Matthew Choptuik, Andrew Gordon Wilson

    Abstract: Unlike conventional grid- and mesh-based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible. While global minimization of the PDE residual over the network parameters works well for boundary value problems, catastrophic…

    Submitted 30 August, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: ICLR 2023. Code available at https://github.com/mfinzi/neural-ivp

  14. arXiv:2304.05366  [pdf, other]

    cs.LG stat.ML

    The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning

    Authors: Micah Goldblum, Marc Finzi, Keefer Rowan, Andrew Gordon Wilson

    Abstract: No free lunch theorems for supervised learning state that no learner can solve all problems or that all learners achieve exactly the same accuracy on average over a uniform distribution on learning problems. Accordingly, these theorems are often referenced in support of the notion that individual problems require specially tailored inductive biases. While virtually all uniformly sampled datasets h…

    Submitted 7 June, 2024; v1 submitted 11 April, 2023; originally announced April 2023.

    Comments: Published at the International Conference on Machine Learning (ICML) 2024

  15. arXiv:2302.04019  [pdf, other]

    cs.LG stat.ML

    Fortuna: A Library for Uncertainty Quantification in Deep Learning

    Authors: Gianluca Detommaso, Alberto Gasparin, Michele Donini, Matthias Seeger, Andrew Gordon Wilson, Cedric Archambeau

    Abstract: We present Fortuna, an open-source library for uncertainty quantification in deep learning. Fortuna supports a range of calibration techniques, such as conformal prediction that can be applied to any trained neural network to generate reliable uncertainty estimates, and scalable Bayesian inference methods that can be applied to Flax-based deep neural networks trained from scratch for improved unce…

    Submitted 8 February, 2023; originally announced February 2023.

  16. arXiv:2211.13609  [pdf, other]

    cs.LG stat.ML

    PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization

    Authors: Sanae Lotfi, Marc Finzi, Sanyam Kapoor, Andres Potapczynski, Micah Goldblum, Andrew Gordon Wilson

    Abstract: While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works. In this paper, we develop a compression approach based on quantizing neural network parameters in a linear subspace, profoundly improving on previous results to provide state-of-the-art generalization bounds on a variety of tas…

    Submitted 24 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022. Code is available at https://github.com/activatedgeek/tight-pac-bayes

  17. arXiv:2210.12496  [pdf, other]

    cs.LG stat.ML

    Bayesian Optimization with Conformal Prediction Sets

    Authors: Samuel Stanton, Wesley Maddox, Andrew Gordon Wilson

    Abstract: Bayesian optimization is a coherent, ubiquitous approach to decision-making under uncertainty, with applications including multi-arm bandits, active learning, and black-box optimization. Bayesian optimization selects decisions (i.e. objective function queries) with maximal expected utility with respect to the posterior distribution of a Bayesian model, which quantifies reducible, epistemic uncerta…

    Submitted 12 December, 2023; v1 submitted 22 October, 2022; originally announced October 2022.

    Comments: For code, see https://www.github.com/samuelstanton/conformal-bayesopt.git

    Journal ref: Proceedings of Machine Learning Research, Volume 206, 959-986, PMLR, 2023

  18. arXiv:2210.11369  [pdf, other]

    cs.LG cs.CV stat.ML

    On Feature Learning in the Presence of Spurious Correlations

    Authors: Pavel Izmailov, Polina Kirichenko, Nate Gruver, Andrew Gordon Wilson

    Abstract: Deep classifiers are known to rely on spurious features – patterns which are correlated with the target on the training data but not inherently relevant to the learning problem, such as the image backgrounds when classifying the foregrounds. In this paper we evaluate the amount of information about the core (non-spurious) features that can be decoded from the representations learne…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: NeurIPS 2022. Code available at https://github.com/izmailovpavel/spurious_feature_learning

  19. arXiv:2210.02984  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    The Lie Derivative for Measuring Learned Equivariance

    Authors: Nate Gruver, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson

    Abstract: Equivariance guarantees that a model's predictions capture key symmetries in data. When an image is translated or rotated, an equivariant model's representation of that image will translate or rotate accordingly. The success of convolutional neural networks has historically been tied to translation equivariance directly encoded in their architecture. The rising success of vision transformers, whic…

    Submitted 18 June, 2024; v1 submitted 6 October, 2022; originally announced October 2022.

    Comments: ICLR 2023. Code available at: https://github.com/ngruver/lie-deriv

  20. arXiv:2207.06544  [pdf, other]

    cs.LG q-fin.ST stat.ML

    Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes

    Authors: Gregory Benton, Wesley J. Maddox, Andrew Gordon Wilson

    Abstract: A broad class of stochastic volatility models are defined by systems of stochastic differential equations. While these models have seen widespread success in domains such as finance and statistical climatology, they typically lack an ability to condition on historical data to produce a true posterior distribution. To address this fundamental limitation, we show how to re-cast a class of stochastic…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: ICML 2022. Code available at https://github.com/g-benton/Volt

  21. arXiv:2206.15306  [pdf, other]

    cs.LG stat.ML

    Transfer Learning with Deep Tabular Models

    Authors: Roman Levin, Valeriia Cherepanova, Avi Schwarzschild, Arpit Bansal, C. Bayan Bruss, Tom Goldstein, Andrew Gordon Wilson, Micah Goldblum

    Abstract: Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees and neural networks. Accuracy aside, a major advantage of neural models is that they learn reusable features and are easily fine-tuned in new domains. This property is often exploited in computer vision and natural language applica…

    Submitted 7 August, 2023; v1 submitted 30 June, 2022; originally announced June 2022.

    Journal ref: International Conference on Learning Representations (ICLR), 2023

  22. arXiv:2206.09909  [pdf, other]

    cs.LG stat.ML

    Low-Precision Stochastic Gradient Langevin Dynamics

    Authors: Ruqi Zhang, Andrew Gordon Wilson, Christopher De Sa

    Abstract: While low-precision optimization has been widely used to accelerate deep learning, low-precision sampling remains largely unexplored. As a consequence, sampling is simply infeasible in many large-scale scenarios, despite providing remarkable benefits to generalization and uncertainty estimation for neural networks. In this paper, we provide the first study of low-precision Stochastic Gradient Lang…

    Submitted 20 June, 2022; originally announced June 2022.

    Comments: Published at ICML 2022

  23. arXiv:2204.02937  [pdf, other]

    cs.LG cs.CV stat.ML

    Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations

    Authors: Polina Kirichenko, Pavel Izmailov, Andrew Gordon Wilson

    Abstract: Neural network classifiers can largely rely on simple spurious features, such as backgrounds, to make predictions. However, even in these cases, we show that they still often learn core features associated with the desired attributes of the data, contrary to recent findings. Inspired by this insight, we demonstrate that simple last layer retraining can match or outperform state-of-the-art approach…

    Submitted 30 June, 2023; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: ICLR 2023. Code is available at https://github.com/PolinaKirichenko/deep_feature_reweighting

  24. arXiv:2203.16481  [pdf, other]

    cs.LG stat.ML

    On Uncertainty, Tempering, and Data Augmentation in Bayesian Classification

    Authors: Sanyam Kapoor, Wesley J. Maddox, Pavel Izmailov, Andrew Gordon Wilson

    Abstract: Aleatoric uncertainty captures the inherent randomness of the data, such as measurement noise. In Bayesian regression, we often use a Gaussian observation model, where we control the level of aleatoric uncertainty with a noise variance parameter. By contrast, for Bayesian classification we use a categorical distribution with no mechanism to represent our beliefs about aleatoric uncertainty. Our wo…

    Submitted 30 March, 2022; originally announced March 2022.

  25. arXiv:2203.12742  [pdf, other]

    cs.LG cs.NE q-bio.QM stat.ML

    Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders

    Authors: Samuel Stanton, Wesley Maddox, Nate Gruver, Phillip Maffettone, Emily Delaney, Peyton Greenside, Andrew Gordon Wilson

    Abstract: Bayesian optimization (BayesOpt) is a gold standard for query-efficient continuous optimization. However, its adoption for drug design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, allowing gradient-based optimization of mult…

    Submitted 12 July, 2022; v1 submitted 23 March, 2022; originally announced March 2022.

    Comments: ICML 2022. Code available at https://github.com/samuelstanton/lambo

  26. arXiv:2202.11678  [pdf, other]

    cs.LG stat.ML

    Bayesian Model Selection, the Marginal Likelihood, and Generalization

    Authors: Sanae Lotfi, Pavel Izmailov, Gregory Benton, Micah Goldblum, Andrew Gordon Wilson

    Abstract: How do we compare between hypotheses that are entirely consistent with observations? The marginal likelihood (aka Bayesian evidence), which represents the probability of generating our observations from a prior, provides a distinctive approach to this foundational question, automatically encoding Occam's razor. Although it has been observed that the marginal likelihood can overfit and is sensitive…

    Submitted 1 May, 2023; v1 submitted 23 February, 2022; originally announced February 2022.

    Comments: Extended version. Shorter ICML version available at arXiv:2202.11678v2

  27. arXiv:2202.04836  [pdf, other]

    cs.LG math.DS physics.data-an stat.ML

    Deconstructing the Inductive Biases of Hamiltonian Neural Networks

    Authors: Nate Gruver, Marc Finzi, Samuel Stanton, Andrew Gordon Wilson

    Abstract: Physics-inspired neural networks (NNs), such as Hamiltonian or Lagrangian NNs, dramatically outperform other learned dynamics models by leveraging strong inductive biases. These models, however, are challenging to apply to many real-world systems, such as those that don't conserve energy or contain contacts, a common setting for robotics and reinforcement learning. In this paper, we examine the in…

    Submitted 11 February, 2022; v1 submitted 10 February, 2022; originally announced February 2022.

    Comments: ICLR 2022. Code available at https://github.com/ngruver/decon-hnn

  28. arXiv:2112.15246  [pdf, other]

    cs.LG stat.ML

    When are Iterative Gaussian Processes Reliably Accurate?

    Authors: Wesley J. Maddox, Sanyam Kapoor, Andrew Gordon Wilson

    Abstract: While recent work on conjugate gradient methods and Lanczos decompositions has achieved scalable Gaussian process inference with highly accurate point predictions, in several implementations these iterative methods appear to struggle with numerical instabilities in learning kernel hyperparameters, and poor test likelihoods. By investigating CG tolerance, preconditioner rank, and Lanczos decomposi…

    Submitted 30 December, 2021; originally announced December 2021.

    Comments: ICML 2021 OPTML Workshop

  29. arXiv:2112.12899  [pdf, other]

    stat.ME

    Monitoring Deforestation Using Multivariate Bayesian Online Changepoint Detection with Outliers

    Authors: Laura J. Wendelberger, Josh M. Gray, Brian J. Reich, Alyson G. Wilson

    Abstract: Near-real-time change detection is important for a variety of Earth monitoring applications and remains a high priority for remote sensing science. Data sparsity, subtle changes, seasonal trends, and the presence of outliers make detecting actual landscape changes challenging. Adams and MacKay (2007) introduced Bayesian Online Changepoint Detection (BOCPD), a computationally efficient, exact Bayes…

    Submitted 27 December, 2021; v1 submitted 23 December, 2021; originally announced December 2021.

    Comments: 21 pages, 6 figures

  30. arXiv:2112.01388  [pdf, other]

    cs.LG stat.ML

    Residual Pathway Priors for Soft Equivariance Constraints

    Authors: Marc Finzi, Gregory Benton, Andrew Gordon Wilson

    Abstract: There is often a trade-off between building deep learning systems that are expressive enough to capture the nuances of reality, and having the right inductive biases for efficient learning. We introduce Residual Pathway Priors (RPPs) as a method for converting hard architectural constraints into soft priors, guiding models towards structured solutions, while retaining the ability to capture ad…

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: NeurIPS 2021. Code available at https://github.com/mfinzi/residual-pathway-priors

  31. arXiv:2110.15172  [pdf, other]

    cs.LG stat.ML

    Conditioning Sparse Variational Gaussian Processes for Online Decision-making

    Authors: Wesley J. Maddox, Samuel Stanton, Andrew Gordon Wilson

    Abstract: With a principled representation of uncertainty and closed form posterior updates, Gaussian processes (GPs) are a natural choice for online decision making. However, Gaussian processes typically require at least $\mathcal{O}(n^2)$ computations for $n$ training points, limiting their general applicability. Stochastic variational Gaussian processes (SVGPs) can provide scalable inference for a datase…

    Submitted 28 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  32. arXiv:2106.12997  [pdf, other]

    cs.LG cs.AI stat.ML

    Bayesian Optimization with High-Dimensional Outputs

    Authors: Wesley J. Maddox, Maximilian Balandat, Andrew Gordon Wilson, Eytan Bakshy

    Abstract: Bayesian Optimization is a sample-efficient black-box optimization procedure that is typically applied to problems with a small number of independent objectives. However, in practice we often wish to optimize objectives defined over many correlated outcomes (or "tasks"). For example, scientists may want to optimize the coverage of a cell tower network across a dense grid of locations. Similarly, e…

    Submitted 28 October, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021

  33. arXiv:2106.12772  [pdf, other]

    cs.LG stat.ML

    Task-agnostic Continual Learning with Hybrid Probabilistic Models

    Authors: Polina Kirichenko, Mehrdad Farajtabar, Dushyant Rao, Balaji Lakshminarayanan, Nir Levine, Ang Li, Huiyi Hu, Andrew Gordon Wilson, Razvan Pascanu

    Abstract: Learning new tasks continuously without forgetting on a constantly changing data distribution is essential for real-world problems but extremely challenging for modern deep learning. In this work we propose HCL, a Hybrid generative-discriminative approach to Continual Learning for classification. We model the distribution of each task and each class with a normalizing flow. The flow is used to lea…

    Submitted 24 June, 2021; originally announced June 2021.

  34. arXiv:2106.11905  [pdf, other]

    cs.LG stat.ML

    Dangers of Bayesian Model Averaging under Covariate Shift

    Authors: Pavel Izmailov, Patrick Nicholson, Sanae Lotfi, Andrew Gordon Wilson

    Abstract: Approximate Bayesian inference for neural networks is considered a robust alternative to standard training, often providing good performance on out-of-distribution data. However, Bayesian neural networks (BNNs) with high-fidelity approximate inference via full-batch Hamiltonian Monte Carlo achieve poor generalization under covariate shift, even underperforming classical estimation. We explain this…

    Submitted 6 December, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021. Code is available at https://github.com/izmailovpavel/bnn_covariate_shift

  35. arXiv:2106.06695  [pdf, other]

    cs.LG stat.ML

    SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes

    Authors: Sanyam Kapoor, Marc Finzi, Ke Alexander Wang, Andrew Gordon Wilson

    Abstract: State-of-the-art methods for scalable Gaussian processes use iterative algorithms, requiring fast matrix vector multiplies (MVMs) with the covariance kernel. The Structured Kernel Interpolation (SKI) framework accelerates these MVMs by performing efficient MVMs on a grid and interpolating back to the original space. In this work, we develop a connection between SKI and the permutohedral lattice us…

    Submitted 12 June, 2021; originally announced June 2021.

    Comments: International Conference on Machine Learning (ICML), 2021

  36. arXiv:2106.05992  [pdf, other]

    cs.LG stat.ML

    Scalable Variational Gaussian Processes via Harmonic Kernel Decomposition

    Authors: Shengyang Sun, Jiaxin Shi, Andrew Gordon Wilson, Roger Grosse

    Abstract: We introduce a new scalable variational Gaussian process approximation which provides a high fidelity approximation while retaining general applicability. We propose the harmonic kernel decomposition (HKD), which uses Fourier series to decompose a kernel as a sum of orthogonal kernels. Our variational approximation exploits this orthogonality to enable a large number of inducing points at a low co…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: ICML 2021, 21 pages

  37. arXiv:2106.05945  [pdf, other]

    cs.LG stat.ML

    Does Knowledge Distillation Really Work?

    Authors: Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson

    Abstract: Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble of networks. We show that while knowledge distillation can improve student generalization, it does not typically work as it is commonly understood: there often remains a surprisingly large discrepancy between the predictive distributions of the teacher and the s…

    Submitted 6 December, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021. Code available at https://github.com/samuelstanton/gnosis

  38. arXiv:2104.14421  [pdf, other]

    cs.LG stat.ML

    What Are Bayesian Neural Network Posteriors Really Like?

    Authors: Pavel Izmailov, Sharad Vikram, Matthew D. Hoffman, Andrew Gordon Wilson

    Abstract: The posterior over Bayesian neural network (BNN) parameters is extremely high-dimensional and non-convex. For computational reasons, researchers approximate this posterior using inexpensive mini-batch methods such as mean-field variational inference or stochastic-gradient Markov chain Monte Carlo (SGMCMC). To investigate foundational questions in Bayesian deep learning, we instead use full-batch H…

    Submitted 29 April, 2021; originally announced April 2021.

  39. arXiv:2104.09459  [pdf, other]

    cs.LG math.DS stat.ML

    A Practical Method for Constructing Equivariant Multilayer Perceptrons for Arbitrary Matrix Groups

    Authors: Marc Finzi, Max Welling, Andrew Gordon Wilson

    Abstract: Symmetries and equivariance are fundamental to the generalization of neural networks on domains such as images, graphs, and point clouds. Existing work has primarily focused on a small number of groups, such as the translation, rotation, and permutation groups. In this work we provide a completely general algorithm for solving for the equivariant layers of matrix groups. In addition to recovering…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: Library: https://github.com/mfinzi/equivariant-MLP, Documentation: https://emlp.readthedocs.io/en/latest/, Examples: https://colab.research.google.com/github/mfinzi/equivariant-MLP/blob/master/docs/notebooks/colabs/all.ipynb

  40. arXiv:2103.01454  [pdf, other]

    stat.ML cs.LG

    Kernel Interpolation for Scalable Online Gaussian Processes

    Authors: Samuel Stanton, Wesley J. Maddox, Ian Delbridge, Andrew Gordon Wilson

    Abstract: Gaussian processes (GPs) provide a gold standard for performance in online settings, such as sample-efficient control and black box optimization, where we need to update a posterior distribution as we acquire data in a sequential fashion. However, updating a GP posterior to accommodate even a single new observation after having observed $n$ points incurs at least $O(n)$ computations in the exact s…

    Submitted 1 March, 2021; originally announced March 2021.

    Comments: AISTATS 2021

  41. arXiv:2103.01439  [pdf, other]

    stat.ML cs.LG

    Fast Adaptation with Linearized Neural Networks

    Authors: Wesley J. Maddox, Shuai Tang, Pablo Garcia Moreno, Andrew Gordon Wilson, Andreas Damianou

    Abstract: The inductive biases of trained neural networks are difficult to understand and, consequently, to adapt to new settings. We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions. Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel desig…

    Submitted 28 April, 2021; v1 submitted 1 March, 2021; originally announced March 2021.

    Comments: AISTATS 2021

  42. arXiv:2102.13042  [pdf, other]

    cs.LG cs.CV stat.ML

    Loss Surface Simplexes for Mode Connecting Volumes and Fast Ensembling

    Authors: Gregory W. Benton, Wesley J. Maddox, Sanae Lotfi, Andrew Gordon Wilson

    Abstract: With a better understanding of the loss surfaces for multilayer networks, we can build more robust and accurate training procedures. Recently it was discovered that independently trained SGD solutions can be connected along one-dimensional paths of near-constant training loss. In this paper, we show that there are mode-connecting simplicial complexes that form multi-dimensional manifolds of low lo…

    Submitted 15 November, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

    Comments: ICML 2021

  43. arXiv:2010.13581  [pdf, other]

    cs.LG math.DS physics.comp-ph physics.data-an stat.ML

    Simplifying Hamiltonian and Lagrangian Neural Networks via Explicit Constraints

    Authors: Marc Finzi, Ke Alexander Wang, Andrew Gordon Wilson

    Abstract: Reasoning about the physical world requires models that are endowed with the right inductive biases to learn the underlying dynamics. Recent works improve generalization for predicting trajectories by learning the Hamiltonian or Lagrangian of a system rather than the differential equations directly. While these methods encode the constraints of the systems using generalized coordinates, we show th…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020. Code available at https://github.com/mfinzi/constrained-hamiltonian-neural-networks

  44. arXiv:2010.11882  [pdf, other]

    cs.LG stat.ML

    Learning Invariances in Neural Networks

    Authors: Gregory Benton, Marc Finzi, Pavel Izmailov, Andrew Gordon Wilson

    Abstract: Invariances to translations have imbued convolutional neural networks with powerful generalization properties. However, we often do not know a priori what invariances are present in the data, or to what extent a model should be invariant to a given symmetry group. We show how to \emph{learn} invariances and equivariances by parameterizing a distribution over augmentations and optimizing the traini…

    Submitted 1 December, 2020; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: NeurIPS 2020. Code available at https://github.com/g-benton/learning-invariances

  45. arXiv:2008.12775  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    On the model-based stochastic value gradient for continuous reinforcement learning

    Authors: Brandon Amos, Samuel Stanton, Denis Yarats, Andrew Gordon Wilson

    Abstract: For over a decade, model-based reinforcement learning has been seen as a way to leverage control-based domain knowledge to improve the sample-efficiency of reinforcement learning agents. While model-based agents are conceptually appealing, their policies tend to lag behind those of model-free agents in terms of final reward, especially in non-trivial environments. In response, researchers have pro…

    Submitted 27 May, 2021; v1 submitted 28 August, 2020; originally announced August 2020.

    Comments: L4DC 2021

  46. arXiv:2006.09157  [pdf, other]

    stat.ME stat.ML

    Selecting Diverse Models for Scientific Insight

    Authors: Laura J. Wendelberger, Brian J. Reich, Alyson G. Wilson

    Abstract: Model selection often aims to choose a single model, assuming that the form of the model is correct. However, there may be multiple possible underlying explanatory patterns in a set of predictors that could explain a response. Model selection without regard for model uncertainty can fail to bring these patterns to light. We explore multi-model penalized regression (MMPR) to acknowledge model uncer…

    Submitted 15 December, 2021; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: 37 pages, 14 figures. Presented at the Conference on Data Analysis (CoDA) 2020 (Feb 25-27)

  47. arXiv:2006.08545  [pdf, other]

    stat.ML cs.LG

    Why Normalizing Flows Fail to Detect Out-of-Distribution Data

    Authors: Polina Kirichenko, Pavel Izmailov, Andrew Gordon Wilson

    Abstract: Detecting out-of-distribution (OOD) data is crucial for robust machine learning systems. Normalizing flows are flexible deep generative models that often surprisingly fail to distinguish between in- and out-of-distribution data: a flow trained on pictures of clothing assigns higher likelihood to handwritten digits. We investigate why normalizing flows perform poorly for OOD detection. We demonstra…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: Code is available at https://github.com/PolinaKirichenko/flows_ood

  48. arXiv:2006.06900  [pdf, other]

    cs.LG cs.CL stat.ML

    Improving GAN Training with Probability Ratio Clipping and Sample Reweighting

    Authors: Yue Wu, Pan Zhou, Andrew Gordon Wilson, Eric P. Xing, Zhiting Hu

    Abstract: Despite success on a wide range of problems related to vision, generative adversarial networks (GANs) often suffer from inferior performance due to unstable training, especially for text generation. To solve this issue, we propose a new variational GAN training framework which enjoys superior training stability. Our approach is inspired by a connection of GANs and reinforcement learning under a va…

    Submitted 30 October, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: NeurIPS 2020 camera-ready version (citations updated)

  49. arXiv:2003.02139  [pdf, other]

    cs.LG stat.ML

    Rethinking Parameter Counting in Deep Models: Effective Dimensionality Revisited

    Authors: Wesley J. Maddox, Gregory Benton, Andrew Gordon Wilson

    Abstract: Neural networks appear to have mysterious generalization properties when using parameter counting as a proxy for complexity. Indeed, neural networks often have many more parameters than there are data points, yet still provide good generalization performance. Moreover, when we measure generalization as a function of parameters, we see double descent behaviour, where the test error decreases, incre…

    Submitted 25 May, 2020; v1 submitted 4 March, 2020; originally announced March 2020.

  50. arXiv:2002.12880  [pdf, other]

    stat.ML cs.LG

    Generalizing Convolutional Neural Networks for Equivariance to Lie Groups on Arbitrary Continuous Data

    Authors: Marc Finzi, Samuel Stanton, Pavel Izmailov, Andrew Gordon Wilson

    Abstract: The translation equivariance of convolutional layers enables convolutional neural networks to generalize well on image problems. While translation equivariance provides a powerful inductive bias for images, we often additionally desire equivariance to other transformations, such as rotations, especially for non-image data. We propose a general method to construct a convolutional layer that is equi…

    Submitted 24 September, 2020; v1 submitted 25 February, 2020; originally announced February 2020.

    Comments: ICML 2020. Code available at https://github.com/mfinzi/LieConv