
Showing 1–9 of 9 results for author: Potapczynski, A

Searching in archive cs.
  1. arXiv:2406.06248  [pdf, other]

    cs.LG

    Compute Better Spent: Replacing Dense Layers with Structured Matrices

    Authors: Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson

    Abstract: Dense linear layers are the dominant computational bottleneck in foundation models. Identifying more efficient alternatives to dense matrices has enormous potential for building more compute-efficient models, as exemplified by the success of convolutional networks in the image domain. In this work, we systematically explore structured matrices as replacements for dense matrices. We show that diffe…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: ICML 24. Code available at https://github.com/shikaiqiu/compute-better-spent

  2. arXiv:2309.03060  [pdf, other]

    cs.LG math.NA stat.ML

    CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra

    Authors: Andres Potapczynski, Marc Finzi, Geoff Pleiss, Andrew Gordon Wilson

    Abstract: Many areas of machine learning and science involve large linear algebra problems, such as eigendecompositions, solving linear systems, computing matrix exponentials, and trace estimation. The matrices involved often have Kronecker, convolutional, block diagonal, sum, or product structure. In this paper, we propose a simple but general framework for large-scale linear algebra problems in machine le…

    Submitted 29 November, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

    Comments: Code available at https://github.com/wilson-labs/cola. NeurIPS 2023

  3. arXiv:2306.11074  [pdf, other]

    cs.LG stat.ML

    Simple and Fast Group Robustness by Automatic Feature Reweighting

    Authors: Shikai Qiu, Andres Potapczynski, Pavel Izmailov, Andrew Gordon Wilson

    Abstract: A major challenge to out-of-distribution generalization is reliance on spurious features -- patterns that are predictive of the class label in the training data distribution, but not causally related to the target. Standard methods for reducing the reliance on spurious features typically assume that we know what the spurious feature is, which is rarely true in the real world. Methods that attempt…

    Submitted 19 June, 2023; originally announced June 2023.

    Comments: ICML 23. Code available at https://github.com/AndPotap/afr

    Journal ref: 40th International Conference on Machine Learning 2023

  4. arXiv:2304.14994  [pdf, other]

    cs.LG math.NA stat.ML

    A Stable and Scalable Method for Solving Initial Value PDEs with Neural Networks

    Authors: Marc Finzi, Andres Potapczynski, Matthew Choptuik, Andrew Gordon Wilson

    Abstract: Unlike conventional grid and mesh based methods for solving partial differential equations (PDEs), neural networks have the potential to break the curse of dimensionality, providing approximate solutions to problems where using classical solvers is difficult or impossible. While global minimization of the PDE residual over the network parameters works well for boundary value problems, catastrophic…

    Submitted 30 August, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

    Comments: ICLR 2023. Code available at https://github.com/mfinzi/neural-ivp

  5. arXiv:2211.13609  [pdf, other]

    cs.LG stat.ML

    PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization

    Authors: Sanae Lotfi, Marc Finzi, Sanyam Kapoor, Andres Potapczynski, Micah Goldblum, Andrew Gordon Wilson

    Abstract: While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works. In this paper, we develop a compression approach based on quantizing neural network parameters in a linear subspace, profoundly improving on previous results to provide state-of-the-art generalization bounds on a variety of tas…

    Submitted 24 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022. Code is available at https://github.com/activatedgeek/tight-pac-bayes

  6. arXiv:2207.06856  [pdf, other]

    cs.LG

    Low-Precision Arithmetic for Fast Gaussian Processes

    Authors: Wesley J. Maddox, Andres Potapczynski, Andrew Gordon Wilson

    Abstract: Low-precision arithmetic has had a transformative effect on the training of neural networks, reducing computation, memory and energy requirements. However, despite its promise, low-precision arithmetic has received little attention for Gaussian processes (GPs), largely because GPs require sophisticated linear algebra routines that are unstable in low-precision. We study the different failure modes…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: UAI 2022. Code available at https://github.com/AndPotap/halfpres_gps

  7. arXiv:2204.13290  [pdf, other]

    stat.ML cs.LG

    On the Normalizing Constant of the Continuous Categorical Distribution

    Authors: Elliott Gordon-Rodriguez, Gabriel Loaiza-Ganem, Andres Potapczynski, John P. Cunningham

    Abstract: Probability distributions supported on the simplex enjoy a wide range of applications across statistics and machine learning. Recently, a novel family of such distributions has been discovered: the continuous categorical. This family enjoys remarkable mathematical simplicity; its density function resembles that of the Dirichlet distribution, but with a normalizing constant that can be written in c…

    Submitted 28 April, 2022; originally announced April 2022.

  8. arXiv:2102.06695  [pdf, other]

    cs.LG stat.ML

    Bias-Free Scalable Gaussian Processes via Randomized Truncations

    Authors: Andres Potapczynski, Luhuan Wu, Dan Biderman, Geoff Pleiss, John P. Cunningham

    Abstract: Scalable Gaussian Process methods are computationally attractive, yet introduce modeling biases that require rigorous study. This paper analyzes two common techniques: early truncated conjugate gradients (CG) and random Fourier features (RFF). We find that both methods introduce a systematic bias on the learned hyperparameters: CG tends to underfit while RFF tends to overfit. We address these issu…

    Submitted 28 June, 2021; v1 submitted 12 February, 2021; originally announced February 2021.

    Journal ref: 38th International Conference on Machine Learning (ICML 2021)

  9. arXiv:1912.09588  [pdf, other]

    stat.ML cs.LG

    Invertible Gaussian Reparameterization: Revisiting the Gumbel-Softmax

    Authors: Andres Potapczynski, Gabriel Loaiza-Ganem, John P. Cunningham

    Abstract: The Gumbel-Softmax is a continuous distribution over the simplex that is often used as a relaxation of discrete distributions. Because it can be readily interpreted and easily reparameterized, it enjoys widespread use. We propose a modular and more flexible family of reparameterizable distributions where Gaussian noise is transformed into a one-hot approximation through an invertible function. Thi…

    Submitted 29 August, 2022; v1 submitted 19 December, 2019; originally announced December 2019.

    Comments: Accepted at NeurIPS 2020

    Journal ref: Published: NeurIPS 2020