Showing 1–5 of 5 results for author: Kerekes, A

Searching in archive stat.
  1. arXiv:2405.01964  [pdf, other]

    stat.ML cs.LG

    Position: Understanding LLMs Requires More Than Statistical Generalization

    Authors: Patrik Reizinger, Szilvia Ujváry, Anna Mészáros, Anna Kerekes, Wieland Brendel, Ferenc Huszár

    Abstract: The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statist…

    Submitted 17 June, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

    Comments: Accepted as a position paper at ICML2024, Code: https://github.com/rpatrik96/llm-non-identifiability

  2. arXiv:2305.09605  [pdf, other]

    stat.ML cs.LG

    To smooth a cloud or to pin it down: Guarantees and Insights on Score Matching in Denoising Diffusion Models

    Authors: Francisco Vargas, Teodora Reu, Anna Kerekes, Michael M Bronstein

    Abstract: Denoising diffusion models are a class of generative models which have recently achieved state-of-the-art results across many domains. Gradual noise is added to the data using a diffusion process, which transforms the data distribution into a Gaussian. Samples from the generative model are then obtained by simulating an approximation of the time reversal of this diffusion initialized by Gaussian s…

    Submitted 26 June, 2024; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: arXiv admin note: text overlap with arXiv:1903.01608 by other authors

  3. arXiv:2210.10452  [pdf, other]

    stat.ML cs.LG

    Rethinking Sharpness-Aware Minimization as Variational Inference

    Authors: Szilvia Ujváry, Zsigmond Telek, Anna Kerekes, Anna Mészáros, Ferenc Huszár

    Abstract: Sharpness-aware minimization (SAM) aims to improve the generalisation of gradient-based learning by seeking out flat minima. In this work, we establish connections between SAM and Mean-Field Variational Inference (MFVI) of neural network parameters. We show that both these methods have interpretations as optimizing notions of flatness, and when using the reparametrisation trick, they both boil dow…

    Submitted 19 October, 2022; originally announced October 2022.

  4. arXiv:2111.11542  [pdf, other]

    stat.ML cs.LG

    Depth Without the Magic: Inductive Bias of Natural Gradient Descent

    Authors: Anna Kerekes, Anna Mészáros, Ferenc Huszár

    Abstract: In gradient descent, changing how we parametrize the model can lead to drastically different optimization trajectories, giving rise to a surprising range of meaningful inductive biases: identifying sparse classifiers or reconstructing low-rank matrices without explicit regularization. This implicit regularization has been hypothesised to be a contributing factor to good generalization in deep lear…

    Submitted 22 November, 2021; originally announced November 2021.

  5. Robust Group Subspace Recovery: A New Approach for Multi-Modality Data Fusion

    Authors: Sally Ghanem, Ashkan Panahi, Hamid Krim, Ryan A. Kerekes

    Abstract: The Robust Subspace Recovery (RoSuRe) algorithm was recently introduced as a principled and numerically efficient algorithm that unfolds the underlying Union of Subspaces (UoS) structure present in the data. The Union of Subspaces (UoS) is capable of identifying more complex trends in data sets than simple linear models. We build on and extend RoSuRe to prospect the structure of different data modalitie…

    Submitted 18 June, 2020; originally announced June 2020.

    Comments: 10 pages