
Showing 1–30 of 30 results for author: Ablin, P

Searching in archive stat.
  1. arXiv:2405.01702  [pdf, other]

    cs.LG math.OC stat.ML

    Optimization without Retraction on the Random Generalized Stiefel Manifold

    Authors: Simon Vary, Pierre Ablin, Bin Gao, P. -A. Absil

    Abstract: Optimization over the set of matrices $X$ that satisfy $X^\top B X = I_p$, referred to as the generalized Stiefel manifold, appears in many applications involving sampled covariance matrices such as the canonical correlation analysis (CCA), independent component analysis (ICA), and the generalized eigenvalue problem (GEVP). Solving these problems is typically done by iterative methods that require…

    Submitted 5 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

    Comments: This v2 is the camera-ready version for ICML 2024

    MSC Class: 90C26; 90C15

  2. arXiv:2402.02998  [pdf, other]

    cs.LG stat.ML

    Careful with that Scalpel: Improving Gradient Surgery with an EMA

    Authors: Yu-Guan Hsieh, James Thornton, Eugene Ndiaye, Michal Klein, Marco Cuturi, Pierre Ablin

    Abstract: Beyond minimizing a single training loss, many deep learning estimation pipelines rely on an auxiliary objective to quantify and encourage desirable properties of the model (e.g. performance on another dataset, robustness, agreement with a prior). Although the simplest approach to incorporating an auxiliary loss is to sum it with the training loss as a regularizer, recent works have shown that one…

    Submitted 5 February, 2024; originally announced February 2024.

  3. arXiv:2310.17386  [pdf, other]

    stat.ML cs.LG

    A Challenge in Reweighting Data with Bilevel Optimization

    Authors: Anastasia Ivanova, Pierre Ablin

    Abstract: In many scenarios, one uses a large training set to train a model with the goal of performing well on a smaller testing set with a different distribution. Learning a weight for each data point of the training set is an appealing solution, as it ideally allows one to automatically learn the importance of each training point for generalization on the testing set. This task is usually formalized as a…

    Submitted 26 October, 2023; originally announced October 2023.

  4. arXiv:2307.13813  [pdf, other]

    stat.ML cs.AI cs.LG

    How to Scale Your EMA

    Authors: Dan Busbridge, Jason Ramapuram, Pierre Ablin, Tatiana Likhomanenko, Eeshan Gunesh Dhekane, Xavier Suau, Russ Webb

    Abstract: Preserving training dynamics across batch sizes is an important tool for practical machine learning as it enables the trade-off between batch size and wall-clock time. This trade-off is typically enabled by a scaling rule, for example, in stochastic gradient descent, one should scale the learning rate linearly with the batch size. Another important machine learning tool is the model EMA, a functio…

    Submitted 7 November, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: Spotlight at NeurIPS 2023, 53 pages, 32 figures, 17 tables

  5. arXiv:2306.11895  [pdf, other]

    stat.ML cs.LG

    Learning Elastic Costs to Shape Monge Displacements

    Authors: Michal Klein, Aram-Alexandre Pooladian, Pierre Ablin, Eugène Ndiaye, Jonathan Niles-Weed, Marco Cuturi

    Abstract: Given a source and a target probability measure supported on $\mathbb{R}^d$, the Monge problem asks to find the most efficient way to map one distribution to the other. This efficiency is quantified by defining a \textit{cost} function between source and target data. Such a cost is often set by default in the machine learning literature to the squared-Euclidean distance,…

    Submitted 23 May, 2024; v1 submitted 20 June, 2023; originally announced June 2023.

  6. arXiv:2305.15042  [pdf, other]

    cs.LG stat.ML

    Test like you Train in Implicit Deep Learning

    Authors: Zaccharie Ramzi, Pierre Ablin, Gabriel Peyré, Thomas Moreau

    Abstract: Implicit deep learning has recently gained popularity with applications ranging from meta-learning to Deep Equilibrium Networks (DEQs). In its general formulation, it relies on expressing some components of deep learning pipelines implicitly, typically via a root equation called the inner problem. In practice, the solution of the inner problem is approximated during training with an iterative proc…

    Submitted 24 May, 2023; originally announced May 2023.

  7. arXiv:2303.16510  [pdf, other]

    stat.ML cs.LG math.OC

    Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints

    Authors: Pierre Ablin, Simon Vary, Bin Gao, P. -A. Absil

    Abstract: Orthogonality constraints naturally appear in many machine learning problems, from Principal Components Analysis to robust neural network training. They are usually solved using Riemannian optimization algorithms, which minimize the objective function while enforcing the constraint. However, enforcing the orthogonality constraint can be the most time-consuming operation in such algorithms. Recentl…

    Submitted 29 March, 2023; originally announced March 2023.

  8. arXiv:2302.08766  [pdf, other]

    stat.ML cs.LG math.OC

    A Lower Bound and a Near-Optimal Algorithm for Bilevel Empirical Risk Minimization

    Authors: Mathieu Dagréou, Thomas Moreau, Samuel Vaiter, Pierre Ablin

    Abstract: Bilevel optimization problems, which are problems where two optimization problems are nested, have more and more applications in machine learning. In many practical cases, the upper and the lower objectives correspond to empirical risk minimization problems and therefore have a sum structure. In this context, we propose a bilevel extension of the celebrated SARAH algorithm. We demonstrate that the…

    Submitted 20 February, 2024; v1 submitted 17 February, 2023; originally announced February 2023.

    Comments: Accepted at AISTATS 2024

  9. arXiv:2302.04065  [pdf, other]

    stat.ML cs.LG q-bio.GN

    Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps

    Authors: Marco Cuturi, Michal Klein, Pierre Ablin

    Abstract: Optimal transport (OT) theory focuses, among all maps $T:\mathbb{R}^d\rightarrow \mathbb{R}^d$ that can morph a probability measure onto another, on those that are the ``thriftiest'', i.e. such that the averaged cost $c(x, T(x))$ between $x$ and its image $T(x)$ be as small as possible. Many computational approaches have been proposed to estimate such Monge maps when $c$ is the $\ell_2^2$ distance…

    Submitted 8 February, 2023; originally announced February 2023.

  10. arXiv:2206.13424  [pdf, other]

    cs.LG math.OC stat.ML

    Benchopt: Reproducible, efficient and collaborative optimization benchmarks

    Authors: Thomas Moreau, Mathurin Massias, Alexandre Gramfort, Pierre Ablin, Pierre-Antoine Bannier, Benjamin Charlier, Mathieu Dagréou, Tom Dupré la Tour, Ghislain Durif, Cassio F. Dantas, Quentin Klopfenstein, Johan Larsson, En Lai, Tanguy Lefort, Benoit Malézieux, Badr Moufad, Binh T. Nguyen, Alain Rakotomamonjy, Zaccharie Ramzi, Joseph Salmon, Samuel Vaiter

    Abstract: Numerical validation is at the core of machine learning research as it allows to assess the actual impact of new methods, and to confirm the agreement between theory and practice. Yet, the rapid development of the field poses several challenges: researchers are confronted with a profusion of methods to compare, limited transparency and consensus on best practices, as well as tedious re-implementat…

    Submitted 28 October, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

    Comments: Accepted in proceedings of NeurIPS 22; Benchopt library documentation is available at https://benchopt.github.io/

  11. arXiv:2205.14612  [pdf, other]

    cs.LG stat.ML

    Do Residual Neural Networks discretize Neural Ordinary Differential Equations?

    Authors: Michael E. Sander, Pierre Ablin, Gabriel Peyré

    Abstract: Neural Ordinary Differential Equations (Neural ODEs) are the continuous analog of Residual Neural Networks (ResNets). We investigate whether the discrete dynamics defined by a ResNet are close to the continuous one of a Neural ODE. We first quantify the distance between the ResNet's hidden state trajectory and the solution of its corresponding Neural ODE. Our bound is tight and, on the negative si…

    Submitted 15 September, 2022; v1 submitted 29 May, 2022; originally announced May 2022.

    Comments: Accepted at NeurIPS 2022; 24 pages

  12. arXiv:2201.13409  [pdf, other]

    stat.ML cs.LG math.OC

    A framework for bilevel optimization that enables stochastic and global variance reduction algorithms

    Authors: Mathieu Dagréou, Pierre Ablin, Samuel Vaiter, Thomas Moreau

    Abstract: Bilevel optimization, the problem of minimizing a value function which involves the arg-minimum of another function, appears in many areas of machine learning. In a large scale empirical risk minimization setting where the number of samples is huge, it is crucial to develop stochastic methods, which only use a few samples at a time to progress. However, computing the gradient of the value function…

    Submitted 10 November, 2022; v1 submitted 31 January, 2022; originally announced January 2022.

    Comments: Accepted at NeurIPS 2022

  13. arXiv:2110.11773  [pdf, other]

    cs.LG stat.ML

    Sinkformers: Transformers with Doubly Stochastic Attention

    Authors: Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

    Abstract: Attention based models such as Transformers involve pairwise interactions between data points, modeled with a learnable attention matrix. Importantly, this attention matrix is normalized with the SoftMax operator, which makes it row-wise stochastic. In this paper, we propose instead to use Sinkhorn's algorithm to make attention matrices doubly stochastic. We call the resulting model a Sinkformer.…

    Submitted 24 January, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

    Comments: Accepted at AISTATS

  14. arXiv:2105.09994  [pdf, other]

    stat.ML cs.LG

    Kernel Stein Discrepancy Descent

    Authors: Anna Korba, Pierre-Cyril Aubin-Frankowski, Szymon Majewski, Pierre Ablin

    Abstract: Among dissimilarities between probability distributions, the Kernel Stein Discrepancy (KSD) has received much interest recently. We investigate the properties of its Wasserstein gradient flow to approximate a target probability distribution $π$ on $\mathbb{R}^d$, known up to a normalization constant. This leads to a straightforwardly implementable, deterministic score-based method to sample from…

    Submitted 20 May, 2021; originally announced May 2021.

  15. arXiv:2102.10964  [pdf, other]

    stat.ML cs.LG

    Adaptive Multi-View ICA: Estimation of noise levels for optimal inference

    Authors: Hugo Richard, Pierre Ablin, Aapo Hyvärinen, Alexandre Gramfort, Bertrand Thirion

    Abstract: We consider a multi-view learning problem known as group independent component analysis (group ICA), where the goal is to recover shared independent sources from many views. The statistical modeling of this problem requires to take noise into account. When the model includes additive noise on the observations, the likelihood is intractable. By contrast, we propose Adaptive multiView ICA (AVICA), a…

    Submitted 22 February, 2021; originally announced February 2021.

  16. arXiv:2102.07870  [pdf, other]

    cs.LG cs.AI stat.ML

    Momentum Residual Neural Networks

    Authors: Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

    Abstract: The training of deep residual neural networks (ResNets) with backpropagation has a memory cost that increases linearly with respect to the depth of the network. A way to circumvent this issue is to use reversible architectures. In this paper, we propose to change the forward rule of a ResNet by adding a momentum term. The resulting networks, momentum residual neural networks (Momentum ResNets), ar…

    Submitted 22 July, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

    Comments: 24 pages

  17. arXiv:2102.07432  [pdf, other]

    stat.ML cs.LG stat.CO

    Fast and accurate optimization on the orthogonal manifold without retraction

    Authors: Pierre Ablin, Gabriel Peyré

    Abstract: We consider the problem of minimizing a function over the manifold of orthogonal matrices. The majority of algorithms for this problem compute a direction in the tangent space, and then use a retraction to move in that direction while staying on the manifold. Unfortunately, the numerical computation of retractions on the orthogonal manifold always involves some expensive linear algebra operation,…

    Submitted 31 January, 2022; v1 submitted 15 February, 2021; originally announced February 2021.

  18. arXiv:2011.13831  [pdf, ps, other]

    stat.ML cs.LG

    Deep orthogonal linear networks are shallow

    Authors: Pierre Ablin

    Abstract: We consider the problem of training a deep orthogonal linear network, which consists of a product of orthogonal matrices, with no non-linearity in-between. We show that training the weights with Riemannian gradient descent is equivalent to training the whole factorization by gradient descent. This means that there is no effect of overparametrization and implicit bias at all in this setting: traini…

    Submitted 27 November, 2020; originally announced November 2020.

  19. arXiv:2008.09693  [pdf, other]

    eess.SP stat.AP

    Spectral independent component analysis with noise modeling for M/EEG source separation

    Authors: Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort

    Abstract: Background: Independent Component Analysis (ICA) is a widespread tool for exploration and denoising of electroencephalography (EEG) or magnetoencephalography (MEG) signals. In its most common formulation, ICA assumes that the signal matrix is a noiseless linear mixture of independent sources that are assumed non-Gaussian. A limitation is that it enforces to estimate as many sources as sensors or t…

    Submitted 21 August, 2020; originally announced August 2020.

  20. arXiv:2006.06635  [pdf, other]

    stat.ML cs.LG

    Modeling Shared Responses in Neuroimaging Studies through MultiView ICA

    Authors: Hugo Richard, Luigi Gresele, Aapo Hyvärinen, Bertrand Thirion, Alexandre Gramfort, Pierre Ablin

    Abstract: Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization. However, the aggregation of data coming from multiple subjects is challenging, since it requires accounting for large variability in anatomy, functional topography and stimulus response across individuals. Data modeling is especially hard for ecologically relevant condit…

    Submitted 24 December, 2020; v1 submitted 11 June, 2020; originally announced June 2020.

    Comments: Accepted to NeurIPS 2020

  21. arXiv:2005.11890  [pdf, other]

    stat.ML cs.LG stat.CO

    mvlearn: Multiview Machine Learning in Python

    Authors: Ronan Perry, Gavin Mischler, Richard Guo, Theodore Lee, Alexander Chang, Arman Koul, Cameron Franz, Hugo Richard, Iain Carmichael, Pierre Ablin, Alexandre Gramfort, Joshua T. Vogelstein

    Abstract: As data are generated more and more from multiple disparate sources, multiview data sets, where each sample has features in distinct views, have ballooned in recent years. However, no comprehensive package exists that enables non-specialists to use these methods easily. mvlearn is a Python library which implements the leading multiview machine learning methods. Its simple API closely follows that…

    Submitted 25 May, 2021; v1 submitted 24 May, 2020; originally announced May 2020.

    Comments: 6 pages, 2 figures, 1 table

  22. arXiv:2002.03722  [pdf, other]

    stat.ML cs.LG

    Super-efficiency of automatic differentiation for functions defined as a minimum

    Authors: Pierre Ablin, Gabriel Peyré, Thomas Moreau

    Abstract: In min-min optimization or max-min optimization, one has to compute the gradient of a function defined as a minimum. In most cases, the minimum has no closed-form, and an approximation is obtained via an iterative algorithm. There are two usual ways of estimating the gradient of the function: using either an analytic formula obtained by assuming exactness of the approximation, or automatic differe…

    Submitted 10 February, 2020; originally announced February 2020.

    Comments: 31 pages

  23. arXiv:1906.02687  [pdf, other]

    eess.SP cs.LG stat.ML

    Manifold-regression to predict from MEG/EEG brain signals without source modeling

    Authors: David Sabbagh, Pierre Ablin, Gael Varoquaux, Alexandre Gramfort, Denis A. Engemann

    Abstract: Magnetoencephalography and electroencephalography (M/EEG) can reveal neuronal dynamics non-invasively in real-time and are therefore appreciated methods in medicine and neuroscience. Recent advances in modeling brain-behavior relationships have highlighted the effectiveness of Riemannian geometry for summarizing the spatially correlated time-series from M/EEG in terms of their covariance. However,…

    Submitted 22 November, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

  24. arXiv:1905.11071  [pdf, other]

    stat.ML cs.LG

    Learning step sizes for unfolded sparse coding

    Authors: Pierre Ablin, Thomas Moreau, Mathurin Massias, Alexandre Gramfort

    Abstract: Sparse coding is typically solved by iterative optimization techniques, such as the Iterative Shrinkage-Thresholding Algorithm (ISTA). Unfolding and learning weights of ISTA using neural networks is a practical way to accelerate estimation. In this paper, we study the selection of adapted step sizes for ISTA. We show that a simple step size strategy can improve the convergence rate of ISTA by leve…

    Submitted 27 May, 2019; originally announced May 2019.

    Comments: 22 pages

  25. arXiv:1811.11433  [pdf, other]

    math.NA cs.LG stat.ML

    Beyond Pham's algorithm for joint diagonalization

    Authors: Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort

    Abstract: The approximate joint diagonalization of a set of matrices consists in finding a basis in which these matrices are as diagonal as possible. This problem naturally appears in several statistical learning tasks such as blind signal separation. We consider the diagonalization criterion studied in a seminal paper by Pham (2001), and propose a new quasi-Newton method for its optimization. Through numer…

    Submitted 28 November, 2018; originally announced November 2018.

  26. arXiv:1811.02225  [pdf, other]

    stat.ML cs.LG

    A Quasi-Newton algorithm on the orthogonal manifold for NMF with transform learning

    Authors: Pierre Ablin, Dylan Fagot, Herwig Wendt, Alexandre Gramfort, Cédric Févotte

    Abstract: Nonnegative matrix factorization (NMF) is a popular method for audio spectral unmixing. While NMF is traditionally applied to off-the-shelf time-frequency representations based on the short-time Fourier or Cosine transforms, the ability to learn transforms from raw data attracts increasing attention. However, this adds an important computational overhead. When assumed orthogonal (like the Fourier…

    Submitted 6 November, 2018; originally announced November 2018.

  27. arXiv:1806.09390  [pdf, other]

    stat.ML cs.LG

    Accelerating likelihood optimization for ICA on real signals

    Authors: Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort

    Abstract: We study optimization methods for solving the maximum likelihood formulation of independent component analysis (ICA). We consider both the problem constrained to white signals and the unconstrained problem. The Hessian of the objective function is costly to compute, which renders Newton's method impractical for large data sets. Many algorithms proposed in the literature can be rewritten as qua…

    Submitted 25 June, 2018; originally announced June 2018.

    Journal ref: LVA-ICA 2018, Jul 2018, Guildford, United Kingdom

  28. arXiv:1805.10054  [pdf, other]

    stat.ML cs.LG stat.AP

    Stochastic algorithms with descent guarantees for ICA

    Authors: Pierre Ablin, Alexandre Gramfort, Jean-François Cardoso, Francis Bach

    Abstract: Independent component analysis (ICA) is a widespread data exploration technique, where observed signals are modeled as linear mixtures of independent components. From a machine learning point of view, it amounts to a matrix factorization problem with a statistical independence criterion. Infomax is one of the most used ICA algorithms. It is based on a loss function which is a non-convex log-likeli…

    Submitted 27 May, 2019; v1 submitted 25 May, 2018; originally announced May 2018.

  29. arXiv:1711.10873  [pdf, other]

    stat.ML

    Faster ICA under orthogonal constraint

    Authors: Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort

    Abstract: Independent Component Analysis (ICA) is a technique for unsupervised exploration of multi-channel data widely used in observational sciences. In its classical form, ICA relies on modeling the data as a linear mixture of non-Gaussian independent sources. The problem can be seen as a likelihood maximization problem. We introduce Picard-O, a preconditioned L-BFGS strategy over the set of orthogonal m…

    Submitted 29 November, 2017; originally announced November 2017.

    Comments: 11 pages, 1 figure

  30. Faster independent component analysis by preconditioning with Hessian approximations

    Authors: Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort

    Abstract: Independent Component Analysis (ICA) is a technique for unsupervised exploration of multi-channel data that is widely used in observational sciences. In its classic form, ICA relies on modeling the data as linear mixtures of non-Gaussian independent sources. The maximization of the corresponding likelihood is a challenging problem if it has to be completed quickly and accurately on large sets of r…

    Submitted 8 September, 2017; v1 submitted 25 June, 2017; originally announced June 2017.

    Comments: 23 pages, 3 figures