Showing 1–50 of 70 results for author: Harchaoui, Z

  1. arXiv:2406.16838  [pdf, other]

    cs.CL cs.LG

    From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models

    Authors: Sean Welleck, Amanda Bertsch, Matthew Finlayson, Hailey Schoelkopf, Alex Xie, Graham Neubig, Ilia Kulikov, Zaid Harchaoui

    Abstract: One of the most striking findings in modern research on large language models (LLMs) is that scaling up compute during training leads to better results. However, less attention has been given to the benefits of scaling compute during inference. This survey focuses on these inference-time approaches. We explore three areas under a unified mathematical formalism: token-level generation algorithms, m…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2406.10823  [pdf, other]

    math.PR stat.ML

    Iterated Schrödinger bridge approximation to Wasserstein Gradient Flows

    Authors: Medha Agarwal, Zaid Harchaoui, Garrett Mulcahy, Soumik Pal

    Abstract: We introduce a novel discretization scheme for Wasserstein gradient flows that involves successively computing Schrödinger bridges with the same marginals. This is different from both the forward/geodesic approximation and the backward/Jordan-Kinderlehrer-Otto (JKO) approximations. The proposed scheme has two advantages: one, it avoids the use of the score function, and, two, it is amenable to par…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 36 pages, 1 figure

    MSC Class: 49N99; 49Q22; 60J60

  3. arXiv:2403.10763  [pdf, other]

    stat.ML cs.LG math.OC

    A Primal-Dual Algorithm for Faster Distributionally Robust Optimization

    Authors: Ronak Mehta, Jelena Diakonikolas, Zaid Harchaoui

    Abstract: We consider the penalized distributionally robust optimization (DRO) problem with a closed, convex uncertainty set, a setting that encompasses the $f$-DRO, Wasserstein-DRO, and spectral/$L$-risk formulations used in practice. We present Drago, a stochastic primal-dual algorithm that achieves a state-of-the-art linear convergence rate on strongly convex-strongly concave DRO problems. The method com…

    Submitted 15 March, 2024; originally announced March 2024.

  4. arXiv:2402.08761  [pdf, other]

    cs.CL cs.AI

    JAMDEC: Unsupervised Authorship Obfuscation using Constrained Decoding over Small Language Models

    Authors: Jillian Fisher, Ximing Lu, Jaehun Jung, Liwei Jiang, Zaid Harchaoui, Yejin Choi

    Abstract: The permanence of online content combined with the enhanced authorship identification techniques calls for stronger computational methods to protect the identity and privacy of online authorship when needed, e.g., blind reviews for scientific papers, anonymous online reviews, or anonymous interactions in the mental health forums. In this paper, we propose an unsupervised inference-time approach to…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: Code is available at https://github.com/jfisher52/JAMDecoding

  5. arXiv:2310.13863  [pdf, other]

    stat.ML cs.LG math.OC

    Distributionally Robust Optimization with Bias and Variance Reduction

    Authors: Ronak Mehta, Vincent Roulet, Krishna Pillutla, Zaid Harchaoui

    Abstract: We consider the distributionally robust optimization (DRO) problem with spectral risk-based uncertainty set and $f$-divergence penalty. This formulation includes common risk-sensitive learning objectives such as regularized conditional value-at-risk (CVaR) and average top-$k$ loss. We present Prospect, a stochastic gradient-based algorithm that only requires tuning a single learning rate hyperparame…

    Submitted 20 October, 2023; originally announced October 2023.

  6. arXiv:2310.09930  [pdf, other]

    cs.CL

    FiLM: Fill-in Language Models for Any-Order Generation

    Authors: Tianxiao Shen, Hao Peng, Ruoqi Shen, Yao Fu, Zaid Harchaoui, Yejin Choi

    Abstract: Language models have become the backbone of today's AI systems. However, their predominant left-to-right generation limits the use of bidirectional context, which is essential for tasks that involve filling text in the middle. We propose the Fill-in Language Model (FiLM), a new language modeling approach that allows for flexible generation at any position without adhering to a specific generation…

    Submitted 15 October, 2023; originally announced October 2023.

  7. arXiv:2305.18654  [pdf, other]

    cs.CL cs.AI cs.LG

    Faith and Fate: Limits of Transformers on Compositionality

    Authors: Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi

    Abstract: Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly trivial problems. This begs the question: Are these errors incidental, or do they signal more substantial limitations? In an attempt to demystify transformer LLMs, we investigate the li…

    Submitted 31 October, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: 10 pages + appendix (40 pages)

  8. arXiv:2305.10634  [pdf, other]

    math.OC cs.LG

    Modified Gauss-Newton Algorithms under Noise

    Authors: Krishna Pillutla, Vincent Roulet, Sham Kakade, Zaid Harchaoui

    Abstract: Gauss-Newton methods and their stochastic version have been widely used in machine learning and signal processing. Their nonsmooth counterparts, modified Gauss-Newton or prox-linear algorithms, can lead to contrasting outcomes when compared to gradient descent in large-scale statistical settings. We explore the contrasting performance of these two classes of algorithms in theory on a stylized stat…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: IEEE SSP 2023

  9. arXiv:2301.00260  [pdf, other]

    math.ST stat.ML

    Confidence Sets under Generalized Self-Concordance

    Authors: Lang Liu, Zaid Harchaoui

    Abstract: This paper revisits a fundamental problem in statistical inference from a non-asymptotic theoretical viewpoint: the construction of confidence sets. We establish a finite-sample bound for the estimator, characterizing its asymptotic behavior in a non-asymptotic fashion. An important feature of our bound is that its dimension dependency is captured by the effective dimension…

    Submitted 31 December, 2022; originally announced January 2023.

  10. arXiv:2212.14578  [pdf, other]

    cs.LG cs.AI cs.CL

    MAUVE Scores for Generative Models: Theory and Practice

    Authors: Krishna Pillutla, Lang Liu, John Thickstun, Sean Welleck, Swabha Swayamdipta, Rowan Zellers, Sewoong Oh, Yejin Choi, Zaid Harchaoui

    Abstract: Generative artificial intelligence has made significant strides, producing text indistinguishable from human prose and remarkably photorealistic images. Automatically measuring how close the generated data distribution is to the target distribution is central to diagnosing existing models and developing better ones. We present MAUVE, a family of comparison measures between pairs of distributions s…

    Submitted 7 December, 2023; v1 submitted 30 December, 2022; originally announced December 2022.

    Comments: Published in Journal of Machine Learning Research

  11. arXiv:2212.05149  [pdf, other]

    stat.ML cs.LG math.OC

    Stochastic Optimization for Spectral Risk Measures

    Authors: Ronak Mehta, Vincent Roulet, Krishna Pillutla, Lang Liu, Zaid Harchaoui

    Abstract: Spectral risk objectives - also called $L$-risks - allow for learning systems to interpolate between optimizing average-case performance (as in empirical risk minimization) and worst-case performance on a task. We develop stochastic algorithms to optimize these quantities by characterizing their subdifferential and addressing challenges such as biasedness of subgradient estimates and non-smoothnes…

    Submitted 9 December, 2022; originally announced December 2022.

  12. arXiv:2212.04014  [pdf, other]

    stat.ML cs.LG math.ST

    Statistical and Computational Guarantees for Influence Diagnostics

    Authors: Jillian Fisher, Lang Liu, Krishna Pillutla, Yejin Choi, Zaid Harchaoui

    Abstract: Influence diagnostics such as influence functions and approximate maximum influence perturbations are popular in machine learning and in AI domain applications. Influence diagnostics are powerful statistical tools to identify influential datapoints or subsets of datapoints. We establish finite-sample statistical bounds, as well as computational complexity bounds, for influence functions and approx…

    Submitted 19 September, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: For AISTATS 2023. Software see https://github.com/jfisher52/influence_theory

  13. arXiv:2210.00422  [pdf, ps, other]

    math.PR cs.LG stat.ML

    Stochastic optimization on matrices and a graphon McKean-Vlasov limit

    Authors: Zaid Harchaoui, Sewoong Oh, Soumik Pal, Raghav Somani, Raghavendra Tripathi

    Abstract: We consider stochastic gradient descents on the space of large symmetric matrices of suitable functions that are invariant under permuting the rows and columns using the same permutation. We establish deterministic limits of these random curves as the dimensions of the matrices go to infinity while the entries remain bounded. Under a "small noise" assumption the limit is shown to be the gradient…

    Submitted 27 May, 2024; v1 submitted 2 October, 2022; originally announced October 2022.

    Comments: 37 pages+ references, introduction modified and new examples added. Improved presentation

    MSC Class: 05C60; 05C63; 05C80; 68R10; 60K35; 60G09

  14. arXiv:2207.06362  [pdf, other]

    math.OC cs.LG eess.SY

    Iterative Linear Quadratic Optimization for Nonlinear Control: Differentiable Programming Algorithmic Templates

    Authors: Vincent Roulet, Siddhartha Srinivasa, Maryam Fazel, Zaid Harchaoui

    Abstract: We present the implementation of nonlinear control algorithms based on linear and quadratic approximations of the objective from a functional viewpoint. We present a gradient descent, a Gauss-Newton method, a Newton method, differential dynamic programming approaches with linear quadratic or quadratic approximations, various line-search strategies, and regularized variants of these algorithms. We…

    Submitted 13 July, 2022; originally announced July 2022.

    Comments: This is a companion report to the arXiv report "Complexity Bounds of Iterative Linear Quadratic Optimization Algorithms for Discrete Time Nonlinear Control" <arXiv:2204.02322> by the same authors

    MSC Class: 68Q25; 49M37 ACM Class: G.1.6

  15. arXiv:2205.00350  [pdf, other]

    stat.ML cs.IT cs.LG

    Orthogonal Statistical Learning with Self-Concordant Loss

    Authors: Lang Liu, Carlos Cinelli, Zaid Harchaoui

    Abstract: Orthogonal statistical learning and double machine learning have emerged as general frameworks for two-stage statistical prediction in the presence of a nuisance component. We establish non-asymptotic bounds on the excess risk of orthogonal statistical learning methods with a loss function satisfying a self-concordance property. Our bounds improve upon existing bounds by a dimension factor while l…

    Submitted 19 June, 2022; v1 submitted 30 April, 2022; originally announced May 2022.

    Comments: COLT 2022

  16. arXiv:2204.02322  [pdf, ps, other]

    math.OC

    Complexity Bounds of Iterative Linear Quadratic Optimization Algorithms for Discrete Time Nonlinear Control

    Authors: Vincent Roulet, Siddhartha Srinivasa, Maryam Fazel, Zaid Harchaoui

    Abstract: A classical approach for solving discrete time nonlinear control on a finite horizon consists in repeatedly minimizing linear quadratic approximations of the original problem around current candidate solutions. While widely popular in many domains, such an approach has mainly been analyzed locally. We observe that global convergence guarantees can be ensured provided that the linearized discrete t…

    Submitted 5 April, 2022; originally announced April 2022.

    MSC Class: 68Q25; 49M37 ACM Class: G.1.6

  17. arXiv:2203.03756  [pdf, other]

    cs.LG math.OC stat.ML

    Flat minima generalize for low-rank matrix recovery

    Authors: Lijun Ding, Dmitriy Drusvyatskiy, Maryam Fazel, Zaid Harchaoui

    Abstract: Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima -- those around which the loss grows slowly -- appear to generalize well. This work takes a step towards understanding this phenomenon by focusing on the simplest class of overparameter…

    Submitted 17 February, 2023; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: 36 pages

  18. arXiv:2201.00508  [pdf, other]

    math.OC

    Superquantiles at Work: Machine Learning Applications and Efficient Subgradient Computation

    Authors: Yassine Laguel, Krishna Pillutla, Jérôme Malick, Zaid Harchaoui

    Abstract: R. Tyrrell Rockafellar and collaborators introduced, in a series of works, new regression modeling methods based on the notion of superquantile (or conditional value-at-risk). These methods have been influential in economics, finance, management science, and operations research in general. Recently, they have been the subject of a renewed interest in machine learning, to address issues of distribut…

    Submitted 3 January, 2022; originally announced January 2022.

  19. arXiv:2201.00505  [pdf, other]

    math.OC

    Superquantile-based learning: a direct approach using gradient-based optimization

    Authors: Yassine Laguel, Jérôme Malick, Zaid Harchaoui

    Abstract: We consider a formulation of supervised learning that endows models with robustness to distributional shifts from training to testing. The formulation hinges upon the superquantile risk measure, also known as the conditional value-at-risk, which has shown promise in recent applications of machine learning and signal processing. We show that, thanks to a direct smoothing of the superquantile functi…

    Submitted 3 January, 2022; originally announced January 2022.

  20. arXiv:2112.15595  [pdf, other]

    stat.ML cs.LG math.PR

    Triangular Flows for Generative Modeling: Statistical Consistency, Smoothness Classes, and Fast Rates

    Authors: Nicholas J. Irons, Meyer Scetbon, Soumik Pal, Zaid Harchaoui

    Abstract: Triangular flows, also known as Knothe-Rosenblatt measure couplings, comprise an important building block of normalizing flow models for generative modeling and density estimation, including popular autoregressive flow models such as real-valued non-volume preserving transformation models (Real NVP). We present statistical guarantees and sample complexity bounds for triangular flow statistical mod…

    Submitted 31 December, 2021; originally announced December 2021.

  21. arXiv:2112.15265  [pdf, other]

    stat.ML cs.LG

    Entropy Regularized Optimal Transport Independence Criterion

    Authors: Lang Liu, Soumik Pal, Zaid Harchaoui

    Abstract: We introduce an independence criterion based on entropy regularized optimal transport. Our criterion can be used to test for independence between two samples. We establish non-asymptotic bounds for our test statistic and study its statistical behavior under both the null hypothesis and the alternative hypothesis. The theoretical results involve tools from U-process theory and optimal transport the…

    Submitted 19 April, 2022; v1 submitted 30 December, 2021; originally announced December 2021.

  22. arXiv:2112.09429  [pdf, other]

    cs.LG math.OC stat.ML

    Federated Learning with Superquantile Aggregation for Heterogeneous Data

    Authors: Krishna Pillutla, Yassine Laguel, Jérôme Malick, Zaid Harchaoui

    Abstract: We present a federated learning framework that is designed to robustly deliver good predictive performance across individual clients with heterogeneous data. The proposed approach hinges upon a superquantile-based learning objective that captures the tail statistics of the error distribution over heterogeneous clients. We present a stochastic training algorithm that interleaves differentially priv…

    Submitted 6 December, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

    Comments: Machine Learning Journal, Special Issue on Safe and Fair Machine Learning (To appear)

    Journal ref: Machine Learning (2023): 1-68

  23. arXiv:2112.01453  [pdf, other]

    cs.LG

    Target Propagation via Regularized Inversion

    Authors: Vincent Roulet, Zaid Harchaoui

    Abstract: Target Propagation (TP) algorithms compute targets instead of gradients along neural networks and propagate them backward in a way that is similar yet different than gradient back-propagation (BP). The idea was first presented as a perturbative alternative to back-propagation that may achieve greater accuracy in gradient evaluation when training multi-layer neural networks (LeCun et al., 1989). Ho…

    Submitted 2 December, 2021; originally announced December 2021.

  24. arXiv:2108.07356  [pdf, other]

    math.OC cs.LG

    Stochastic Optimization under Distributional Drift

    Authors: Joshua Cutler, Dmitriy Drusvyatskiy, Zaid Harchaoui

    Abstract: We consider the problem of minimizing a convex function that is evolving according to unknown and possibly stochastic dynamics, which may depend jointly on time and on the decision variable itself. Such problems abound in the machine learning and signal processing literature, under the names of concept drift, stochastic tracking, and performative prediction. We provide novel non-asymptotic converg…

    Submitted 26 May, 2023; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: 56 pages, 7 figures. v2: unified analysis of time- and decision-dependent settings; updated numerical experiments. v3: added references and updated exposition. v4: minor updates to match the version published in JMLR

    MSC Class: 90C15; 90C25

    Journal ref: Journal of Machine Learning Research, 24(147):1-56, 2023

  25. arXiv:2106.14122  [pdf, other]

    stat.ML cs.LG

    Score-Based Change Detection for Gradient-Based Learning Machines

    Authors: Lang Liu, Joseph Salmon, Zaid Harchaoui

    Abstract: The widespread use of machine learning algorithms calls for automatic change detection algorithms to monitor their behavior over time. As a machine learning algorithm learns from a continuous, possibly evolving, stream of data, it is desirable and often critical to supplement it with a companion change detection algorithm to facilitate its monitoring and control. We present a generic score-based c…

    Submitted 26 June, 2021; originally announced June 2021.

  26. arXiv:2106.07898  [pdf, other]

    stat.ML cs.LG

    Divergence Frontiers for Generative Models: Sample Complexity, Quantization Effects, and Frontier Integrals

    Authors: Lang Liu, Krishna Pillutla, Sean Welleck, Sewoong Oh, Yejin Choi, Zaid Harchaoui

    Abstract: The spectacular success of deep generative models calls for quantitative tools to measure their statistical performance. Divergence frontiers have recently been proposed as an evaluation framework for generative models, due to their ability to measure the quality-diversity trade-off inherent to deep generative modeling. We establish non-asymptotic bounds on the sample complexity of divergence fron…

    Submitted 11 December, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

  27. arXiv:2102.01454  [pdf, other]

    cs.CL

    MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers

    Authors: Krishna Pillutla, Swabha Swayamdipta, Rowan Zellers, John Thickstun, Sean Welleck, Yejin Choi, Zaid Harchaoui

    Abstract: As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We introduce MAUVE, a comparison measure for open-ended text generation, which directly compares the learnt distribution from a text generation model to the distribution of human-written text using divergence frontiers. MAUVE scales up to modern…

    Submitted 23 November, 2021; v1 submitted 2 February, 2021; originally announced February 2021.

    Comments: NeurIPS 2021 (Oral Presentation). Package: https://github.com/krishnap25/mauve

  28. arXiv:2012.15458  [pdf, other]

    math.OC cs.LG stat.ML

    Differentiable Programming à la Moreau

    Authors: Vincent Roulet, Zaid Harchaoui

    Abstract: The notion of a Moreau envelope is central to the analysis of first-order optimization algorithms for machine learning. Yet, it has not been developed and extended to be applied to a deep network and, more broadly, to a machine learning system with a differentiable programming implementation. We define a compositional calculus adapted to Moreau envelopes and show how to integrate it within differe…

    Submitted 11 December, 2022; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: Short version appeared in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  29. arXiv:2012.06684  [pdf, other]

    cs.LG stat.ML

    Faster Policy Learning with Continuous-Time Gradients

    Authors: Samuel Ainsworth, Kendall Lowrey, John Thickstun, Zaid Harchaoui, Siddhartha Srinivasa

    Abstract: We study the estimation of policy gradients for continuous-time systems with known dynamics. By reframing policy learning in continuous-time, we show that it is possible to construct a more efficient and accurate gradient estimator. The standard back-propagation through time estimator (BPTT) computes exact gradients for a crude discretization of the continuous-time system. In contrast, we approximate…

    Submitted 24 June, 2021; v1 submitted 11 December, 2020; originally announced December 2020.

    Journal ref: L4DC 2021

  30. arXiv:2011.08963  [pdf, ps, other]

    math.PR stat.ML

    Asymptotics of Discrete Schrödinger Bridges via Chaos Decomposition

    Authors: Zaid Harchaoui, Lang Liu, Soumik Pal

    Abstract: Consider the problem of matching two independent i.i.d. samples of size $N$ from two distributions $P$ and $Q$ in $\mathbb{R}^d$. For an arbitrary continuous cost function, the optimal assignment problem looks for the matching that minimizes the total cost. We consider instead in this paper the problem where each matching is endowed with a Gibbs probability weight proportional to the exponential o…

    Submitted 31 December, 2022; v1 submitted 17 November, 2020; originally announced November 2020.

    MSC Class: 46N10; 60J35; 60F17; 62G20

  31. arXiv:2009.14575  [pdf, other]

    math.OC cs.LG stat.ML

    First-order Optimization for Superquantile-based Supervised Learning

    Authors: Yassine Laguel, Jérôme Malick, Zaid Harchaoui

    Abstract: Classical supervised learning via empirical risk (or negative log-likelihood) minimization hinges upon the assumption that the testing distribution coincides with the training distribution. This assumption can be challenged in modern applications of machine learning in which learning machines may operate at prediction time with testing data whose distribution departs from the one of the training d…

    Submitted 1 October, 2020; v1 submitted 30 September, 2020; originally announced September 2020.

    Comments: 6 pages, 2 figures, 2 tables, presented at IEEE MLSP

  32. arXiv:2003.12756  [pdf, other]

    stat.ML cs.LG

    Harmonic Decompositions of Convolutional Networks

    Authors: Meyer Scetbon, Zaid Harchaoui

    Abstract: We present a description of the function space and the smoothness class associated with a convolutional network using the machinery of reproducing kernel Hilbert spaces. We show that the mapping associated with a convolutional network expands into a sum involving elementary functions akin to spherical harmonics. This functional decomposition can be related to the functional ANOVA decomposition in…

    Submitted 16 November, 2020; v1 submitted 28 March, 2020; originally announced March 2020.

  33. arXiv:2002.12640  [pdf, other]

    stat.ML cs.LG

    A Spectral Analysis of Dot-product Kernels

    Authors: Meyer Scetbon, Zaid Harchaoui

    Abstract: We present eigenvalue decay estimates of integral operators associated with compositional dot-product kernels. The estimates improve on previous ones established for power series kernels on spheres. This allows us to obtain the volumes of balls in the corresponding reproducing kernel Hilbert spaces. We discuss the consequences on statistical estimation with compositional dot product kernels and hi…

    Submitted 26 February, 2021; v1 submitted 28 February, 2020; originally announced February 2020.

  34. arXiv:2002.11223  [pdf, other]

    stat.ML cs.DC cs.LG math.OC

    Device Heterogeneity in Federated Learning: A Superquantile Approach

    Authors: Yassine Laguel, Krishna Pillutla, Jérôme Malick, Zaid Harchaoui

    Abstract: We propose a federated learning framework to handle heterogeneous client devices which do not conform to the population data distribution. The approach hinges upon a parameterized superquantile-based objective, where the parameter ranges over levels of conformity. We present an optimization algorithm and establish its convergence to a stationary point. We show how to practically implement it using…

    Submitted 25 February, 2020; originally announced February 2020.

    Journal ref: Machine Learning (2023): 1-68

  35. arXiv:2002.09051  [pdf, ps, other]

    cs.LG stat.ML

    An Elementary Approach to Convergence Guarantees of Optimization Algorithms for Deep Networks

    Authors: Vincent Roulet, Zaid Harchaoui

    Abstract: We present an approach to obtain convergence guarantees of optimization algorithms for deep networks based on elementary arguments and computations. The convergence analysis revolves around the analytical and computational structures of optimization oracles central to the implementation of deep networks in machine learning software. We provide a systematic way to compute estimates of the smoothnes…

    Submitted 29 December, 2020; v1 submitted 20 February, 2020; originally announced February 2020.

    Comments: The changes from v1 to v2 include i) slightly more general results; ii) slightly more concise proofs; iii) highway and residual networks; iv) implicitly defined network layers; v) additional algorithm boxes and illustration figures

  36. arXiv:1912.13445  [pdf, other]

    stat.ML cs.CR cs.LG

    Robust Aggregation for Federated Learning

    Authors: Krishna Pillutla, Sham M. Kakade, Zaid Harchaoui

    Abstract: Federated learning is the centralized training of statistical models from decentralized data on mobile devices while preserving the privacy of each device. We present a robust aggregation approach to make federated learning robust to settings when a fraction of the devices may be sending corrupted updates to the server. The approach relies on a robust aggregation oracle based on the geometric medi…

    Submitted 17 January, 2022; v1 submitted 31 December, 2019; originally announced December 2019.

    Journal ref: IEEE Transactions on Signal Processing 70 (2022): 1142-1154

  37. Discriminative Clustering with Representation Learning with any Ratio of Labeled to Unlabeled Data

    Authors: Corinne Jones, Vincent Roulet, Zaid Harchaoui

    Abstract: We present a discriminative clustering approach in which the feature representation can be learned from data and moreover leverage labeled data. Representation learning can give a similarity-based clustering method the ability to automatically adapt to an underlying, yet hidden, geometric structure of the data. The proposed approach augments the DIFFRAC method with a representation learning capabi…

    Submitted 17 February, 2023; v1 submitted 30 December, 2019; originally announced December 2019.

    Comments: Published in Statistics and Computing, 2022

    Journal ref: Stat Comput 32, 17 (2022)

  38. arXiv:1912.04977  [pdf, other]

    cs.LG cs.CR stat.ML

    Advances and Open Problems in Federated Learning

    Authors: Peter Kairouz, H. Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, et al. (34 additional authors not shown)

    Abstract: Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs re…

    Submitted 8 March, 2021; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: Published in Foundations and Trends in Machine Learning Vol 4 Issue 1. See: https://www.nowpublishers.com/article/Details/MAL-083

  39. arXiv:1910.08221  [pdf, other]

    math.OC

    On the Convergence of the Iterative Linear Exponential Quadratic Gaussian Algorithm to Stationary Points

    Authors: Vincent Roulet, Maryam Fazel, Siddhartha Srinivasa, Zaid Harchaoui

    Abstract: A classical method for risk-sensitive nonlinear control is the iterative linear exponential quadratic Gaussian algorithm. We present its convergence analysis from a first-order optimization viewpoint. We identify the objective that the algorithm actually minimizes and we show how the addition of a proximal term guarantees convergence to a stationary point.

    Submitted 17 October, 2019; originally announced October 2019.

  40. arXiv:1908.07615  [pdf, other]

    math.OC

    Iterative Linearized Control: Stable Algorithms and Complexity Guarantees

    Authors: Vincent Roulet, Siddhartha Srinivasa, Dmitriy Drusvyatskiy, Zaid Harchaoui

    Abstract: We examine popular gradient-based algorithms for nonlinear control in the light of the modern complexity analysis of first-order optimization algorithms. The examination reveals that the complexity bounds can be clearly stated in terms of calls to a computational oracle related to dynamic programming and implementable by gradient back-propagation using machine learning software libraries such as P…

    Submitted 20 August, 2019; originally announced August 2019.

    Comments: Short version appeared in International Conference on Machine Learning (ICML) 2019

  41. arXiv:1904.03834  [pdf, other]

    stat.ML cs.LG cs.SD eess.AS

    A Statistical Investigation of Long Memory in Language and Music

    Authors: Alexander Greaves-Tunnell, Zaid Harchaoui

    Abstract: Representation and learning of long-range dependencies is a central challenge confronted in modern applications of machine learning to sequence data. Yet despite the prominence of this issue, the basic problem of measuring long-range dependence, either in a given data source or as represented in a trained deep model, remains largely limited to heuristic tools. We contribute a statistical framework…

    Submitted 6 June, 2019; v1 submitted 8 April, 2019; originally announced April 2019.

    Comments: 29 pages; expanded supplement, added details in background and methods per reviewer feedback, included additional references

  42. arXiv:1903.08131  [pdf, other]

    stat.ML cs.LG math.OC

    Kernel-based Translations of Convolutional Networks

    Authors: Corinne Jones, Vincent Roulet, Zaid Harchaoui

    Abstract: Convolutional Neural Networks, as most artificial neural networks, are commonly viewed as methods different in essence from kernel-based methods. We provide a systematic translation of Convolutional Neural Networks (ConvNets) into their kernel-based counterparts, Convolutional Kernel Networks (CKNs), and demonstrate that this perception is unfounded both formally and empirically. We show that, giv…

    Submitted 19 March, 2019; originally announced March 2019.

  43. arXiv:1902.03228  [pdf, other]

    stat.ML cs.LG math.OC

    A Smoother Way to Train Structured Prediction Models

    Authors: Krishna Pillutla, Vincent Roulet, Sham M. Kakade, Zaid Harchaoui

    Abstract: We present a framework to train a structured prediction model by performing smoothing on the inference algorithm it builds upon. Smoothing overcomes the non-smoothness inherent to the maximum margin structured prediction objective, and paves the way for the use of fast primal gradient-based optimization algorithms. We illustrate the proposed framework by developing a novel primal incremental optim…

    Submitted 8 February, 2019; originally announced February 2019.

    Comments: Short version appeared in Neural Information Processing Systems (NeurIPS) 2018

  44. arXiv:1812.02772  [pdf, other]

    cs.CV

    Object Discovery in Videos as Foreground Motion Clustering

    Authors: Christopher Xie, Yu Xiang, Zaid Harchaoui, Dieter Fox

    Abstract: We consider the problem of providing dense segmentation masks for object discovery in videos. We formulate the object discovery problem as foreground motion clustering, where the goal is to cluster foreground pixels in videos into different objects. We introduce a novel pixel-trajectory recurrent neural network that learns feature embeddings of foreground pixel trajectories linked across time. By…

    Submitted 4 April, 2019; v1 submitted 6 December, 2018; originally announced December 2018.

  45. arXiv:1811.08045  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Coupled Recurrent Models for Polyphonic Music Composition

    Authors: John Thickstun, Zaid Harchaoui, Dean P. Foster, Sham M. Kakade

    Abstract: This paper introduces a novel recurrent model for music composition that is tailored to the structure of polyphonic music. We propose an efficient new conditional probabilistic factorization of musical scores, viewing a score as a collection of concurrent, coupled sequences: i.e. voices. To model the conditional distributions, we borrow ideas from both convolutional and recurrent neural models; we…

    Submitted 26 November, 2019; v1 submitted 19 November, 2018; originally announced November 2018.

    Comments: 13 pages; long version of the paper appearing in ISMIR 2019

  46. arXiv:1806.04028  [pdf, ps, other]

    math.ST stat.ML

    Adaptive Denoising of Signals with Local Shift-Invariant Structure

    Authors: Zaid Harchaoui, Anatoli Juditsky, Arkadi Nemirovski, Dmitrii Ostrovskii

    Abstract: We discuss the problem of adaptive discrete-time signal denoising in the situation where the signal to be recovered admits a "linear oracle" -- an unknown linear estimate that takes the form of convolution of observations with a time-invariant filter. It was shown by Juditsky and Nemirovski (2009) that when the $\ell_2$-norm of the oracle filter is small enough, such oracle can be "mimicked" by an…

    Submitted 11 February, 2021; v1 submitted 11 June, 2018; originally announced June 2018.

    Comments: 39 pages

  47. arXiv:1803.11262  [pdf, other]

    math.ST math.OC stat.ML

    Efficient First-Order Algorithms for Adaptive Signal Denoising

    Authors: Dmitrii Ostrovskii, Zaid Harchaoui

    Abstract: We consider the problem of discrete-time signal denoising, focusing on a specific family of non-linear convolution-type estimators. Each such estimator is associated with a time-invariant filter which is obtained adaptively, by solving a certain convex optimization problem. Adaptive convolution-type estimators were demonstrated to have favorable statistical properties. However, the question of the…

    Submitted 12 June, 2018; v1 submitted 29 March, 2018; originally announced March 2018.

    Comments: 27 pages, 5 figures

  48. arXiv:1712.05654  [pdf, other]

    stat.ML math.OC

    Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice

    Authors: Hongzhou Lin, Julien Mairal, Zaid Harchaoui

    Abstract: We introduce a generic scheme for accelerating gradient-based optimization methods in the sense of Nesterov. The approach, called Catalyst, builds upon the inexact accelerated proximal point algorithm for minimizing a convex objective function, and consists of approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. One of the keys to achieve acceleration…

    Submitted 19 June, 2018; v1 submitted 15 December, 2017; originally announced December 2017.

    Comments: link to publisher website: http://jmlr.org/papers/volume18/17-748/17-748.pdf

    Journal ref: Journal of Machine Learning Research (JMLR), 18(212):1--54, 2018

  49. arXiv:1711.04845  [pdf, other]

    stat.ML cs.LG cs.SD eess.AS

    Invariances and Data Augmentation for Supervised Music Transcription

    Authors: John Thickstun, Zaid Harchaoui, Dean Foster, Sham M. Kakade

    Abstract: This paper explores a variety of models for frame-based music transcription, with an emphasis on the methods needed to reach state-of-the-art on human recordings. The translation-invariant network discussed in this paper, which combines a traditional filterbank with a convolutional neural network, was the top-performing model in the 2017 MIREX Multiple Fundamental Frequency Estimation evaluation.…

    Submitted 13 November, 2017; originally announced November 2017.

    Comments: 6 pages

  50. arXiv:1703.10993  [pdf, other]

    stat.ML math.OC

    Catalyst Acceleration for Gradient-Based Non-Convex Optimization

    Authors: Courtney Paquette, Hongzhou Lin, Dmitriy Drusvyatskiy, Julien Mairal, Zaid Harchaoui

    Abstract: We introduce a generic scheme to solve nonconvex optimization problems using gradient-based algorithms originally designed for minimizing convex functions. Even though these methods may originally require convexity to operate, the proposed approach allows one to use them on weakly convex objectives, which covers a large class of non-convex functions typically appearing in machine learning and sign…

    Submitted 31 December, 2018; v1 submitted 31 March, 2017; originally announced March 2017.