
Showing 1–50 of 82 results for author: Hsu, D

Searching in archive stat.
  1. arXiv:2406.06893  [pdf, other]

    stat.ML cs.IT cs.LG

    Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot

    Authors: Zixuan Wang, Stanley Wei, Daniel Hsu, Jason D. Lee

    Abstract: The transformer architecture has prevailed in various deep learning settings due to its exceptional capabilities to select and compose structural information. Motivated by these capabilities, Sanford et al. proposed the sparse token selection task, in which transformers excel while fully-connected networks (FCNs) fail in the worst case. Building upon that, we strengthen the FCN lower bound to an a…

    Submitted 10 June, 2024; originally announced June 2024.

  2. arXiv:2406.05287  [pdf, ps, other]

    cs.LG cs.GT stat.ML

    Group-wise oracle-efficient algorithms for online multi-group learning

    Authors: Samuel Deng, Daniel Hsu, Jingwen Liu

    Abstract: We study the problem of online multi-group learning, a learning model in which an online learner must simultaneously achieve small prediction regret on a large collection of (possibly overlapping) subsequences corresponding to a family of groups. Groups are subsets of the context space, and in fairness applications, they may correspond to subpopulations defined by expressive functions of demograph…

    Submitted 7 June, 2024; originally announced June 2024.

  3. arXiv:2312.15469  [pdf, other]

    stat.ML cs.LG stat.ME

    Efficient Estimation of the Central Mean Subspace via Smoothed Gradient Outer Products

    Authors: Gan Yuan, Mingyue Xu, Samory Kpotufe, Daniel Hsu

    Abstract: We consider the problem of sufficient dimension reduction (SDR) for multi-index models. The estimators of the central mean subspace in prior works either have slow (non-parametric) convergence rates, or rely on stringent distributional conditions (e.g., the covariate distribution $P_{\mathbf{X}}$ being elliptically symmetric). In this paper, we show that a fast parametric convergence rate of form…

    Submitted 24 December, 2023; originally announced December 2023.

    MSC Class: 62B05; 62G08

  4. arXiv:2307.04191  [pdf, ps, other]

    math.ST cs.IT cs.LG stat.ML

    On the sample complexity of parameter estimation in logistic regression with normal design

    Authors: Daniel Hsu, Arya Mazumdar

    Abstract: The logistic regression model is one of the most popular data generation models in noisy binary classification problems. In this work, we study the sample complexity of estimating the parameters of the logistic regression model up to a given $\ell_2$ error, in terms of the dimension and the inverse temperature, with standard normal covariates. The inverse temperature controls the signal-to-noise ra…

    Submitted 23 May, 2024; v1 submitted 9 July, 2023; originally announced July 2023.

  5. arXiv:2306.02896  [pdf, other]

    cs.LG stat.ML

    Representational Strengths and Limitations of Transformers

    Authors: Clayton Sanford, Daniel Hsu, Matus Telgarsky

    Abstract: Attention layers, as commonly used in transformers, form the backbone of modern deep learning, yet there is no mathematical description of their benefits and deficiencies as compared with other architectures. In this work we establish both positive and negative results on the representation power of attention layers, with a focus on intrinsic complexity parameters such as width, depth, and embeddi…

    Submitted 16 November, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

  6. arXiv:2204.07526  [pdf, other]

    math.ST cs.IT cs.LG stat.ML

    Statistical-Computational Trade-offs in Tensor PCA and Related Problems via Communication Complexity

    Authors: Rishabh Dudeja, Daniel Hsu

    Abstract: Tensor PCA is a stylized statistical inference problem introduced by Montanari and Richard to study the computational difficulty of estimating an unknown parameter from higher-order moment tensors. Unlike its matrix counterpart, Tensor PCA exhibits a statistical-computational gap, i.e., a sample size regime where the problem is information-theoretically solvable but conjectured to be computational…

    Submitted 20 January, 2024; v1 submitted 15 April, 2022; originally announced April 2022.

  7. arXiv:2202.09305  [pdf, other]

    cs.LG stat.ML

    Masked prediction tasks: a parameter identifiability view

    Authors: Bingbin Liu, Daniel Hsu, Pradeep Ravikumar, Andrej Risteski

    Abstract: The vast majority of work in self-supervised learning, both theoretical and empirical (though mostly the latter), has largely focused on recovering good features for downstream tasks, with the definition of "good" often being intricately tied to the downstream task itself. This lens is undoubtedly very interesting, but suffers from the problem that there isn't a "canonical" set of downstream task…

    Submitted 18 February, 2022; originally announced February 2022.

  8. arXiv:2201.07348  [pdf, other]

    cs.LG stat.ML

    Learning Tensor Representations for Meta-Learning

    Authors: Samuel Deng, Yilin Guo, Daniel Hsu, Debmalya Mandal

    Abstract: We introduce a tensor-based model of shared representation for meta-learning from a diverse set of tasks. Prior works on learning linear representations for meta-learning assume that there is a common shared representation across different tasks, and do not consider the additional task-specific observable side information. In this work, we model the meta-parameter through an order-$3$ tensor, whic…

    Submitted 18 January, 2022; originally announced January 2022.

    Comments: Forthcoming at AISTATS-2022

  9. arXiv:2112.12181  [pdf, ps, other]

    cs.LG stat.ML

    Simple and near-optimal algorithms for hidden stratification and multi-group learning

    Authors: Christopher Tosh, Daniel Hsu

    Abstract: Multi-group agnostic learning is a formal learning criterion that is concerned with the conditional risks of predictors within subgroups of a population. The criterion addresses recent practical concerns such as subgroup fairness and hidden stratification. This paper studies the structure of solutions to the multi-group learning problem, and provides simple and near-optimal algorithms for the lear…

    Submitted 14 June, 2024; v1 submitted 22 December, 2021; originally announced December 2021.

  10. arXiv:2107.01509  [pdf, other]

    cs.LG math.ST stat.ML

    Bayesian decision-making under misspecified priors with applications to meta-learning

    Authors: Max Simchowitz, Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu, Thodoris Lykouris, Miroslav Dudík, Robert E. Schapire

    Abstract: Thompson sampling and other Bayesian sequential decision-making algorithms are among the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The choice of prior in these algorithms offers flexibility to encode domain knowledge but can also lead to poor performance when misspecified. In this paper, we demonstrate that performance degrades gracefully with misspecifi…

    Submitted 3 July, 2021; originally announced July 2021.

  11. arXiv:2105.14084  [pdf, other]

    cs.LG math.ST stat.ML

    Support vector machines and linear regression coincide with very high-dimensional features

    Authors: Navid Ardeshir, Clayton Sanford, Daniel Hsu

    Abstract: The support vector machine (SVM) and minimum Euclidean norm least squares regression are two fundamentally different approaches to fitting linear models, but they have recently been connected in models for very high-dimensional data through a phenomenon of support vector proliferation, where every training example used to fit an SVM becomes a support vector. In this paper, we explore the generalit…

    Submitted 27 October, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: 34 pages, 9 figures

  12. arXiv:2105.13445  [pdf, other]

    math.ST stat.AP

    The piranha problem: Large effects swimming in a small pond

    Authors: Christopher Tosh, Philip Greengard, Ben Goodrich, Andrew Gelman, Aki Vehtari, Daniel Hsu

    Abstract: In some scientific fields, it is common to have certain variables of interest that are of particular importance and for which there are many studies indicating a relationship with different explanatory variables. In such cases, particularly those where no relationships are known among the explanatory variables, it is worth asking under what conditions it is possible for all such claimed effects to…

    Submitted 23 July, 2024; v1 submitted 27 May, 2021; originally announced May 2021.

  13. arXiv:2105.07593  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO stat.ML

    Differentiable SLAM-net: Learning Particle SLAM for Visual Navigation

    Authors: Peter Karkus, Shaojun Cai, David Hsu

    Abstract: Simultaneous localization and mapping (SLAM) remains challenging for a number of downstream applications, such as visual robot navigation, because of rapid turns, featureless walls, and poor camera quality. We introduce the Differentiable SLAM Network (SLAM-net) along with a navigation architecture to enable planar robot navigation in previously unseen indoor environments. SLAM-net encodes a parti…

    Submitted 19 May, 2021; v1 submitted 16 May, 2021; originally announced May 2021.

    Comments: CVPR 2021, extended results

  14. arXiv:2102.02336  [pdf, other]

    cs.LG cs.NE stat.ML

    On the Approximation Power of Two-Layer Networks of Random ReLUs

    Authors: Daniel Hsu, Clayton Sanford, Rocco A. Servedio, Emmanouil-Vasileios Vlatakis-Gkaragkounis

    Abstract: This paper considers the following question: how well can depth-two ReLU networks with randomly initialized bottom-level weights represent smooth functions? We give near-matching upper- and lower-bounds for $L_2$-approximation in terms of the Lipschitz constant, the desired accuracy, and the dimension of the problem, as well as similar results in terms of Sobolev norms. Our positive results employ…

    Submitted 7 September, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

    Comments: 39 pages, COLT version

    Journal ref: Proceedings of Thirty Fourth Conference on Learning Theory, PMLR 134 (2021) 2423-2461

  15. arXiv:2009.10670  [pdf, other]

    math.ST cs.LG stat.ML

    On the proliferation of support vectors in high dimensions

    Authors: Daniel Hsu, Vidya Muthukumar, Ji Xu

    Abstract: The support vector machine (SVM) is a well-established classification method whose name refers to the particular training examples, called support vectors, that determine the maximum margin separating hyperplane. The SVM classifier is known to enjoy good generalization properties when the number of support vectors is small compared to the number of training examples. However, recent research has s…

    Submitted 13 June, 2022; v1 submitted 22 September, 2020; originally announced September 2020.

  16. arXiv:2008.10150  [pdf, ps, other]

    cs.LG stat.ML

    Contrastive learning, multi-view redundancy, and linear models

    Authors: Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu

    Abstract: Self-supervised learning is an empirically successful approach to unsupervised learning based on creating artificial supervised learning problems. A popular self-supervised approach to representation learning is contrastive learning, which leverages naturally occurring pairs of similar and dissimilar data points, or multiple views of the same data. This work provides a theoretical analysis of cont…

    Submitted 14 April, 2021; v1 submitted 23 August, 2020; originally announced August 2020.

  17. arXiv:2008.04101  [pdf, ps, other]

    math.ST stat.ML

    Statistical Query Lower Bounds for Tensor PCA

    Authors: Rishabh Dudeja, Daniel Hsu

    Abstract: In the Tensor PCA problem introduced by Richard and Montanari (2014), one is given a dataset consisting of $n$ samples $\mathbf{T}_{1:n}$ of i.i.d. Gaussian tensors of order $k$ with the promise that $\mathbb{E}\mathbf{T}_1$ is a rank-1 tensor and $\|\mathbb{E} \mathbf{T}_1\| = 1$. The goal is to estimate $\mathbb{E} \mathbf{T}_1$. This problem exhibits a large conjectured hard phase when $k>2$: W…

    Submitted 13 February, 2021; v1 submitted 10 August, 2020; originally announced August 2020.

  18. arXiv:2008.02430  [pdf, other]

    cs.LG stat.ML

    Contrastive Variational Reinforcement Learning for Complex Observations

    Authors: Xiao Ma, Siwei Chen, David Hsu, Wee Sun Lee

    Abstract: Deep reinforcement learning (DRL) has achieved significant success in various robot tasks: manipulation, navigation, etc. However, complex visual observations in natural environments remain a major challenge. This paper presents Contrastive Variational Reinforcement Learning (CVRL), a model-based method that tackles complex visual observations in DRL. CVRL learns a contrastive variational model b…

    Submitted 9 November, 2020; v1 submitted 5 August, 2020; originally announced August 2020.

    Comments: CoRL 2020 camera ready

  19. arXiv:2007.06207  [pdf, other]

    cs.LG cs.AI stat.ML

    DinerDash Gym: A Benchmark for Policy Learning in High-Dimensional Action Space

    Authors: Siwei Chen, Xiao Ma, David Hsu

    Abstract: It has been arduous to assess the progress of a policy learning algorithm in the domain of hierarchical tasks with high-dimensional action spaces due to the lack of a commonly accepted benchmark. In this work, we propose a new light-weight benchmark task called Diner Dash for evaluating the performance in a complicated task with a high-dimensional action space. In contrast to the traditional Atari gam…

    Submitted 13 July, 2020; originally announced July 2020.

  20. arXiv:2007.06029  [pdf, other]

    cs.LG stat.ML

    Ensuring Fairness Beyond the Training Data

    Authors: Debmalya Mandal, Samuel Deng, Suman Jana, Jeannette M. Wing, Daniel Hsu

    Abstract: We initiate the study of fair classifiers that are robust to perturbations in the training distribution. Despite recent progress, the literature on fairness has largely ignored the design of fair and robust classifiers. In this work, we develop classifiers that are fair not only with respect to the training distribution, but also for a class of distributions that are weighted perturbations of the…

    Submitted 4 November, 2020; v1 submitted 12 July, 2020; originally announced July 2020.

    Comments: 18 pages, 3 figures, To appear at NeurIPS-2020

  21. arXiv:2005.08054  [pdf, other]

    cs.LG cs.IT stat.ML

    Classification vs regression in overparameterized regimes: Does the loss function matter?

    Authors: Vidya Muthukumar, Adhyyan Narang, Vignesh Subramanian, Mikhail Belkin, Daniel Hsu, Anant Sahai

    Abstract: We compare classification and regression tasks in an overparameterized linear model with Gaussian features. On the one hand, we show that with sufficient overparameterization all training points are support vectors: solutions obtained by least-squares minimum-norm interpolation, typically used for regression, are identical to those produced by the hard-margin support vector machine (SVM) that mini…

    Submitted 14 October, 2021; v1 submitted 16 May, 2020; originally announced May 2020.

    Journal ref: Journal of Machine Learning Research, 22(222):1-69, 2021

  22. arXiv:2003.02234  [pdf, other]

    cs.LG stat.ML

    Contrastive estimation reveals topic posterior information to linear models

    Authors: Christopher Tosh, Akshay Krishnamurthy, Daniel Hsu

    Abstract: Contrastive learning is an approach to representation learning that utilizes naturally occurring similar and dissimilar pairs of data points to find useful embeddings of data. In the context of document classification under topic modeling assumptions, we prove that contrastive learning is capable of recovering a representation of documents that reveals their underlying topic posterior information…

    Submitted 4 March, 2020; originally announced March 2020.

  23. arXiv:2002.09884  [pdf, other]

    cs.LG cs.AI stat.ML

    Discriminative Particle Filter Reinforcement Learning for Complex Partial Observations

    Authors: Xiao Ma, Peter Karkus, David Hsu, Wee Sun Lee, Nan Ye

    Abstract: Deep reinforcement learning is successful in decision making for sophisticated games, such as Atari, Go, etc. However, real-world decision making often requires reasoning with partial information extracted from complex visual observations. This paper presents Discriminative Particle Filter Reinforcement Learning (DPFRL), a new reinforcement learning framework for complex partial observations. DPFR…

    Submitted 23 February, 2020; originally announced February 2020.

    Comments: Accepted to ICLR 2020

  24. arXiv:1912.13037  [pdf, other]

    cs.LG cs.AI stat.ML

    A New Framework for Query Efficient Active Imitation Learning

    Authors: Daniel Hsu

    Abstract: We seek to align agent policy with human expert behavior in a reinforcement learning (RL) setting, without any prior knowledge about dynamics, reward function, and unsafe states. There is a human expert knowing the rewards and unsafe states based on his preference and objective, but querying that human expert is expensive. To address this challenge, we propose a new framework for imitation learnin…

    Submitted 30 December, 2019; originally announced December 2019.

  25. arXiv:1910.00054  [pdf, other]

    cs.LG cs.CL cs.IR stat.ML

    Weakly Supervised Attention Networks for Fine-Grained Opinion Mining and Public Health

    Authors: Giannis Karamanolakis, Daniel Hsu, Luis Gravano

    Abstract: In many review classification applications, a fine-grained analysis of the reviews is desirable, because different segments (e.g., sentences) of a review may focus on different aspects of the entity in question. However, training supervised models for segment-level classification requires segment labels, which may be more difficult or expensive to obtain than review labels. In this paper, we emplo…

    Submitted 30 September, 2019; originally announced October 2019.

    Comments: Accepted for the 5th Workshop on Noisy User-generated Text (W-NUT 2019), held in conjunction with EMNLP 2019

  26. arXiv:1909.01502  [pdf, other]

    stat.ML cs.CR cs.LG

    Privacy Accounting and Quality Control in the Sage Differentially Private ML Platform

    Authors: Mathias Lecuyer, Riley Spahn, Kiran Vodrahalli, Roxana Geambasu, Daniel Hsu

    Abstract: Companies increasingly expose machine learning (ML) models trained over sensitive user data to untrusted domains, such as end-user devices and wide-access model stores. We present Sage, a differentially private (DP) ML platform that bounds the cumulative leakage of training data through models. Sage builds upon the rich literature on DP ML algorithms and contributes pragmatic solutions to two of t…

    Submitted 6 September, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

    Comments: Extended version of a paper presented at the 27th ACM Symposium on Operating Systems Principles (SOSP '19)

  27. arXiv:1909.00415  [pdf, other]

    cs.LG cs.CL stat.ML

    Leveraging Just a Few Keywords for Fine-Grained Aspect Detection Through Weakly Supervised Co-Training

    Authors: Giannis Karamanolakis, Daniel Hsu, Luis Gravano

    Abstract: User-generated reviews can be decomposed into fine-grained segments (e.g., sentences, clauses), each evaluating a different aspect of the principal entity (e.g., price, quality, appearance). Automatically detecting these aspects can be useful for both users and downstream opinion mining applications. Current supervised approaches for learning aspect classifiers require many fine-grained aspect lab…

    Submitted 1 September, 2019; originally announced September 2019.

    Comments: Accepted to EMNLP 2019

  28. arXiv:1907.12439  [pdf, other]

    cs.LG cs.AI stat.ML

    Hindsight Trust Region Policy Optimization

    Authors: Hanbo Zhang, Site Bai, Xuguang Lan, David Hsu, Nanning Zheng

    Abstract: Reinforcement learning (RL) with sparse rewards is a major challenge. We propose Hindsight Trust Region Policy Optimization (HTRPO), a new RL algorithm that extends the highly successful TRPO algorithm with hindsight to tackle the challenge of sparse rewards. Hindsight refers to the algorithm's ability to learn from information across goals, including ones not intended for the current…

    Submitted 17 May, 2021; v1 submitted 29 July, 2019; originally announced July 2019.

    Comments: Accepted by IJCAI 2021

  29. arXiv:1907.03411  [pdf, other]

    stat.ML cs.LG

    Unbiased estimators for random design regression

    Authors: Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

    Abstract: In linear regression we wish to estimate the optimum linear least squares predictor for a distribution over $d$-dimensional input points and real-valued responses, based on a small sample. Under standard random design analysis, where the sample is drawn i.i.d. from the input distribution, the least squares solution for that sample can be viewed as the natural estimator of the optimum. Unfortunatel…

    Submitted 7 June, 2022; v1 submitted 8 July, 2019; originally announced July 2019.

  30. arXiv:1906.03471  [pdf, other]

    cs.LG stat.ML

    A gradual, semi-discrete approach to generative network training via explicit Wasserstein minimization

    Authors: Yucheng Chen, Matus Telgarsky, Chao Zhang, Bolton Bailey, Daniel Hsu, Jian Peng

    Abstract: This paper provides a simple procedure to fit generative networks to target distributions, with the goal of a small Wasserstein distance (or other optimal transport costs). The approach is based on two principles: (a) if the source randomness of the network is a continuous distribution (the "semi-discrete" setting), then the Wasserstein distance is realized by a deterministic optimal transport map…

    Submitted 11 June, 2019; v1 submitted 8 June, 2019; originally announced June 2019.

    Comments: Appears in ICML 2019

  31. arXiv:1906.03231  [pdf, ps, other]

    cs.LG cs.CR stat.ML

    A cryptographic approach to black box adversarial machine learning

    Authors: Kevin Shi, Daniel Hsu, Allison Bishop

    Abstract: We propose a new randomized ensemble technique with a provable security guarantee against black-box transfer attacks. Our proof constructs a new security problem for random binary classifiers which is easier to empirically verify and a reduction from the security of this new model to the security of the ensemble classifier. We provide experimental evidence of the security of our random binary clas…

    Submitted 21 February, 2020; v1 submitted 7 June, 2019; originally announced June 2019.

  32. arXiv:1906.02101  [pdf, other]

    cs.LG stat.ML

    Diameter-based Interactive Structure Discovery

    Authors: Christopher Tosh, Daniel Hsu

    Abstract: We introduce interactive structure discovery, a generic framework that encompasses many interactive learning settings, including active learning, top-k item identification, interactive drug discovery, and others. We adapt a recently developed active learning algorithm of Tosh and Dasgupta (2017) for interactive structure discovery, and show that the new algorithm can be made noise-tolerant and enj…

    Submitted 12 March, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

  33. arXiv:1906.01139  [pdf, other]

    math.ST cs.LG stat.ML

    On the number of variables to use in principal component regression

    Authors: Ji Xu, Daniel Hsu

    Abstract: We study least squares linear regression over $N$ uncorrelated Gaussian features that are selected in order of decreasing variance. When the number of selected features $p$ is at most the sample size $n$, the estimator under consideration coincides with the principal component regression estimator; when $p>n$, the estimator is the least $\ell_2$ norm solution over the selected features. We give an…

    Submitted 3 October, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

  34. arXiv:1905.12885  [pdf, other]

    cs.LG stat.ML

    Particle Filter Recurrent Neural Networks

    Authors: Xiao Ma, Peter Karkus, David Hsu, Wee Sun Lee

    Abstract: Recurrent neural networks (RNNs) have been extraordinarily successful for prediction with sequential data. To tackle highly variable and noisy real-world data, we introduce Particle Filter Recurrent Neural Networks (PF-RNNs), a new RNN family that explicitly models uncertainty in its internal structure: while an RNN relies on a long, deterministic latent state vector, a PF-RNN maintains a latent s…

    Submitted 1 December, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: Accepted to AAAI 2020

  35. arXiv:1905.11602  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Differentiable Algorithm Networks for Composable Robot Learning

    Authors: Peter Karkus, Xiao Ma, David Hsu, Leslie Pack Kaelbling, Wee Sun Lee, Tomas Lozano-Perez

    Abstract: This paper introduces the Differentiable Algorithm Network (DAN), a composable architecture for robot learning systems. A DAN is composed of neural network modules, each encoding a differentiable robot algorithm and an associated model; and it is trained end-to-end from data. DAN combines the strengths of model-driven modular system design and data-driven end-to-end learning. The algorithms and mo…

    Submitted 28 May, 2019; originally announced May 2019.

    Comments: RSS 2019 camera ready. Video is available at https://youtu.be/4jcYlTSJF4Y

  36. arXiv:1904.11761  [pdf, other]

    cs.LG cs.AI stat.ML

    Factored Contextual Policy Search with Bayesian Optimization

    Authors: Robert Pinsler, Peter Karkus, Andras Kupcsik, David Hsu, Wee Sun Lee

    Abstract: Scarce data is a major challenge to scaling robot learning to truly complex tasks, as we need to generalize locally learned policies over different task contexts. Contextual policy search offers data-efficient learning and generalization by explicitly conditioning the policy on a parametric context space. In this paper, we further structure the contextual policy representation. We propose to facto…

    Submitted 26 April, 2019; originally announced April 2019.

    Comments: To appear in ICRA 2019

  37. arXiv:1903.07571  [pdf, other]

    cs.LG stat.ML

    Two models of double descent for weak features

    Authors: Mikhail Belkin, Daniel Hsu, Ji Xu

    Abstract: The "double descent" risk curve was proposed to qualitatively describe the out-of-sample prediction accuracy of variably-parameterized machine learning models. This article provides a precise mathematical analysis for the shape of this curve in two simple data models with the least squares/least norm predictor. Specifically, it is shown that the risk peaks when the number of features $p$ is close…

    Submitted 9 October, 2020; v1 submitted 18 March, 2019; originally announced March 2019.

    Journal ref: SIAM Journal on Mathematics of Data Science, 2(4):1167-1180, 2020

  38. arXiv:1902.01753  [pdf, other]

    math.ST cs.LG stat.ML

    Consistent Risk Estimation in Moderately High-Dimensional Linear Regression

    Authors: Ji Xu, Arian Maleki, Kamiar Rahnama Rad, Daniel Hsu

    Abstract: Risk estimation is at the core of many learning systems. The importance of this problem has motivated researchers to propose different schemes, such as cross validation, generalized cross validation, and Bootstrap. The theoretical properties of such estimates have been extensively studied in the low-dimensional settings, where the number of predictors $p$ is much smaller than the number of observa…

    Submitted 18 January, 2021; v1 submitted 5 February, 2019; originally announced February 2019.

  39. Reconciling modern machine learning practice and the bias-variance trade-off

    Authors: Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal

    Abstract: Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the central tenets of the field, the bias-variance trade-off, appears to be at odds with the observed behavior of methods used in the modern machine learning practice. The bias-variance trade-off implies that a model should balance u…

    Submitted 10 September, 2019; v1 submitted 28 December, 2018; originally announced December 2018.

  40. arXiv:1810.11344  [pdf, other]

    cs.LG stat.ML

    Benefits of over-parameterization with EM

    Authors: Ji Xu, Daniel Hsu, Arian Maleki

    Abstract: Expectation Maximization (EM) is among the most popular algorithms for maximum likelihood estimation, but it is generally only guaranteed to find stationary points of the log-likelihood objective. The goal of this article is to present theoretical and empirical evidence that over-parameterization can help EM avoid spurious local optima in the log-likelihood. We consider the problem of estimati…

    Submitted 26 October, 2018; originally announced October 2018.

    Comments: Accepted at NIPS 2018

  41. arXiv:1810.02453  [pdf, ps, other]

    cs.LG stat.ML

    Correcting the bias in least squares regression with volume-rescaled sampling

    Authors: Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

    Abstract: Consider linear regression where the examples are generated by an unknown distribution on $R^d\times R$. Without any assumptions on the noise, the linear least squares solution for any i.i.d. sample will typically be biased w.r.t. the least squares optimum over the entire distribution. However, we show that if an i.i.d. sample of any size k is augmented by a certain small additional sample, then t…

    Submitted 4 October, 2018; originally announced October 2018.

  42. arXiv:1806.05161  [pdf, other]

    stat.ML cond-mat.stat-mech cs.LG

    Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate

    Authors: Mikhail Belkin, Daniel Hsu, Partha Mitra

    Abstract: Many modern machine learning models are trained to achieve zero or near-zero training error in order to obtain near-optimal (but non-zero) test error. This phenomenon of strong generalization performance for "overfitted" / interpolated classifiers appears to be ubiquitous in high-dimensional data, having been observed in deep networks, kernel machines, boosting and random forests. Their performanc…

    Submitted 26 October, 2018; v1 submitted 13 June, 2018; originally announced June 2018.

  43. arXiv:1805.08975  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG stat.ML

    Particle Filter Networks with Application to Visual Localization

    Authors: Peter Karkus, David Hsu, Wee Sun Lee

    Abstract: Particle filtering is a powerful approach to sequential state estimation and finds application in many domains, including robot localization, object tracking, etc. To apply particle filtering in practice, a critical challenge is to construct probabilistic system models, especially for systems with complex dynamics or rich sensory inputs such as camera images. This paper introduces the Particle Fil…

    Submitted 25 October, 2018; v1 submitted 23 May, 2018; originally announced May 2018.

    Comments: CoRL 2018 camera ready

  44. arXiv:1802.03471  [pdf, other]

    stat.ML cs.AI cs.CR cs.LG

    Certified Robustness to Adversarial Examples with Differential Privacy

    Authors: Mathias Lecuyer, Vaggelis Atlidakis, Roxana Geambasu, Daniel Hsu, Suman Jana

    Abstract: Adversarial examples that fool machine learning models, particularly deep neural networks, have been a topic of intense research interest, with attacks and defenses being developed in a tight back-and-forth. Most past defenses are best effort and have been shown to be vulnerable to sophisticated attacks. Recently a set of certified defenses have been introduced, which provide guarantees of robustn…

    Submitted 29 May, 2019; v1 submitted 9 February, 2018; originally announced February 2018.

  45. arXiv:1802.01212  [pdf, other]

    astro-ph.CO cs.LG stat.ML

    Non-Gaussian information from weak lensing data via deep learning

    Authors: Arushi Gupta, José Manuel Zorrilla Matilla, Daniel Hsu, Zoltán Haiman

    Abstract: Weak lensing maps contain information beyond two-point statistics on small scales. Much recent work has tried to extract this information through a range of different observables or via nonlinear transformations of the lensing field. Here we train and apply a 2D convolutional neural network to simulated noiseless lensing maps covering 96 different cosmological models over a range of {$Ω_m,σ_8$}. U…

    Submitted 1 May, 2018; v1 submitted 4 February, 2018; originally announced February 2018.

    Comments: 15 pages, 13 figures, accepted to PRD

    Journal ref: Phys. Rev. D 97, 103515 (2018)

  46. arXiv:1709.06489  [pdf, ps, other]

    q-bio.GN cs.LG q-bio.QM stat.ML

    Accurate Genomic Prediction Of Human Height

    Authors: Louis Lello, Steven G. Avery, Laurent Tellier, Ana Vazquez, Gustavo de los Campos, Stephen D. H. Hsu

    Abstract: We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, $\sim$40, 20, and 9 percent of total variance for the three traits. For example, predicted heights corre…

    Submitted 19 September, 2017; originally announced September 2017.

    Comments: 17 pages, 10 figures

  47. arXiv:1708.07367  [pdf, ps, other]

    math.ST cs.LG math.PR stat.ML

    Mixing time estimation in reversible Markov chains from a single sample path

    Authors: Daniel Hsu, Aryeh Kontorovich, David A. Levin, Yuval Peres, Csaba Szepesvári

    Abstract: The spectral gap $γ$ of a finite, ergodic, and reversible Markov chain is an important parameter measuring the asymptotic rate of convergence. In applications, the transition matrix $P$ may be unknown, yet one sample of the chain up to a fixed time $n$ may be observed. We consider here the problem of estimating $γ$ from this data. Let $π$ be the stationary distribution of $P$, and…

    Submitted 24 August, 2017; originally announced August 2017.

    Comments: 34 pages, merges results of arXiv:1506.02903 and arXiv:1612.05330

  48. arXiv:1708.02975   

    cs.LG cs.NE stat.ML

    Anomaly Detection on Graph Time Series

    Authors: Daniel Hsu

    Abstract: In this paper, we use variational recurrent neural network to investigate the anomaly detection problem on graph time series. The temporal correlation is modeled by the combination of recurrent neural network (RNN) and variational inference (VI), while the spatial information is captured by the graph convolutional network. In order to incorporate external factors, we use feature extractor to augme…

    Submitted 29 May, 2022; v1 submitted 9 August, 2017; originally announced August 2017.

    Comments: Very preliminary work with some fatal mistakes. Some other work covering this will appear soon

  49. arXiv:1706.00729  [pdf, ps, other]

    math.ST cs.LG stat.ML

    Parameter identification in Markov chain choice models

    Authors: Arushi Gupta, Daniel Hsu

    Abstract: This work studies the parameter identification problem for the Markov chain choice model of Blanchet, Gallego, and Goyal used in assortment planning. In this model, the product selected by a customer is determined by a Markov chain over the products, where the products in the offered assortment are absorbing states. The underlying parameters of the model were previously shown to be identifiable fr…

    Submitted 25 July, 2017; v1 submitted 2 June, 2017; originally announced June 2017.

    Comments: 10 pages

  50. arXiv:1705.07048  [pdf, ps, other]

    cs.LG math.ST stat.ML

    Linear regression without correspondence

    Authors: Daniel Hsu, Kevin Shi, Xiaorui Sun

    Abstract: This article considers algorithmic and statistical aspects of linear regression when the correspondence between the covariates and the responses is unknown. First, a fully polynomial-time approximation scheme is given for the natural least squares optimization problem in any constant dimension. Next, in an average-case and noise-free setting where the responses exactly correspond to a linear funct…

    Submitted 7 November, 2017; v1 submitted 19 May, 2017; originally announced May 2017.