Search | arXiv e-print repository

Sample-Efficient Linear Representation Learning from Non-IID Non-Isotropic Data

Authors: Thomas T. C. K. Zhang, Leonardo F. Toso, James Anderson, Nikolai Matni

Abstract: A powerful concept behind much of the recent progress in machine learning is the extraction of common features across data from heterogeneous sources or tasks. Intuitively, using all of one's data to learn a common representation function benefits both computational effort and statistical generalization by leaving a smaller number of parameters to fine-tune on a given task. Toward theoretically grounding these merits, we propose a general setting of recovering linear operators $M$ from noisy vector measurements $y = Mx + w$, where the covariates $x$ may be both non-i.i.d. and non-isotropic. We demonstrate that existing isotropy-agnostic representation learning approaches incur biases on the representation update, which causes the scaling of the noise terms to lose favorable dependence on the number of source tasks. This in turn can cause the sample complexity of representation learning to be bottlenecked by the single-task data size. We introduce an adaptation, $\texttt{De-bias & Feature-Whiten}$ ($\texttt{DFW}$), of the popular alternating minimization-descent scheme proposed independently in Collins et al. (2021) and Nayer and Vaswani (2022), and establish linear convergence to the optimal representation with noise level scaling down with the $\textit{total}$ source data size. This leads to generalization bounds on the same order as an oracle empirical risk minimizer. We verify the vital importance of $\texttt{DFW}$ on various numerical simulations. In particular, we show that vanilla alternating minimization-descent fails catastrophically even for i.i.d. but mildly non-isotropic data. Our analysis unifies and generalizes prior work, and provides a flexible framework for a wider range of applications, such as in controls and dynamical systems.

Submitted 27 July, 2024; v1 submitted 8 August, 2023; originally announced August 2023.

Comments: Appeared at ICLR 2024 (spotlight presentation)

arXiv:2305.16415 [pdf, other]

Performance-Robustness Tradeoffs in Adversarially Robust Control and Estimation

Authors: Bruce D. Lee, Thomas T. C. K. Zhang, Hamed Hassani, Nikolai Matni

Abstract: While $\mathcal{H}_\infty$ methods can introduce robustness against worst-case perturbations, their nominal performance under conventional stochastic disturbances is often drastically reduced. Though this fundamental tradeoff between nominal performance and robustness is known to exist, it is not well-characterized in quantitative terms. Toward addressing this issue, we borrow the increasingly ubiquitous notion of adversarial training from machine learning to construct a class of controllers which are optimized for disturbances consisting of mixed stochastic and worst-case components. We find that this problem admits a linear time invariant optimal controller that has a form closely related to suboptimal $\mathcal{H}_\infty$ solutions. We then provide a quantitative performance-robustness tradeoff analysis in two analytically tractable cases: state feedback control, and state estimation. In these special cases, we demonstrate that the severity of the tradeoff depends in an interpretable manner upon system-theoretic properties such as the spectrum of the controllability gramian, the spectrum of the observability gramian, and the stability of the system. This provides practitioners with general guidance for determining how much robustness to incorporate based on a priori system knowledge. We empirically validate our results by comparing the performance of our controller against standard baselines, and plotting tradeoff curves.

Submitted 25 May, 2023; originally announced May 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2203.10763

arXiv:2205.14812 [pdf, other]

TaSIL: Taylor Series Imitation Learning

Authors: Daniel Pfrommer, Thomas T. C. K. Zhang, Stephen Tu, Nikolai Matni

Abstract: We propose Taylor Series Imitation Learning (TaSIL), a simple augmentation to standard behavior cloning losses in the context of continuous control. TaSIL penalizes deviations in the higher-order Taylor series terms between the learned and expert policies. We show that experts satisfying a notion of $\textit{incremental input-to-state stability}$ are easy to learn, in the sense that a small TaSIL-augmented imitation loss over expert trajectories guarantees a small imitation loss over trajectories generated by the learned policy. We provide sample-complexity bounds for TaSIL that scale as $\tilde{\mathcal{O}}(1/n)$ in the realizable setting, for $n$ the number of expert demonstrations. Finally, we demonstrate experimentally the relationship between the robustness of the expert policy and the order of Taylor expansion required in TaSIL, and compare standard Behavior Cloning, DART, and DAgger with TaSIL-loss-augmented variants. In all cases, we show significant improvement over baselines across a variety of MuJoCo tasks.

Submitted 16 January, 2023; v1 submitted 29 May, 2022; originally announced May 2022.

Comments: Appeared at NeurIPS 2022. V2: added to related work, updated notation, fixed small errors in appendix

arXiv:2203.10763 [pdf, other]

Performance-Robustness Tradeoffs in Adversarially Robust Linear-Quadratic Control

Authors: Bruce D. Lee, Thomas T. C. K. Zhang, Hamed Hassani, Nikolai Matni

Abstract: While $\mathcal{H}_\infty$ methods can introduce robustness against worst-case perturbations, their nominal performance under conventional stochastic disturbances is often drastically reduced. Though this fundamental tradeoff between nominal performance and robustness is known to exist, it is not well-characterized in quantitative terms. Toward addressing this issue, we borrow the increasingly ubiquitous notion of adversarial training from machine learning to construct a class of controllers which are optimized for disturbances consisting of mixed stochastic and worst-case components. We find that this problem admits a stationary optimal controller that has a simple analytic form closely related to suboptimal $\mathcal{H}_\infty$ solutions. We then provide a quantitative performance-robustness tradeoff analysis, in which system-theoretic properties such as controllability and stability explicitly manifest in an interpretable manner. This provides practitioners with general guidance for determining how much robustness to incorporate based on a priori system knowledge. We empirically validate our results by comparing the performance of our controller against standard baselines, and plotting tradeoff curves.

Submitted 21 March, 2022; originally announced March 2022.

arXiv:2112.10690 [pdf, other]

Adversarially Robust Stability Certificates can be Sample-Efficient

Authors: Thomas T. C. K. Zhang, Stephen Tu, Nicholas M. Boffi, Jean-Jacques E. Slotine, Nikolai Matni

Abstract: Motivated by bridging the simulation-to-reality gap in the context of safety-critical systems, we consider learning adversarially robust stability certificates for unknown nonlinear dynamical systems. In line with approaches from robust control, we consider additive and Lipschitz bounded adversaries that perturb the system dynamics. We show that under suitable assumptions of incremental stability on the underlying system, the statistical cost of learning an adversarial stability certificate is equivalent, up to constant factors, to that of learning a nominal stability certificate. Our results hinge on novel bounds for the Rademacher complexity of the resulting adversarial loss class, which may be of independent interest. To the best of our knowledge, this is the first characterization of sample-complexity bounds when performing adversarial learning over data generated by a dynamical system. We further provide a practical algorithm for approximating the adversarial training algorithm, and validate our findings on a damped pendulum example.

Submitted 20 December, 2021; originally announced December 2021.

MSC Class: 93D05; 93D09

arXiv:2111.08864 [pdf, other]

Adversarial Tradeoffs in Robust State Estimation

Authors: Thomas T. C. K. Zhang, Bruce D. Lee, Hamed Hassani, Nikolai Matni

Abstract: Adversarially robust training has been shown to reduce the susceptibility of learned models to targeted input data perturbations. However, it has also been observed that such adversarially robust models suffer a degradation in accuracy when applied to unperturbed data sets, leading to a robustness-accuracy tradeoff. Inspired by recent progress in the adversarial machine learning literature that characterizes such tradeoffs in simple settings, we develop tools to quantitatively study the performance-robustness tradeoff between nominal and robust state estimation. In particular, we define and analyze a novel $\textit{adversarially robust Kalman Filtering problem}$. We show that in contrast to most problem instances in adversarial machine learning, we can precisely derive the adversarial perturbation in the Kalman Filtering setting. We provide an algorithm to find this perturbation given data realizations, and develop upper and lower bounds on the adversarial state estimation error in terms of the standard (non-adversarial) estimation error and the spectral properties of the resulting observer. Through these results, we show a natural connection between a filter's robustness to adversarial perturbation and underlying control theoretic properties of the system being observed, namely the spectral properties of its observability gramian.

Submitted 6 February, 2023; v1 submitted 16 November, 2021; originally announced November 2021.

Comments: ACC 2023. V2: consolidated results for filtering, updated figures

MSC Class: 93B35

arXiv:2103.13840 [pdf, other]

doi 10.1137/21M1456807

Biwhitening Reveals the Rank of a Count Matrix

Authors: Boris Landa, Thomas T. C. K. Zhang, Yuval Kluger

Abstract: Estimating the rank of a corrupted data matrix is an important task in data analysis, most notably for choosing the number of components in PCA. Significant progress on this task was achieved using random matrix theory by characterizing the spectral properties of large noise matrices. However, utilizing such tools is not straightforward when the data matrix consists of count random variables, e.g., Poisson, in which case the noise can be heteroskedastic with an unknown variance in each entry. In this work, we consider a Poisson random matrix with independent entries, and propose a simple procedure termed $\textit{biwhitening}$ for estimating the rank of the underlying signal matrix (i.e., the Poisson parameter matrix) without any prior knowledge. Our approach is based on the key observation that one can scale the rows and columns of the data matrix simultaneously so that the spectrum of the corresponding noise agrees with the standard Marchenko-Pastur (MP) law, justifying the use of the MP upper edge as a threshold for rank selection. Importantly, the required scaling factors can be estimated directly from the observations by solving a matrix scaling problem via the Sinkhorn-Knopp algorithm. Aside from the Poisson, our approach is extended to families of distributions that satisfy a quadratic relation between the mean and the variance, such as the generalized Poisson, binomial, negative binomial, gamma, and many others. This quadratic relation can also account for missing entries in the data. We conduct numerical experiments that corroborate our theoretical findings, and showcase the advantage of our approach for rank estimation in challenging regimes. Furthermore, we demonstrate the favorable performance of our approach on several real datasets of single-cell RNA sequencing (scRNA-seq), High-Throughput Chromosome Conformation Capture (Hi-C), and document topic modeling.

Submitted 2 November, 2021; v1 submitted 25 March, 2021; originally announced March 2021.

MSC Class: 62H12; 62H25
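
The abstract above mentions solving a matrix scaling problem via the Sinkhorn-Knopp algorithm. As a rough illustration only, the following is a minimal sketch of generic Sinkhorn-Knopp scaling for a strictly positive matrix, not the authors' implementation. The function name `sinkhorn_knopp`, the iteration count, and the target sums (each row summing to the number of columns, each column to the number of rows, so the average scaled entry is 1) are assumptions made for this example.

```python
def sinkhorn_knopp(A, iters=500):
    """Generic Sinkhorn-Knopp scaling of a strictly positive m x n matrix A.

    Returns positive vectors u (length m) and v (length n) such that
    diag(u) @ A @ diag(v) has row sums close to n and column sums equal to m.
    The target sums are an assumption chosen for illustration.
    """
    m, n = len(A), len(A[0])
    u = [1.0] * m
    v = [1.0] * n
    for _ in range(iters):
        # Rescale rows so that each row of diag(u) A diag(v) sums to n.
        for i in range(m):
            u[i] = n / sum(A[i][j] * v[j] for j in range(n))
        # Rescale columns so that each column sums to m.
        for j in range(n):
            v[j] = m / sum(u[i] * A[i][j] for i in range(m))
    return u, v


def apply_scaling(A, u, v):
    """Return the scaled matrix with entries u[i] * A[i][j] * v[j]."""
    return [[u[i] * A[i][j] * v[j] for j in range(len(v))] for i in range(len(u))]
```

Alternating the two rescaling steps is the whole algorithm; for a matrix with zero entries, convergence is not guaranteed in general and this sketch would need extra care.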

arXiv:1403.3715 [pdf, other]

On the continuous Fermat-Weber problem for a convex polygon using Euclidean distance

Authors: Thomas T. C. K. Zhang, John Gunnar Carlsson

Abstract: We consider the continuous Fermat-Weber problem, where the customers are continuously (uniformly) distributed along the boundary of a convex polygon. We derive the closed-form expression for finding the average distance from a given point to the continuously distributed customers along the boundary. A Weiszfeld-type procedure is proposed for this model, which is shown to be linearly convergent. We also derive a closed-form formula to find the average distance for a given point to the entire convex polygon, assuming a uniform distribution. Since the function is smooth, convex, and explicitly given, the continuous version of the Fermat-Weber problem over a convex polygon can be solved easily by numerical algorithms.

Submitted 14 March, 2014; originally announced March 2014.
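
For readers unfamiliar with Weiszfeld-type procedures, the sketch below shows the classic discrete Weiszfeld iteration for a finite set of points in the plane. It is not the continuous variant proposed in the paper above, whose update is not given in the abstract. The function names, the centroid starting point, and the stopping rule are assumptions made for this example.

```python
import math


def total_distance(point, points):
    """Sum of Euclidean distances from `point` to every point in `points`."""
    return sum(math.hypot(point[0] - px, point[1] - py) for px, py in points)


def weiszfeld(points, iters=500, eps=1e-12):
    """Classic discrete Weiszfeld iteration for the 2-D Fermat-Weber point.

    Each step replaces the current iterate with a weighted average of the
    data points, with weights inversely proportional to current distances.
    """
    # Start from the centroid (an arbitrary but common choice).
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iters):
        wsum = xs = ys = 0.0
        for px, py in points:
            d = math.hypot(x - px, y - py)
            if d < eps:
                # Simplification: stop if the iterate lands on a data point,
                # where the update is undefined. A robust version would
                # check optimality at that point instead.
                return (px, py)
            w = 1.0 / d
            wsum += w
            xs += w * px
            ys += w * py
        nx, ny = xs / wsum, ys / wsum
        if math.hypot(nx - x, ny - y) < eps:
            return (nx, ny)
        x, y = nx, ny
    return (x, y)
```

For the four vertices of a convex quadrilateral, the minimizer is the intersection of the diagonals, which gives a convenient sanity check for the iteration.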

Showing 1–8 of 8 results for author: Zhang, T T C K