-
On Learning what to Learn: heterogeneous observations of dynamics and establishing (possibly causal) relations among them
Authors:
David W. Sroczynski,
Felix Dietrich,
Eleni D. Koronaki,
Ronen Talmon,
Ronald R. Coifman,
Erik Bollt,
Ioannis G. Kevrekidis
Abstract:
Before we attempt to learn a function between two (sets of) observables of a physical process, we must first decide what the inputs and what the outputs of the desired function are going to be. Here we demonstrate two distinct, data-driven ways of initially deciding ``the right quantities'' to relate through such a function, and then proceed to learn it. This is accomplished by processing multiple…
▽ More
Before we attempt to learn a function between two (sets of) observables of a physical process, we must first decide what the inputs and what the outputs of the desired function are going to be. Here we demonstrate two distinct, data-driven ways of initially deciding ``the right quantities'' to relate through such a function, and then proceed to learn it. This is accomplished by processing multiple simultaneous heterogeneous data streams (ensembles of time series) from observations of a physical system: multiple observation processes of the system. We thus determine (a) what subsets of observables are common between the observation processes (and therefore observable from each other, relatable through a function); and (b) what information is unrelated to these common observables, and therefore particular to each observation process, and not contributing to the desired function. Any data-driven function approximation technique can subsequently be used to learn the input-output relation, from k-nearest neighbors and Geometric Harmonics to Gaussian Processes and Neural Networks. Two particular ``twists'' of the approach are discussed. The first has to do with the identifiability of particular quantities of interest from the measurements. We now construct mappings from a single set of observations of one process to entire level sets of measurements of the process, consistent with this single set. The second attempts to relate our framework to a form of causality: if one of the observation processes measures ``now'', while the second observation process measures ``in the future'', the function to be learned among what is common across observation processes constitutes a dynamical model for the system evolution.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
Gappy local conformal auto-encoders for heterogeneous data fusion: in praise of rigidity
Authors:
Erez Peterfreund,
Iryna Burak,
Ofir Lindenbaum,
Jim Gimlett,
Felix Dietrich,
Ronald R. Coifman,
Ioannis G. Kevrekidis
Abstract:
Fusing measurements from multiple, heterogeneous, partial sources, observing a common object or process, poses challenges due to the increasing availability of numbers and types of sensors. In this work we propose, implement and validate an end-to-end computational pipeline in the form of a multiple-auto-encoder neural network architecture for this task. The inputs to the pipeline are several sets…
▽ More
Fusing measurements from multiple, heterogeneous, partial sources, observing a common object or process, poses challenges due to the increasing availability of numbers and types of sensors. In this work we propose, implement and validate an end-to-end computational pipeline in the form of a multiple-auto-encoder neural network architecture for this task. The inputs to the pipeline are several sets of partial observations, and the result is a globally consistent latent space, harmonizing (rigidifying, fusing) all measurements. The key enabler is the availability of multiple slightly perturbed measurements of each instance:, local measurement, "bursts", that allows us to estimate the local distortion induced by each instrument. We demonstrate the approach in a sequence of examples, starting with simple two-dimensional data sets and proceeding to a Wi-Fi localization problem and to the solution of a "dynamical puzzle" arising in spatio-temporal observations of the solutions of Partial Differential Equations.
△ Less
Submitted 20 December, 2023;
originally announced December 2023.
-
Hyperbolic Diffusion Embedding and Distance for Hierarchical Representation Learning
Authors:
Ya-Wei Eileen Lin,
Ronald R. Coifman,
Gal Mishne,
Ronen Talmon
Abstract:
Finding meaningful representations and distances of hierarchical data is important in many fields. This paper presents a new method for hierarchical data embedding and distance. Our method relies on combining diffusion geometry, a central approach to manifold learning, and hyperbolic geometry. Specifically, using diffusion geometry, we build multi-scale densities on the data, aimed to reveal their…
▽ More
Finding meaningful representations and distances of hierarchical data is important in many fields. This paper presents a new method for hierarchical data embedding and distance. Our method relies on combining diffusion geometry, a central approach to manifold learning, and hyperbolic geometry. Specifically, using diffusion geometry, we build multi-scale densities on the data, aimed to reveal their hierarchical structure, and then embed them into a product of hyperbolic spaces. We show theoretically that our embedding and distance recover the underlying hierarchical structure. In addition, we demonstrate the efficacy of the proposed method and its advantages compared to existing methods on graph embedding benchmarks and hierarchical datasets.
△ Less
Submitted 30 May, 2023;
originally announced May 2023.
-
A common variable minimax theorem for graphs
Authors:
Ronald R. Coifman,
Nicholas F. Marshall,
Stefan Steinerberger
Abstract:
Let $\mathcal{G} = \{G_1 = (V, E_1), \dots, G_m = (V, E_m)\}$ be a collection of $m$ graphs defined on a common set of vertices $V$ but with different edge sets $E_1, \dots, E_m$. Informally, a function $f :V \rightarrow \mathbb{R}$ is smooth with respect to $G_k = (V,E_k)$ if $f(u) \sim f(v)$ whenever $(u, v) \in E_k$. We study the problem of understanding whether there exists a nonconstant funct…
▽ More
Let $\mathcal{G} = \{G_1 = (V, E_1), \dots, G_m = (V, E_m)\}$ be a collection of $m$ graphs defined on a common set of vertices $V$ but with different edge sets $E_1, \dots, E_m$. Informally, a function $f :V \rightarrow \mathbb{R}$ is smooth with respect to $G_k = (V,E_k)$ if $f(u) \sim f(v)$ whenever $(u, v) \in E_k$. We study the problem of understanding whether there exists a nonconstant function that is smooth with respect to all graphs in $\mathcal{G}$, simultaneously, and how to find it if it exists.
△ Less
Submitted 30 July, 2021;
originally announced July 2021.
-
Doubly-Stochastic Normalization of the Gaussian Kernel is Robust to Heteroskedastic Noise
Authors:
Boris Landa,
Ronald R. Coifman,
Yuval Kluger
Abstract:
A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the data points reside in Euclidean space, a widespread approach is to from an affinity matrix by the Gaussian kernel with pairwise distances, and to follow with a certain normalization (e.g. the row-stochastic normalization or its symmetric variant). We d…
▽ More
A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the data points reside in Euclidean space, a widespread approach is to from an affinity matrix by the Gaussian kernel with pairwise distances, and to follow with a certain normalization (e.g. the row-stochastic normalization or its symmetric variant). We demonstrate that the doubly-stochastic normalization of the Gaussian kernel with zero main diagonal (i.e., no self loops) is robust to heteroskedastic noise. That is, the doubly-stochastic normalization is advantageous in that it automatically accounts for observations with different noise variances. Specifically, we prove that in a suitable high-dimensional setting where heteroskedastic noise does not concentrate too much in any particular direction in space, the resulting (doubly-stochastic) noisy affinity matrix converges to its clean counterpart with rate $m^{-1/2}$, where $m$ is the ambient dimension. We demonstrate this result numerically, and show that in contrast, the popular row-stochastic and symmetric normalizations behave unfavorably under heteroskedastic noise. Furthermore, we provide examples of simulated and experimental single-cell RNA sequence data with intrinsic heteroskedasticity, where the advantage of the doubly-stochastic normalization for exploratory analysis is evident.
△ Less
Submitted 25 January, 2021; v1 submitted 30 May, 2020;
originally announced June 2020.
-
LOCA: LOcal Conformal Autoencoder for standardized data coordinates
Authors:
Erez Peterfreund,
Ofir Lindenbaum,
Felix Dietrich,
Tom Bertalan,
Matan Gavish,
Ioannis G. Kevrekidis,
Ronald R. Coifman
Abstract:
We propose a deep-learning based method for obtaining standardized data coordinates from scientific measurements.Data observations are modeled as samples from an unknown, non-linear deformation of an underlying Riemannian manifold, which is parametrized by a few normalized latent variables. By leveraging a repeated measurement sampling strategy, we present a method for learning an embedding in…
▽ More
We propose a deep-learning based method for obtaining standardized data coordinates from scientific measurements.Data observations are modeled as samples from an unknown, non-linear deformation of an underlying Riemannian manifold, which is parametrized by a few normalized latent variables. By leveraging a repeated measurement sampling strategy, we present a method for learning an embedding in $\mathbb{R}^d$ that is isometric to the latent variables of the manifold. These data coordinates, being invariant under smooth changes of variables, enable matching between different instrumental observations of the same phenomenon. Our embedding is obtained using a LOcal Conformal Autoencoder (LOCA), an algorithm that constructs an embedding to rectify deformations by using a local z-scoring procedure while preserving relevant geometric information. We demonstrate the isometric embedding properties of LOCA on various model settings and observe that it exhibits promising interpolation and extrapolation capabilities. Finally, we apply LOCA to single-site Wi-Fi localization data, and to $3$-dimensional curved surface estimation based on a $2$-dimensional projection.
△ Less
Submitted 14 January, 2021; v1 submitted 15 April, 2020;
originally announced April 2020.
-
Co-manifold learning with missing data
Authors:
Gal Mishne,
Eric C. Chi,
Ronald R. Coifman
Abstract:
Representation learning is typically applied to only one mode of a data matrix, either its rows or columns. Yet in many applications, there is an underlying geometry to both the rows and the columns. We propose utilizing this coupled structure to perform co-manifold learning: uncovering the underlying geometry of both the rows and the columns of a given matrix, where we focus on a missing data set…
▽ More
Representation learning is typically applied to only one mode of a data matrix, either its rows or columns. Yet in many applications, there is an underlying geometry to both the rows and the columns. We propose utilizing this coupled structure to perform co-manifold learning: uncovering the underlying geometry of both the rows and the columns of a given matrix, where we focus on a missing data setting. Our unsupervised approach consists of three components. We first solve a family of optimization problems to estimate a complete matrix at multiple scales of smoothness. We then use this collection of smooth matrix estimates to compute pairwise distances on the rows and columns based on a new multi-scale metric that implicitly introduces a coupling between the rows and the columns. Finally, we construct row and column representations from these multi-scale metrics. We demonstrate that our approach outperforms competing methods in both data visualization and clustering.
△ Less
Submitted 16 October, 2018;
originally announced October 2018.
-
Two-sample Statistics Based on Anisotropic Kernels
Authors:
Xiuyuan Cheng,
Alexander Cloninger,
Ronald R. Coifman
Abstract:
The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely-many multivariate samples. When the distributions are locally low-dimensional, the proposed test can be made more powerful to distinguish certain alternatives by incorporating local covariance matrices and constructing an anisotropic kernel. The kerne…
▽ More
The paper introduces a new kernel-based Maximum Mean Discrepancy (MMD) statistic for measuring the distance between two distributions given finitely-many multivariate samples. When the distributions are locally low-dimensional, the proposed test can be made more powerful to distinguish certain alternatives by incorporating local covariance matrices and constructing an anisotropic kernel. The kernel matrix is asymmetric; it computes the affinity between $n$ data points and a set of $n_R$ reference points, where $n_R$ can be drastically smaller than $n$. While the proposed statistic can be viewed as a special class of Reproducing Kernel Hilbert Space MMD, the consistency of the test is proved, under mild assumptions of the kernel, as long as $\|p-q\| \sqrt{n} \to \infty $, and a finite-sample lower bound of the testing power is obtained. Applications to flow cytometry and diffusion MRI datasets are demonstrated, which motivate the proposed approach to compare distributions.
△ Less
Submitted 30 August, 2018; v1 submitted 14 September, 2017;
originally announced September 2017.
-
Data-Driven Tree Transforms and Metrics
Authors:
Gal Mishne,
Ronen Talmon,
Israel Cohen,
Ronald R. Coifman,
Yuval Kluger
Abstract:
We consider the analysis of high dimensional data given in the form of a matrix with columns consisting of observations and rows consisting of features. Often the data is such that the observations do not reside on a regular grid, and the given order of the features is arbitrary and does not convey a notion of locality. Therefore, traditional transforms and metrics cannot be used for data organiza…
▽ More
We consider the analysis of high dimensional data given in the form of a matrix with columns consisting of observations and rows consisting of features. Often the data is such that the observations do not reside on a regular grid, and the given order of the features is arbitrary and does not convey a notion of locality. Therefore, traditional transforms and metrics cannot be used for data organization and analysis. In this paper, our goal is to organize the data by defining an appropriate representation and metric such that they respect the smoothness and structure underlying the data. We also aim to generalize the joint clustering of observations and features in the case the data does not fall into clear disjoint groups. For this purpose, we propose multiscale data-driven transforms and metrics based on trees. Their construction is implemented in an iterative refinement procedure that exploits the co-dependencies between features and observations. Beyond the organization of a single dataset, our approach enables us to transfer the organization learned from one dataset to another and to integrate several datasets together. We present an application to breast cancer gene expression analysis: learning metrics on the genes to cluster the tumor samples into cancer sub-types and validating the joint organization of both the genes and the samples. We demonstrate that using our approach to combine information from multiple gene expression cohorts, acquired by different profiling technologies, improves the clustering of tumor samples.
△ Less
Submitted 18 August, 2017;
originally announced August 2017.
-
No equations, no parameters, no variables: data, and the reconstruction of normal forms by learning informed observation geometries
Authors:
Or Yair,
Ronen Talmon,
Ronald R. Coifman,
Ioannis G. Kevrekidis
Abstract:
The discovery of physical laws consistent with empirical observations lies at the heart of (applied) science and engineering. These laws typically take the form of nonlinear differential equations depending on parameters, dynamical systems theory provides, through the appropriate normal forms, an "intrinsic", prototypical characterization of the types of dynamical regimes accessible to a given mod…
▽ More
The discovery of physical laws consistent with empirical observations lies at the heart of (applied) science and engineering. These laws typically take the form of nonlinear differential equations depending on parameters, dynamical systems theory provides, through the appropriate normal forms, an "intrinsic", prototypical characterization of the types of dynamical regimes accessible to a given model. Using an implementation of data-informed geometry learning we directly reconstruct the relevant "normal forms": a quantitative mapping from empirical observations to prototypical realizations of the underlying dynamics. Interestingly, the state variables and the parameters of these realizations are inferred from the empirical observations, without prior knowledge or understanding, they parametrize the dynamics {\em intrinsically}, without explicit reference to fundamental physical quantities.
△ Less
Submitted 30 November, 2016;
originally announced December 2016.
-
Provable approximation properties for deep neural networks
Authors:
Uri Shaham,
Alexander Cloninger,
Ronald R. Coifman
Abstract:
We discuss approximation of functions using deep neural nets. Given a function $f$ on a $d$-dimensional manifold $Γ\subset \mathbb{R}^m$, we construct a sparsely-connected depth-4 neural network and bound its error in approximating $f$. The size of the network depends on dimension and curvature of the manifold $Γ$, the complexity of $f$, in terms of its wavelet description, and only weakly on the…
▽ More
We discuss approximation of functions using deep neural nets. Given a function $f$ on a $d$-dimensional manifold $Γ\subset \mathbb{R}^m$, we construct a sparsely-connected depth-4 neural network and bound its error in approximating $f$. The size of the network depends on dimension and curvature of the manifold $Γ$, the complexity of $f$, in terms of its wavelet description, and only weakly on the ambient dimension $m$. Essentially, our network computes wavelet functions, which are computed from Rectified Linear Units (ReLU)
△ Less
Submitted 28 March, 2016; v1 submitted 24 September, 2015;
originally announced September 2015.
-
Bigeometric Organization of Deep Nets
Authors:
Alexander Cloninger,
Ronald R. Coifman,
Nicholas Downing,
Harlan M. Krumholz
Abstract:
In this paper, we build an organization of high-dimensional datasets that cannot be cleanly embedded into a low-dimensional representation due to missing entries and a subset of the features being irrelevant to modeling functions of interest. Our algorithm begins by defining coarse neighborhoods of the points and defining an expected empirical function value on these neighborhoods. We then generat…
▽ More
In this paper, we build an organization of high-dimensional datasets that cannot be cleanly embedded into a low-dimensional representation due to missing entries and a subset of the features being irrelevant to modeling functions of interest. Our algorithm begins by defining coarse neighborhoods of the points and defining an expected empirical function value on these neighborhoods. We then generate new non-linear features with deep net representations tuned to model the approximate function, and re-organize the geometry of the points with respect to the new representation. Finally, the points are locally z-scored to create an intrinsic geometric organization which is independent of the parameters of the deep net, a geometry designed to assure smoothness with respect to the empirical function. We examine this approach on data from the Center for Medicare and Medicaid Services Hospital Quality Initiative, and generate an intrinsic low-dimensional organization of the hospitals that is smooth with respect to an expert driven function of quality.
△ Less
Submitted 1 July, 2015;
originally announced July 2015.
-
Diffusion maps for changing data
Authors:
Ronald R. Coifman,
Matthew J. Hirn
Abstract:
Graph Laplacians and related nonlinear mappings into low dimensional spaces have been shown to be powerful tools for organizing high dimensional data. Here we consider a data set X in which the graph associated with it changes depending on some set of parameters. We analyze this type of data in terms of the diffusion distance and the corresponding diffusion map. As the data changes over the parame…
▽ More
Graph Laplacians and related nonlinear mappings into low dimensional spaces have been shown to be powerful tools for organizing high dimensional data. Here we consider a data set X in which the graph associated with it changes depending on some set of parameters. We analyze this type of data in terms of the diffusion distance and the corresponding diffusion map. As the data changes over the parameter space, the low dimensional embedding changes as well. We give a way to go between these embeddings, and furthermore, map them all into a common space, allowing one to track the evolution of X in its intrinsic geometry. A global diffusion distance is also defined, which gives a measure of the global behavior of the data over the parameter space. Approximation theorems in terms of randomly sampled data are presented, as are potential applications.
△ Less
Submitted 11 July, 2013; v1 submitted 3 September, 2012;
originally announced September 2012.
-
Bi-stochastic kernels via asymmetric affinity functions
Authors:
Ronald R. Coifman,
Matthew J. Hirn
Abstract:
In this short letter we present the construction of a bi-stochastic kernel p for an arbitrary data set X that is derived from an asymmetric affinity function α. The affinity function α measures the similarity between points in X and some reference set Y. Unlike other methods that construct bi-stochastic kernels via some convergent iteration process or through solving an optimization problem, the c…
▽ More
In this short letter we present the construction of a bi-stochastic kernel p for an arbitrary data set X that is derived from an asymmetric affinity function α. The affinity function α measures the similarity between points in X and some reference set Y. Unlike other methods that construct bi-stochastic kernels via some convergent iteration process or through solving an optimization problem, the construction presented here is quite simple. Furthermore, it can be viewed through the lens of out of sample extensions, making it useful for massive data sets.
△ Less
Submitted 11 July, 2013; v1 submitted 2 September, 2012;
originally announced September 2012.