-
Entropic Optimal Transport Eigenmaps for Nonlinear Alignment and Joint Embedding of High-Dimensional Datasets
Authors:
Boris Landa,
Yuval Kluger,
Rong Ma
Abstract:
Embedding high-dimensional data into a low-dimensional space is an indispensable component of data analysis. In numerous applications, it is necessary to align and jointly embed multiple datasets from different studies or experimental conditions. Such datasets may share underlying structures of interest but exhibit individual distortions, resulting in misaligned embeddings using traditional techniques. In this work, we propose \textit{Entropic Optimal Transport (EOT) eigenmaps}, a principled approach for aligning and jointly embedding a pair of datasets with theoretical guarantees. Our approach leverages the leading singular vectors of the EOT plan matrix between two datasets to extract their shared underlying structure and align the datasets accordingly in a common embedding space. We interpret our approach as an inter-data variant of the classical Laplacian eigenmaps and diffusion maps embeddings, showing that it enjoys many favorable analogous properties. We then analyze a data-generative model where two observed high-dimensional datasets share latent variables on a common low-dimensional manifold, but each dataset is subject to data-specific translation, scaling, nuisance structures, and noise. We show that in a high-dimensional asymptotic regime, the EOT plan recovers the shared manifold structure by approximating a kernel function evaluated at the locations of the latent variables. Subsequently, we provide a geometric interpretation of our embedding by relating it to the eigenfunctions of population-level operators encoding the density and geometry of the shared manifold. Finally, we showcase the performance of our approach for data integration and embedding through simulations and analyses of real-world biological data, demonstrating its advantages over alternative methods in challenging scenarios.
Submitted 1 July, 2024;
originally announced July 2024.
-
Exponential weight averaging as damped harmonic motion
Authors:
Jonathan Patsenker,
Henry Li,
Yuval Kluger
Abstract:
The exponential moving average (EMA) is a commonly used statistic for providing stable estimates of stochastic quantities in deep learning optimization. Recently, EMA has seen considerable use in generative models, where it is computed with respect to the model weights, and significantly improves the stability of the inference model during and after training. While the practice of weight averaging at the end of training is well-studied and known to improve estimates of local optima, the benefits of EMA over the course of training are less understood. In this paper, we derive an explicit connection between EMA and a damped harmonic system between two particles, where one particle (the EMA weights) is drawn to the other (the model weights) via an idealized zero-length spring. We then leverage this physical analogy to analyze the effectiveness of EMA, and propose an improved training algorithm, which we call BELAY. Finally, we demonstrate theoretically and empirically several advantages enjoyed by BELAY over standard EMA.
Submitted 20 October, 2023;
originally announced October 2023.
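The "zero-length spring" picture in the abstract can be previewed by rewriting the standard EMA update e ← βe + (1 − β)w as e ← e + (1 − β)(w − e). In this form the EMA weights move toward the model weights by an amount proportional to their separation. The minimal sketch below only shows that the two forms are identical for scalar weights. It does not reproduce the paper's full damped-oscillator derivation or BELAY.

```python
def ema_standard(weights, beta):
    """Standard EMA e <- beta * e + (1 - beta) * w, initialized at the first weight."""
    e = weights[0]
    trace = [e]
    for w in weights[1:]:
        e = beta * e + (1.0 - beta) * w
        trace.append(e)
    return trace

def ema_spring(weights, beta):
    """Same update written as a pull toward w with 'stiffness' (1 - beta)."""
    e = weights[0]
    trace = [e]
    for w in weights[1:]:
        pull = (1.0 - beta) * (w - e)  # proportional to the spring's extension w - e
        e = e + pull                   # one explicit step along the pull
        trace.append(e)
    return trace
```

With `beta = 0.5` and model weights jumping from 0 to 1, both functions give 0, 0.5, 0.75, 0.875, and so on. The EMA lags the target and approaches it geometrically.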
-
The Dyson Equalizer: Adaptive Noise Stabilization for Low-Rank Signal Detection and Recovery
Authors:
Boris Landa,
Yuval Kluger
Abstract:
Detecting and recovering a low-rank signal in a noisy data matrix is a fundamental task in data analysis. Typically, this task is addressed by inspecting and manipulating the spectrum of the observed data, e.g., thresholding the singular values of the data matrix at a certain critical level. This approach is well-established in the case of homoskedastic noise, where the noise variance is identical across the entries. However, in numerous applications, the noise can be heteroskedastic, where the noise characteristics may vary considerably across the rows and columns of the data. In this scenario, the spectral behavior of the noise can differ significantly from the homoskedastic case, posing various challenges for signal detection and recovery. To address these challenges, we develop an adaptive normalization procedure that equalizes the average noise variance across the rows and columns of a given data matrix. Our proposed procedure is data-driven and fully automatic, supporting a broad range of noise distributions, variance patterns, and signal structures. We establish that in many cases, this procedure enforces the standard spectral behavior of homoskedastic noise -- the Marchenko-Pastur (MP) law, allowing for simple and reliable detection of signal components. Furthermore, we demonstrate that our approach can substantially improve signal recovery in heteroskedastic settings by manipulating the spectrum after normalization. Lastly, we apply our method to single-cell RNA sequencing and spatial transcriptomics data, showcasing accurate fits to the MP law after normalization. Our approach relies on recent results in random matrix theory, which describe the resolvent of the noise via the so-called Dyson equation. By leveraging this relation, we can accurately infer the noise level in each row and each column directly from the resolvent of the data.
Submitted 19 June, 2023;
originally announced June 2023.
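After a normalization that brings the noise to unit average variance, the abstract says signal components can be detected against the Marchenko-Pastur (MP) law. One simple version of that detection step is to count eigenvalues above the MP upper edge. The sketch below assumes three things: the eigenvalues of (1/n)YY^T for an m-by-n matrix Y are already computed, the noise variance is 1 after normalization, and the Dyson-equation normalization itself is not shown.

```python
import math

def mp_upper_edge(m, n, sigma2=1.0):
    """Upper edge of the MP law for eigenvalues of (1/n) Y Y^T, Y being m-by-n
    with i.i.d. zero-mean noise entries of variance sigma2."""
    gamma = m / n  # aspect ratio
    return sigma2 * (1.0 + math.sqrt(gamma)) ** 2

def count_signal_components(eigenvalues, m, n, sigma2=1.0):
    """Count eigenvalues above the MP upper edge (a simple rank estimate)."""
    edge = mp_upper_edge(m, n, sigma2)
    return sum(1 for lam in eigenvalues if lam > edge)
```

For a 100-by-400 matrix the edge is (1 + 0.5)^2 = 2.25, so among the eigenvalues 5.0, 2.3, 2.2 and 1.0, two would be flagged as signal.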
-
Multi-modal Differentiable Unsupervised Feature Selection
Authors:
Junchen Yang,
Ofir Lindenbaum,
Yuval Kluger,
Ariel Jaffe
Abstract:
Multi-modal high-throughput biological data presents a great scientific opportunity and a significant computational challenge. In multi-modal measurements, every sample is observed simultaneously by two or more sets of sensors. In such settings, many observed variables in both modalities are nuisance variables that carry no information about the phenomenon of interest. Here, we propose a multi-modal unsupervised feature selection framework that identifies informative variables based on coupled high-dimensional measurements. Our method is designed to identify features associated with two types of latent low-dimensional structures: (i) shared structures that govern the observations in both modalities and (ii) differential structures that appear in only one modality. To that end, we propose two Laplacian-based scoring operators. We combine the scores with differentiable gates that mask nuisance features and enhance the accuracy of the structure captured by the graph Laplacian. The performance of the new scheme is illustrated using synthetic and real datasets, including an extended biological application to single-cell multi-omics.
Submitted 16 March, 2023;
originally announced March 2023.
-
Autoregressive Generative Modeling with Noise Conditional Maximum Likelihood Estimation
Authors:
Henry Li,
Yuval Kluger
Abstract:
We introduce a simple modification to the standard maximum likelihood estimation (MLE) framework. Rather than maximizing a single unconditional likelihood of the data under the model, we maximize a family of \textit{noise conditional} likelihoods consisting of the data perturbed by a continuum of noise levels. We find that models trained this way are more robust to noise, obtain higher test likelihoods, and generate higher quality images. They can also be sampled from via a novel score-based sampling scheme which combats the classical \textit{covariate shift} problem that occurs during sample generation in autoregressive models. Applying this augmentation to autoregressive image models, we obtain 3.32 bits per dimension on the ImageNet 64x64 dataset, and substantially improve the quality of generated samples in terms of the Frechet Inception distance (FID) -- from 37.50 to 12.09 on the CIFAR-10 dataset.
Submitted 19 October, 2022;
originally announced October 2022.
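The 3.32 bits per dimension reported in this abstract is the model's negative log-likelihood per image, converted from nats to bits and divided by the number of dimensions (64 × 64 × 3 = 12288 for ImageNet 64x64). The short sketch below shows the conversion. It assumes the NLL is already a discrete log-probability of the 8-bit pixel values, or already includes any dequantization correction.

```python
import math

def bits_per_dim(nll_nats, num_dims):
    """Convert a per-example negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2.0))
```

At 3.32 bits per dimension, one 64x64 RGB image costs about 3.32 × 12288 ≈ 40,796 bits under the model.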
-
ManiFeSt: Manifold-based Feature Selection for Small Data Sets
Authors:
David Cohen,
Tal Shnitzer,
Yuval Kluger,
Ronen Talmon
Abstract:
In this paper, we present a new method for few-sample supervised feature selection (FS). Our method first learns the manifold of the feature space of each class using kernels capturing multi-feature associations. Then, based on Riemannian geometry, a composite kernel is computed, extracting the differences between the learned feature associations. Finally, an FS score based on spectral analysis is proposed. Considering multi-feature associations makes our method multivariate by design. This in turn allows for the extraction of the hidden manifold underlying the features and avoids overfitting, facilitating few-sample FS. We showcase the efficacy of our method on illustrative examples and several benchmarks, where our method demonstrates higher accuracy in selecting the informative features compared to competing methods. In addition, we show that our FS leads to improved classification and better generalization when applied to test data.
Submitted 18 July, 2022;
originally announced July 2022.
-
Neural Inverse Transform Sampler
Authors:
Henry Li,
Yuval Kluger
Abstract:
Any explicit functional representation $f$ of a density is hampered by two main obstacles when we wish to use it as a generative model: designing $f$ so that sampling is fast, and estimating $Z = \int f$ so that $Z^{-1}f$ integrates to 1. This becomes increasingly complicated as $f$ itself becomes complicated. In this paper, we show that when modeling one-dimensional conditional densities with a neural network, $Z$ can be exactly and efficiently computed by letting the network represent the cumulative distribution function of a target density, and applying a generalized fundamental theorem of calculus. We also derive a fast algorithm for sampling from the resulting representation by the inverse transform method. By extending these principles to higher dimensions, we introduce the \textbf{Neural Inverse Transform Sampler (NITS)}, a novel deep learning framework for modeling and sampling from general, multidimensional, compactly-supported probability densities. NITS is a highly expressive density estimator that boasts end-to-end differentiability, fast sampling, and exact and cheap likelihood evaluation. We demonstrate the applicability of NITS by applying it to realistic, high-dimensional density estimation tasks: likelihood-based generative modeling on the CIFAR-10 dataset, and density estimation on the UCI suite of benchmark datasets, where NITS produces compelling results rivaling or surpassing the state of the art.
Submitted 22 June, 2022;
originally announced June 2022.
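The one-dimensional idea in this abstract is to model the CDF F directly rather than the density. The normalizer then comes from the fundamental theorem of calculus, Z = F(b) − F(a), and samples come from inverting F. The minimal sketch below assumes that F is any increasing Python function on a compact interval [a, b], standing in for NITS's neural network. Bisection is used for the inversion as an illustrative choice; it is not necessarily the paper's fast sampler.

```python
def normalized_cdf(F, a, b):
    """Turn an increasing function F on [a, b] into a CDF.

    The implied density is F'(x) / Z, and Z = integral_a^b F'(x) dx = F(b) - F(a),
    so no numerical integration is needed.
    """
    Fa, Fb = F(a), F(b)
    Z = Fb - Fa
    return lambda x: (F(x) - Fa) / Z

def inverse_transform_sample(cdf, a, b, u, tol=1e-10):
    """Return x in [a, b] with cdf(x) = u, found by bisection (cdf must be increasing)."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With F(x) = x² on [0, 2], the CDF is x²/4, and u = 0.25 maps back to x = 1. Drawing u from `random.random()` turns this into a sampler.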
-
Deep Unsupervised Feature Selection by Discarding Nuisance and Correlated Features
Authors:
Uri Shaham,
Ofir Lindenbaum,
Jonathan Svirsky,
Yuval Kluger
Abstract:
Modern datasets often contain large subsets of correlated features and nuisance features, which are unrelated or only loosely related to the main underlying structures of the data. Nuisance features can be identified using the Laplacian score criterion, which evaluates the importance of a given feature via its consistency with the leading eigenvectors of the graph Laplacian. We demonstrate that in the presence of large numbers of nuisance features, the Laplacian must be computed on the subset of selected features rather than on the complete feature set. To do this, we propose a fully differentiable approach for unsupervised feature selection, utilizing the Laplacian score criterion to avoid the selection of nuisance features. We employ an autoencoder architecture to cope with correlated features, trained to reconstruct the data from the subset of selected features. We build on the recently proposed concrete layer, which allows controlling the number of selected features via architectural design, simplifying the optimization process. Experimenting on several real-world datasets, we demonstrate that our proposed approach outperforms similar approaches designed to avoid only correlated or nuisance features, but not both. Several state-of-the-art clustering results are reported.
Submitted 11 October, 2021;
originally announced October 2021.
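A concrete selector layer, as mentioned in this abstract, picks one input feature per selector neuron through a Concrete (Gumbel-softmax) relaxation of a one-hot choice. The sketch below assumes the standard Gumbel-softmax form softmax((logits + Gumbel noise) / temperature). The parameter names `log_alpha` and `temperature` are illustrative, and this is not the authors' code.

```python
import math
import random

def gumbel(rng):
    """Standard Gumbel noise -log(-log U) with U ~ Uniform(0, 1)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep U away from 0 and 1
    return -math.log(-math.log(u))

def concrete_sample(log_alpha, temperature, rng):
    """Relaxed one-hot weights over d features from the Concrete distribution."""
    z = [(la + gumbel(rng)) / temperature for la in log_alpha]
    z_max = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def select_feature(x, log_alpha, temperature, rng):
    """One selector neuron: a soft weighted average of the input features."""
    weights = concrete_sample(log_alpha, temperature, rng)
    return sum(w * xi for w, xi in zip(weights, x))
```

As the temperature is lowered during training, the weights approach a one-hot vector, and each selector neuron ends up passing through a single feature.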
-
Probabilistic Robust Autoencoders for Outlier Detection
Authors:
Ofir Lindenbaum,
Yariv Aizenbud,
Yuval Kluger
Abstract:
Anomalies (or outliers) are prevalent in real-world empirical observations and potentially mask important underlying structures. Accurate identification of anomalous samples is crucial for the success of downstream data analysis tasks. To automatically identify anomalies, we propose Probabilistic Robust AutoEncoder (PRAE). PRAE aims to simultaneously remove outliers and identify a low-dimensional representation for the inlier samples. We first present the Robust AutoEncoder (RAE) objective as a minimization problem for splitting the data into inliers and outliers. Our objective is designed to exclude outliers while including a subset of samples (inliers) that can be effectively reconstructed using an AutoEncoder (AE). RAE minimizes the autoencoder's reconstruction error while incorporating as many samples as possible. This could be formulated via regularization by subtracting an $\ell_0$ norm counting the number of selected samples from the reconstruction term. Unfortunately, this leads to an intractable combinatorial problem. Therefore, we propose two probabilistic relaxations of RAE, which are differentiable and alleviate the need for a combinatorial search. We prove that the solution to the PRAE problem is equivalent to the solution of RAE. We use synthetic data to show that PRAE can accurately remove outliers in a wide range of contamination levels. Finally, we demonstrate that using PRAE for anomaly detection leads to state-of-the-art results on various benchmark datasets.
Submitted 24 August, 2022; v1 submitted 1 October, 2021;
originally announced October 2021.
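The RAE objective described in this abstract subtracts λ times the ℓ0 norm of a 0/1 selection vector from the reconstruction error of the selected samples. If the per-sample reconstruction errors were held fixed, the objective would separate across samples: a sample is kept exactly when its error is below λ. The sketch below illustrates only that fixed-error case, with a hypothetical weight `lam`. In the actual problem the errors depend on the autoencoder trained on the selected samples, which is what makes it combinatorial. PRAE's relaxations are not shown.

```python
def rae_objective(errors, selected, lam):
    """Reconstruction error of the selected samples minus lam * (number selected)."""
    return sum(e for e, s in zip(errors, selected) if s) - lam * sum(selected)

def best_selection_for_fixed_errors(errors, lam):
    """With errors held fixed, keep sample i iff errors[i] < lam (ties are arbitrary)."""
    return [1 if e < lam else 0 for e in errors]
```

For errors 0.1, 2.0, 0.3, 1.5 and λ = 1, the rule keeps the first and third samples. An exhaustive search over all 16 selections agrees.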
-
Locally Sparse Neural Networks for Tabular Biomedical Data
Authors:
Junchen Yang,
Ofir Lindenbaum,
Yuval Kluger
Abstract:
Tabular datasets with a low sample size or many variables are prevalent in biomedicine. Practitioners in this domain prefer linear or tree-based models over neural networks since the latter are harder to interpret and tend to overfit when applied to tabular datasets. To address these shortcomings of neural networks, we propose an intrinsically interpretable network for heterogeneous biomedical data. We design a locally sparse neural network where the local sparsity is learned to identify the subset of most relevant features for each sample. This sample-specific sparsity is predicted via a \textit{gating} network, which is trained in tandem with the \textit{prediction} network. By forcing the model to select a subset of the most informative features for each sample, we reduce model overfitting in low-sample-size data and obtain an interpretable model. Using extensive experiments on synthetic and real-world biomedical datasets, we demonstrate that our method outperforms state-of-the-art models. Furthermore, the proposed framework dramatically outperforms existing schemes when evaluating its interpretability capabilities. Finally, we demonstrate the applicability of our model to two important biomedical tasks: survival analysis and marker gene identification.
Submitted 7 February, 2022; v1 submitted 11 June, 2021;
originally announced June 2021.
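The sample-specific sparsity in this abstract means each sample computes its own feature mask before the prediction network sees it. The sketch below shows that data flow at inference time. The gating network is a hypothetical linear map (weights `W`, offsets `bias`, plus a fixed 0.5 shift) followed by clipping to [0, 1]. Training-time gate noise and the sparsity penalty are omitted, and this is not the authors' architecture.

```python
def hard_sigmoid(v):
    """Clip to [0, 1]; a gate of exactly 0 switches the feature off."""
    return min(1.0, max(0.0, v))

def sample_gates(x, W, bias):
    """Hypothetical linear gating network: one gate per feature, computed from x itself."""
    d = len(x)
    return [hard_sigmoid(0.5 + bias[k] + sum(W[k][j] * x[j] for j in range(d))) for k in range(d)]

def gated_input(x, W, bias):
    """Mask the sample's features with its own gates; returns (masked input, gates)."""
    gates = sample_gates(x, W, bias)
    return [g * xi for g, xi in zip(gates, x)], gates
```

Because the gates depend on `x`, two samples can keep different subsets of features. Those per-sample subsets are what the abstract reads off as interpretations.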
-
On the Efficient Evaluation of the Azimuthal Fourier Components of the Green's Function for Helmholtz's Equation in Cylindrical Coordinates
Authors:
James Garritano,
Yuval Kluger,
Vladimir Rokhlin,
Kirill Serkh
Abstract:
In this manuscript, we develop an efficient algorithm to evaluate the azimuthal Fourier components of the Green's function for the Helmholtz equation in cylindrical coordinates. A computationally efficient algorithm for this modal Green's function is essential for solvers for electromagnetic scattering from bodies of revolution (e.g., radar cross sections, antennas). Current algorithms to evaluate this modal Green's function become computationally intractable when the source and target are close or when the wavenumber is large. Furthermore, most state-of-the-art methods cannot be easily parallelized. In this manuscript, we present an algorithm for evaluating the modal Green's function that has performance independent of both source-to-target proximity and wavenumber, and whose cost grows as $O(m)$, where $m$ is the Fourier mode. Furthermore, our algorithm is embarrassingly parallelizable.
Submitted 25 April, 2021;
originally announced April 2021.
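To make the quantity in this abstract concrete, the sketch below computes the m-th azimuthal Fourier component of e^{ikR}/(4πR) by a naive trapezoidal rule. It assumes the normalization G_m = (1/2π)∫₀^{2π} G(R(φ)) e^{−imφ} dφ, with R(φ)² = r² + r'² − 2rr'cos φ + (z − z')²; conventions vary between references. This is a baseline, not the authors' algorithm. It is exactly the kind of evaluation that needs many nodes when the source and target are close (nearly singular integrand) or when k is large (oscillatory integrand).

```python
import cmath
import math

def modal_green_naive(m, k, r, z, rp, zp, n_quad=256):
    """Naive trapezoidal-rule estimate of the m-th azimuthal Fourier component
    of exp(ikR) / (4 pi R) between (r, z) and (rp, zp) in cylindrical coordinates."""
    total = 0.0 + 0.0j
    for q in range(n_quad):
        phi = 2.0 * math.pi * q / n_quad
        R = math.sqrt(r * r + rp * rp - 2.0 * r * rp * math.cos(phi) + (z - zp) ** 2)
        total += cmath.exp(1j * k * R) / (4.0 * math.pi * R) * cmath.exp(-1j * m * phi)
    return total / n_quad  # (1 / 2pi) * (2pi / n_quad) * sum
```

A quick sanity check: when the source lies on the axis (rp = 0), R does not depend on φ, so only the m = 0 component is nonzero and equals e^{ikR}/(4πR).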
-
Biwhitening Reveals the Rank of a Count Matrix
Authors:
Boris Landa,
Thomas T. C. K. Zhang,
Yuval Kluger
Abstract:
Estimating the rank of a corrupted data matrix is an important task in data analysis, most notably for choosing the number of components in PCA. Significant progress on this task was achieved using random matrix theory by characterizing the spectral properties of large noise matrices. However, utilizing such tools is not straightforward when the data matrix consists of count random variables, e.g., Poisson, in which case the noise can be heteroskedastic with an unknown variance in each entry. In this work, we consider a Poisson random matrix with independent entries, and propose a simple procedure termed \textit{biwhitening} for estimating the rank of the underlying signal matrix (i.e., the Poisson parameter matrix) without any prior knowledge. Our approach is based on the key observation that one can scale the rows and columns of the data matrix simultaneously so that the spectrum of the corresponding noise agrees with the standard Marchenko-Pastur (MP) law, justifying the use of the MP upper edge as a threshold for rank selection. Importantly, the required scaling factors can be estimated directly from the observations by solving a matrix scaling problem via the Sinkhorn-Knopp algorithm. Aside from the Poisson, our approach is extended to families of distributions that satisfy a quadratic relation between the mean and the variance, such as the generalized Poisson, binomial, negative binomial, gamma, and many others. This quadratic relation can also account for missing entries in the data. We conduct numerical experiments that corroborate our theoretical findings, and showcase the advantage of our approach for rank estimation in challenging regimes. Furthermore, we demonstrate the favorable performance of our approach on several real datasets of single-cell RNA sequencing (scRNA-seq), High-Throughput Chromosome Conformation Capture (Hi-C), and document topic modeling.
Submitted 2 November, 2021; v1 submitted 25 March, 2021;
originally announced March 2021.
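The scaling step in this abstract can be sketched with Sinkhorn-Knopp. Multiplying entry (i, j) by x_i y_j multiplies its noise variance by x_i² y_j². So it suffices to find u, v with Σ_j u_i Y_ij v_j = n for every row and Σ_i u_i Y_ij v_j = m for every column, then take x = √u and y = √v. This gives unit average noise variance in every row and column. The sketch assumes Poisson counts, where the count itself estimates its own variance, and strictly positive entries, which guarantee convergence. It is not the authors' implementation and omits the extension to other quadratic variance functions.

```python
import math

def biwhitening_factors(Y, n_iter=1000):
    """Row factors x and column factors y so that diag(x) Y diag(y) has unit
    average noise variance in every row and column (Poisson-variance assumption)."""
    m, n = len(Y), len(Y[0])
    u, v = [1.0] * m, [1.0] * n
    for _ in range(n_iter):
        # Sinkhorn-Knopp: row sums of diag(u) Y diag(v) -> n, column sums -> m
        u = [n / sum(Y[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = [m / sum(Y[i][j] * u[i] for i in range(m)) for j in range(n)]
    return [math.sqrt(ui) for ui in u], [math.sqrt(vj) for vj in v]
```

After scaling the data by these factors, the eigenvalues of the normalized covariance can be compared with the MP upper edge to estimate the rank.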
-
Spectral Top-Down Recovery of Latent Tree Models
Authors:
Yariv Aizenbud,
Ariel Jaffe,
Meng Wang,
Amber Hu,
Noah Amsel,
Boaz Nadler,
Joseph T. Chang,
Yuval Kluger
Abstract:
Modeling the distribution of high-dimensional data by a latent tree graphical model is a prevalent approach in multiple scientific domains. A common task is to infer the underlying tree structure, given only observations of its terminal nodes. Many algorithms for tree recovery are computationally intensive, which limits their applicability to trees of moderate size. For large trees, a common approach, termed divide-and-conquer, is to recover the tree structure in two steps. First, separately recover the structure of multiple, possibly random, subsets of the terminal nodes. Second, merge the resulting subtrees to form a full tree. Here, we develop Spectral Top-Down Recovery (STDR), a deterministic divide-and-conquer approach to infer large latent tree models. Unlike previous methods, STDR partitions the terminal nodes in a non-random way, based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes. We prove that under certain conditions, this partitioning is consistent with the tree structure. This, in turn, leads to a significantly simpler merging procedure of the small subtrees. We prove that STDR is statistically consistent and bound the number of samples required to accurately recover the tree with high probability. Using simulated data from several common tree models in phylogenetics, we demonstrate that STDR has a significant advantage in terms of runtime, with improved or similar accuracy.
Submitted 7 December, 2021; v1 submitted 25 February, 2021;
originally announced February 2021.
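The partitioning step in this abstract splits nodes by the sign of the Fiedler vector, the eigenvector for the second-smallest eigenvalue of a graph Laplacian. The sketch below assumes a hypothetical symmetric similarity matrix `W` in place of STDR's specific Laplacian, which is built from the observed nodes. It uses power iteration on cI − L with the constant vector projected out, to stay dependency-free; a real implementation would call a dense or sparse eigensolver.

```python
def laplacian(W):
    """Unnormalized graph Laplacian L = D - W of a symmetric weight matrix."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]

def fiedler_vector(W, n_iter=2000):
    """Eigenvector of L for its second-smallest eigenvalue, via power iteration on
    c*I - L with the constant vector (eigenvalue 0 of L) projected out each step."""
    n = len(W)
    L = laplacian(W)
    c = 2.0 * max(L[i][i] for i in range(n)) + 1.0  # exceeds the largest eigenvalue of L
    v = [float(i) for i in range(n)]                 # arbitrary non-constant start
    for _ in range(n_iter):
        mean = sum(v) / n
        v = [vi - mean for vi in v]                  # stay orthogonal to the constant vector
        v = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(vi * vi for vi in v) ** 0.5
        v = [vi / norm for vi in v]
    return v

def spectral_split(W):
    """Partition the nodes by the sign of the Fiedler vector."""
    f = fiedler_vector(W)
    return [i for i, fi in enumerate(f) if fi >= 0], [i for i, fi in enumerate(f) if fi < 0]
```

For two triangles joined by a weak edge, the sign split recovers the two triangles, which mirrors how a top-down split should follow the tree's deepest division.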
-
Local Two-Sample Testing over Graphs and Point-Clouds by Random-Walk Distributions
Authors:
Boris Landa,
Rihao Qu,
Joseph Chang,
Yuval Kluger
Abstract:
Rejecting the null hypothesis in two-sample testing is a fundamental tool for scientific discovery. Yet, aside from concluding that two samples do not come from the same probability distribution, it is often of interest to characterize how the two distributions differ. Given samples from two densities $f_1$ and $f_0$, we consider the task of localizing occurrences of the inequality $f_1 > f_0$. To avoid the challenges associated with high-dimensional space, we propose a general hypothesis testing framework where hypotheses are formulated adaptively to the data by conditioning on the combined sample from the two densities. We then investigate a special case of this framework where the notion of locality is captured by a random walk on a weighted graph constructed over this combined sample. We derive a tractable testing procedure for this case employing a type of scan statistic, and provide non-asymptotic lower bounds on the power and accuracy of our test to detect whether $f_1>f_0$ in a local sense. Furthermore, we characterize the test's consistency according to a certain problem-hardness parameter, and show that our test achieves the minimax detection rate for this parameter. We conduct numerical experiments to validate our method, and demonstrate our approach on two real-world applications: detecting and localizing arsenic well contamination across the United States, and analyzing two-sample single-cell RNA sequencing data from melanoma patients.
Submitted 7 September, 2021; v1 submitted 6 November, 2020;
originally announced November 2020.
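The notion of locality in this abstract comes from a random walk on a weighted graph over the combined sample. A minimal sketch of the underlying quantity: start a t-step walk at some point and measure how much of its probability ends on points drawn from f_1. If f_1 = f_0, the labels are exchangeable given the combined sample, so this mass is n_1/(n_0 + n_1) on average over label assignments. Larger values hint that f_1 > f_0 near the starting point. The weights `W`, the start node and `t` are hypothetical. This is not the paper's scan statistic, and it carries none of its power or minimax guarantees.

```python
def transition_matrix(W):
    """Row-normalize a nonnegative weight matrix into random-walk transition probabilities."""
    return [[w / sum(row) for w in row] for row in W]

def random_walk_distribution(P, start, t):
    """Distribution of a t-step random walk started at node `start`."""
    n = len(P)
    p = [0.0] * n
    p[start] = 1.0
    for _ in range(t):
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
    return p

def local_mass_from_sample_1(P, labels, start, t):
    """Probability that the walk ends on a point from the f_1 sample (label 1)."""
    p = random_walk_distribution(P, start, t)
    return sum(pj for pj, lab in zip(p, labels) if lab == 1)
```

On a four-node path whose first two points come from f_1, a one-step walk from node 0 lands entirely on f_1 points. After two steps the walk has spread to node 2, which is an f_0 point, so the f_1 mass drops to one half.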
-
$\ell_0$-based Sparse Canonical Correlation Analysis
Authors:
Ofir Lindenbaum,
Moshe Salhov,
Amir Averbuch,
Yuval Kluger
Abstract:
Canonical Correlation Analysis (CCA) models are powerful for studying the associations between two sets of variables. The canonically correlated representations, termed \textit{canonical variates}, are widely used in unsupervised learning to analyze unlabeled multi-modal registered datasets. Despite their success, CCA models may break (or overfit) if the number of variables in either of the modalities exceeds the number of samples. Moreover, a significant fraction of the variables often measures modality-specific information, and thus removing them is beneficial for identifying the \textit{canonically correlated variates}. Here, we propose $\ell_0$-CCA, a method for learning correlated representations based on sparse subsets of variables from two observed modalities. Sparsity is obtained by multiplying the input variables by stochastic gates, whose parameters are learned together with the CCA weights via an $\ell_0$-regularized correlation loss. We further propose $\ell_0$-Deep CCA for solving the problem of non-linear sparse CCA by modeling the correlated representations using deep nets. We demonstrate the efficacy of the method using several synthetic and real examples. Most notably, by gating nuisance input variables, our approach improves the extracted representations compared to other linear, non-linear and sparse CCA-based models.
Submitted 8 June, 2021; v1 submitted 12 October, 2020;
originally announced October 2020.
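Stochastic gates make the ℓ0 penalty in this abstract differentiable. In the Gaussian-based relaxation sketched below, each gate is z_d = clip(μ_d + ε_d, 0, 1) with ε_d ~ N(0, σ²). The expected number of open gates is then Σ_d Φ(μ_d/σ), which is smooth in μ. This is one common relaxation, assumed here for illustration; the exact parameterization used in ℓ0-CCA may differ, and the CCA loss itself is not shown.

```python
import math
import random

def std_normal_cdf(x):
    """Phi(x) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def stochastic_gates(mu, sigma, rng):
    """Relaxed Bernoulli gates z_d = clip(mu_d + eps_d, 0, 1), eps_d ~ N(0, sigma^2)."""
    return [min(1.0, max(0.0, m + rng.gauss(0.0, sigma))) for m in mu]

def expected_l0(mu, sigma):
    """Expected number of open gates: sum_d P(z_d > 0) = sum_d Phi(mu_d / sigma)."""
    return sum(std_normal_cdf(m / sigma) for m in mu)
```

During training, each modality's inputs would be multiplied elementwise by their gates before the correlation loss, with `expected_l0` added as the sparsity penalty.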
-
Differentiable Unsupervised Feature Selection based on a Gated Laplacian
Authors:
Ofir Lindenbaum,
Uri Shaham,
Jonathan Svirsky,
Erez Peterfreund,
Yuval Kluger
Abstract:
Scientific observations may consist of a large number of variables (features). Identifying a subset of meaningful features is often ignored in unsupervised learning, despite its potential for unraveling clear patterns hidden in the ambient space. In this paper, we present a method for unsupervised feature selection, and we demonstrate its use for the task of clustering. We propose a differentiable loss function that combines the Laplacian score, which favors low-frequency features, with a gating mechanism for feature selection. We improve the Laplacian score by replacing it with a gated variant computed on a subset of features. This subset is obtained using a continuous approximation of Bernoulli variables whose parameters are trained to gate the full feature space. We mathematically motivate the proposed approach and demonstrate that in the high noise regime, it is crucial to compute the Laplacian on the gated inputs, rather than on the full feature set. Experimental demonstration of the efficacy of the proposed approach and its advantage over current baselines is provided using several real-world examples.
Submitted 9 November, 2020; v1 submitted 9 July, 2020;
originally announced July 2020.
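The classical Laplacian score that this abstract starts from rates a feature f by how smoothly it varies over a sample-similarity graph. The score is f̃ᵀLf̃ / f̃ᵀDf̃, where L = D − W and f̃ is f with its degree-weighted mean removed. Lower scores mean lower-frequency, more informative features. The sketch below implements that classical score for a given weight matrix `W`. In the gated variant described in the abstract, `W` would instead be built from the gated inputs; that part is not shown.

```python
def laplacian_score(f, W):
    """Laplacian score of one feature f (one value per sample) on a graph with weights W.
    Lower is better (smoother over the graph); undefined (0/0) for a constant feature."""
    n = len(f)
    d = [sum(row) for row in W]                           # node degrees
    mean = sum(fi * di for fi, di in zip(f, d)) / sum(d)  # degree-weighted mean
    g = [fi - mean for fi in f]                           # remove the trivial constant component
    # g^T L g = (1/2) * sum_ij W_ij (g_i - g_j)^2
    num = 0.5 * sum(W[i][j] * (g[i] - g[j]) ** 2 for i in range(n) for j in range(n))
    den = sum(di * gi * gi for di, gi in zip(d, g))
    return num / den
```

On a graph with two tight pairs of samples, a feature that is constant within each pair scores 0. A feature that alternates within pairs scores 2.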
-
Doubly-Stochastic Normalization of the Gaussian Kernel is Robust to Heteroskedastic Noise
Authors:
Boris Landa,
Ronald R. Coifman,
Yuval Kluger
Abstract:
A fundamental step in many data-analysis techniques is the construction of an affinity matrix describing similarities between data points. When the data points reside in Euclidean space, a widespread approach is to form an affinity matrix by applying the Gaussian kernel to pairwise distances, and to follow with a certain normalization (e.g., the row-stochastic normalization or its symmetric variant). We demonstrate that the doubly-stochastic normalization of the Gaussian kernel with zero main diagonal (i.e., no self loops) is robust to heteroskedastic noise. That is, the doubly-stochastic normalization is advantageous in that it automatically accounts for observations with different noise variances. Specifically, we prove that in a suitable high-dimensional setting where heteroskedastic noise does not concentrate too much in any particular direction in space, the resulting (doubly-stochastic) noisy affinity matrix converges to its clean counterpart with rate $m^{-1/2}$, where $m$ is the ambient dimension. We demonstrate this result numerically, and show that in contrast, the popular row-stochastic and symmetric normalizations behave unfavorably under heteroskedastic noise. Furthermore, we provide examples of simulated and experimental single-cell RNA-sequencing data with intrinsic heteroskedasticity, where the advantage of the doubly-stochastic normalization for exploratory analysis is evident.
Submitted 25 January, 2021; v1 submitted 30 May, 2020;
originally announced June 2020.
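The doubly-stochastic normalization described above looks for a positive vector $d$ such that $W = DKD$, with $D = \mathrm{diag}(d)$ and $K$ the Gaussian kernel with zero diagonal, has all row (and hence column) sums equal to one. The sketch below uses a standard damped symmetric Sinkhorn iteration, $d \leftarrow \sqrt{d / (Kd)}$, whose fixed point satisfies $d_i (Kd)_i = 1$. It is an illustration, not necessarily the authors' exact scaling procedure; the default bandwidth (median squared distance) is an assumed heuristic.

```python
import numpy as np


def doubly_stochastic_gaussian(X, eps=None, n_iter=5000, tol=1e-12):
    """Symmetric scaling D K D of a zero-diagonal Gaussian kernel so that all row sums equal 1."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if eps is None:
        # assumed bandwidth heuristic: median off-diagonal squared distance
        eps = np.median(d2[~np.eye(n, dtype=bool)])
    K = np.exp(-d2 / eps)
    np.fill_diagonal(K, 0.0)                   # zero main diagonal: no self loops
    d = np.ones(n)
    for _ in range(n_iter):
        d_new = np.sqrt(d / (K @ d))           # damped symmetric Sinkhorn update
        converged = np.max(np.abs(d_new - d)) < tol
        d = d_new
        if converged:
            break
    return d[:, None] * K * d[None, :]


rng = np.random.default_rng(1)
X = rng.standard_normal((60, 3))
W = doubly_stochastic_gaussian(X)
```

The geometric-mean damping (the square root) is what keeps the symmetric iteration from oscillating; without it, the plain update $d \leftarrow 1/(Kd)$ can alternate between two values.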
-
Spectral neighbor joining for reconstruction of latent tree models
Authors:
Ariel Jaffe,
Noah Amsel,
Yariv Aizenbud,
Boaz Nadler,
Joseph T. Chang,
Yuval Kluger
Abstract:
A common assumption in multiple scientific applications is that the distribution of observed data can be modeled by a latent tree graphical model. An important example is phylogenetics, where the tree models the evolutionary lineages of a set of observed organisms. Given a set of independent realizations of the random variables at the leaves of the tree, a key challenge is to infer the underlying tree topology. In this work we develop Spectral Neighbor Joining (SNJ), a novel method to recover the structure of latent tree graphical models. Given a matrix that contains a measure of similarity between all pairs of observed variables, SNJ computes a spectral measure of cohesion between groups of observed variables. We prove that SNJ is consistent, and derive a sufficient condition for correct tree recovery from an estimated similarity matrix. Combining this condition with a concentration of measure result on the similarity matrix, we bound the number of samples required to recover the tree with high probability. We illustrate via extensive simulations that in comparison to several other reconstruction methods, SNJ requires fewer samples to accurately recover trees with a large number of leaves or long edges.
Submitted 22 September, 2020; v1 submitted 28 February, 2020;
originally announced February 2020.
-
The Spectral Underpinning of word2vec
Authors:
Ariel Jaffe,
Yuval Kluger,
Ofir Lindenbaum,
Jonathan Patsenker,
Erez Peterfreund,
Stefan Steinerberger
Abstract:
word2vec due to Mikolov \textit{et al.} (2013) is a word embedding method that is widely used in natural language processing. Despite its great success and frequent use, theoretical justification is still lacking. The main contribution of our paper is to propose a rigorous analysis of the highly nonlinear functional of word2vec. Our results suggest that word2vec may be primarily driven by an underlying spectral method. This insight may open the door to obtaining provable guarantees for word2vec. We support these findings by numerical simulations. One fascinating open question is whether the nonlinear properties of word2vec that are not captured by the spectral method are beneficial and, if so, by what mechanism.
Submitted 9 November, 2020; v1 submitted 27 February, 2020;
originally announced February 2020.
-
Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations
Authors:
Dmitry Kobak,
George Linderman,
Stefan Steinerberger,
Yuval Kluger,
Philipp Berens
Abstract:
T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE by the low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving the "crowding problem" of SNE. Here, we develop an efficient implementation of t-SNE for a $t$-distribution kernel with an arbitrary degree of freedom $ν$, with $ν\to\infty$ corresponding to SNE and $ν=1$ corresponding to the standard t-SNE. Using theoretical analysis and toy examples, we show that $ν<1$ can further reduce the crowding problem and reveal finer cluster structure that is invisible in standard t-SNE. We further demonstrate the striking effect of heavier-tailed kernels on large real-life data sets such as MNIST, single-cell RNA-sequencing data, and the HathiTrust library. We use domain knowledge to confirm that the revealed clusters are meaningful. Overall, we argue that modifying the tail heaviness of the t-SNE kernel can yield additional insight into the cluster structure of the data.
Submitted 4 April, 2019; v1 submitted 15 February, 2019;
originally announced February 2019.
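The kernel family in the abstract above can be written down directly. The parameterization below, $w(d) = (1 + d^2/\nu)^{-\nu}$, is an assumption chosen to match the limits the abstract states: $\nu = 1$ gives the Cauchy kernel of standard t-SNE, and $\nu \to \infty$ recovers the Gaussian $e^{-d^2}$ of SNE. Note that this differs from the textbook Student-t density exponent $-(\nu+1)/2$. Only the similarity kernel is shown, not the accelerated t-SNE optimization.

```python
import numpy as np


def heavy_tailed_kernel(d, nu):
    """Low-dimensional similarity w(d) = (1 + d^2 / nu)^(-nu) (assumed parameterization).

    nu = 1 gives the Cauchy kernel of standard t-SNE; nu -> infinity approaches
    the Gaussian exp(-d^2) of SNE; nu < 1 gives heavier tails.
    """
    d = np.asarray(d, dtype=float)
    return (1.0 + d**2 / nu) ** (-nu)


# Similarity assigned to a distant pair (d = 5) for decreasing tail heaviness.
tails = {nu: heavy_tailed_kernel(5.0, nu) for nu in (0.5, 1.0, 100.0)}
```

Smaller $\nu$ assigns more similarity to distant pairs, so in the embedding moderately dissimilar points need not be pushed as far apart, which is the mechanism behind the finer cluster separation described above.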
-
Feature Selection using Stochastic Gates
Authors:
Yutaro Yamada,
Ofir Lindenbaum,
Sahand Negahban,
Yuval Kluger
Abstract:
Feature selection problems have been extensively studied for linear estimation (for instance, the Lasso), but less emphasis has been placed on feature selection for non-linear functions. In this study, we propose a method for feature selection in high-dimensional non-linear function estimation problems. The new procedure is based on minimizing the $\ell_0$ norm of the vector of indicator variables that indicate whether each feature is selected. Our approach relies on the continuous relaxation of Bernoulli distributions, which allows our model to learn the parameters of the approximate Bernoulli distributions via gradient descent. This general framework simultaneously minimizes a loss function while selecting relevant features. Furthermore, we provide an information-theoretic justification for incorporating Bernoulli distributions into our approach and demonstrate the potential of the approach on synthetic and real-life applications.
Submitted 26 July, 2020; v1 submitted 9 October, 2018;
originally announced October 2018.
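One way to realize the continuous relaxation of Bernoulli gates mentioned above is a clipped Gaussian, $z_d = \min(1, \max(0, \mu_d + \epsilon_d))$ with $\epsilon_d \sim \mathcal{N}(0, \sigma^2)$. Under that relaxation the expected $\ell_0$ norm is $\sum_d P(z_d > 0) = \sum_d \Phi(\mu_d/\sigma)$, which is differentiable in $\mu$. The sketch below assumes this relaxation, shows the forward pass only (the gradient-descent training is omitted), and uses an arbitrary $\sigma = 0.5$.

```python
import numpy as np
from math import erf, sqrt


def sample_gates(mu, sigma, rng):
    """Relaxed Bernoulli gates z = clip(mu + eps, 0, 1) with eps ~ N(0, sigma^2) (assumed form)."""
    eps = sigma * rng.standard_normal(np.shape(mu))
    return np.clip(mu + eps, 0.0, 1.0)


def expected_open_gates(mu, sigma):
    """Differentiable surrogate for the l0 norm: sum_d P(z_d > 0) = sum_d Phi(mu_d / sigma)."""
    return sum(0.5 * (1.0 + erf(m / (sigma * sqrt(2.0)))) for m in np.ravel(mu))


rng = np.random.default_rng(0)
mu = np.array([-2.0, 0.0, 2.0])        # closed, undecided, and open gate parameters
z = sample_gates(mu, 0.5, rng)         # multiply the input features by z before the model
reg = expected_open_gates(mu, 0.5)     # add lambda * reg to the task loss
```

In a full model, the input features would be multiplied elementwise by `z` and the regularizer `reg` added to the prediction loss, so that both the model and the gate parameters `mu` are learned together.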
-
Defending against Adversarial Images using Basis Functions Transformations
Authors:
Uri Shaham,
James Garritano,
Yutaro Yamada,
Ethan Weinberger,
Alex Cloninger,
Xiuyuan Cheng,
Kelly Stanton,
Yuval Kluger
Abstract:
We study the effectiveness of various approaches that defend against adversarial attacks on deep networks via manipulations based on basis function representations of images. Specifically, we experiment with low-pass filtering, PCA, JPEG compression, low-resolution wavelet approximation, and soft-thresholding. We evaluate these defense techniques using three types of popular attacks in black-, gray-, and white-box settings. Our results show that JPEG compression tends to outperform the other tested defenses in most of the settings considered, in addition to soft-thresholding, which performs well in specific cases and yields a milder decrease in accuracy on benign examples. In addition, we mathematically derive a novel white-box attack in which the adversarial perturbation is composed only of terms corresponding to a pre-determined subset of the basis functions, of which a "low frequency attack" is a special case.
Submitted 16 April, 2018; v1 submitted 28 March, 2018;
originally announced March 2018.
-
Learning Binary Latent Variable Models: A Tensor Eigenpair Approach
Authors:
Ariel Jaffe,
Roi Weiss,
Shai Carmi,
Yuval Kluger,
Boaz Nadler
Abstract:
Latent variable models with hidden binary units appear in various applications. Learning such models, in particular in the presence of noise, is a challenging computational problem. In this paper we propose a novel spectral approach to this problem, based on the eigenvectors of both the second order moment matrix and third order moment tensor of the observed data. We prove that under mild non-degeneracy conditions, our method consistently estimates the model parameters at the optimal parametric rate. Our tensor-based method generalizes previous orthogonal tensor decomposition approaches, where the hidden units were assumed to be either statistically independent or mutually exclusive. We illustrate the consistency of our method on simulated data and demonstrate its usefulness in learning a common model for population mixtures in genetics.
Submitted 26 February, 2018;
originally announced February 2018.
-
SpectralNet: Spectral Clustering using Deep Neural Networks
Authors:
Uri Shaham,
Kelly Stanton,
Henry Li,
Boaz Nadler,
Ronen Basri,
Yuval Kluger
Abstract:
Spectral clustering is a leading and popular technique in unsupervised data analysis. Two of its major limitations are scalability and generalization of the spectral embedding (i.e., out-of-sample extension). In this paper we introduce a deep learning approach to spectral clustering that overcomes the above shortcomings. Our network, which we call SpectralNet, learns a map that embeds input data points into the eigenspace of their associated graph Laplacian matrix and subsequently clusters them. We train SpectralNet using a procedure that involves constrained stochastic optimization. Stochastic optimization allows it to scale to large datasets, while the constraints, which are implemented using a special-purpose output layer, allow us to keep the network output orthogonal. Moreover, the map learned by SpectralNet naturally generalizes the spectral embedding to unseen data points. To further improve the quality of the clustering, we replace the standard pairwise Gaussian affinities with affinities learned from unlabeled data using a Siamese network. Additional improvement can be achieved by applying the network to code representations produced, e.g., by standard autoencoders. Our end-to-end learning procedure is fully unsupervised. In addition, we apply VC dimension theory to derive a lower bound on the size of SpectralNet. State-of-the-art clustering results are reported on the Reuters dataset. Our implementation is publicly available at https://github.com/kstant0725/SpectralNet .
Submitted 4 April, 2018; v1 submitted 4 January, 2018;
originally announced January 2018.
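The orthogonality constraint mentioned above can be enforced on each minibatch with a Cholesky factor: if $\tilde{Y}^\top \tilde{Y} = LL^\top$, then $Y = \sqrt{m}\,\tilde{Y} L^{-\top}$ satisfies $\frac{1}{m} Y^\top Y = I$. The NumPy sketch below assumes this Cholesky-based construction for the special-purpose output layer; it shows the forward computation only, without the training loop or backpropagation.

```python
import numpy as np


def orthogonalize(Y_tilde):
    """Map a batch Y_tilde (m x k) to Y with (1/m) Y^T Y = I via a Cholesky factor.

    If Y_tilde^T Y_tilde = L L^T, then Y = sqrt(m) * Y_tilde @ inv(L)^T.
    """
    m = Y_tilde.shape[0]
    L = np.linalg.cholesky(Y_tilde.T @ Y_tilde)
    # solve(L, Y_tilde.T) equals inv(L) @ Y_tilde.T, so its transpose is Y_tilde @ inv(L)^T
    return np.sqrt(m) * np.linalg.solve(L, Y_tilde.T).T


rng = np.random.default_rng(0)
Y_tilde = rng.standard_normal((256, 4)) @ rng.standard_normal((4, 4))  # correlated outputs
Y = orthogonalize(Y_tilde)
```

Using a triangular solve instead of forming `inv(L)` explicitly is a numerical-stability choice; the result is the same linear map.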
-
Efficient Algorithms for t-distributed Stochastic Neighborhood Embedding
Authors:
George C. Linderman,
Manas Rachh,
Jeremy G. Hoskins,
Stefan Steinerberger,
Yuval Kluger
Abstract:
t-distributed Stochastic Neighborhood Embedding (t-SNE) is a method for dimensionality reduction and visualization that has become widely popular in recent years. Efficient implementations of t-SNE are available, but they scale poorly to datasets with hundreds of thousands to millions of high-dimensional data points. We present Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE), which dramatically accelerates the computation of t-SNE. The most time-consuming step of t-SNE is a convolution that we accelerate by interpolating onto an equispaced grid and subsequently using the fast Fourier transform to perform the convolution. We also optimize the computation of input similarities in high dimensions using multi-threaded approximate nearest neighbors. We further present a modification to t-SNE called "late exaggeration," which allows for easier identification of clusters in t-SNE embeddings. Finally, for datasets that cannot be loaded into memory, we present out-of-core randomized principal component analysis (oocPCA), so that the top principal components of a dataset can be computed without ever fully loading the matrix, hence allowing t-SNE of large datasets to be computed on resource-limited machines.
Submitted 24 December, 2017;
originally announced December 2017.
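The FFT step described above relies on the fact that, on an equispaced grid, the kernel matrix $K_{jk} = k((j-k)h)$ is Toeplitz, so $\phi = Kq$ can be computed in $O(n \log n)$ by embedding $K$ in a $2n \times 2n$ circulant matrix. The 1-D sketch below shows only this grid-convolution step with the t-SNE Cauchy kernel; the interpolation from embedding points to the grid and back is omitted, and the grid spacing `h=0.1` is arbitrary.

```python
import numpy as np


def cauchy(d):
    return 1.0 / (1.0 + d**2)


def grid_convolution_fft(q, h, kernel=cauchy):
    """phi_j = sum_k kernel((j - k) h) q_k on an equispaced grid, via circulant embedding + FFT."""
    n = len(q)
    # First column of a 2n x 2n circulant matrix containing the n x n Toeplitz kernel matrix:
    # offsets 0..n-1, one padding zero, then offsets -(n-1)..-1.
    c = np.concatenate([kernel(h * np.arange(n)), [0.0],
                        kernel(-h * np.arange(n - 1, 0, -1))])
    q_pad = np.concatenate([q, np.zeros(n)])
    return np.fft.irfft(np.fft.rfft(c) * np.fft.rfft(q_pad), n=2 * n)[:n]


def grid_convolution_direct(q, h, kernel=cauchy):
    """Reference O(n^2) evaluation of the same sums."""
    idx = np.arange(len(q))
    return kernel(h * (idx[:, None] - idx[None, :])) @ q


rng = np.random.default_rng(0)
q = rng.standard_normal(64)                # charges already interpolated onto the grid
phi = grid_convolution_fft(q, h=0.1)
```

The zero-padding of `q` to length $2n$ ensures the circular convolution computed by the FFT does not wrap around and contaminate the first $n$ outputs.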
-
Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science
Authors:
George C. Linderman,
Gal Mishne,
Yuval Kluger,
Stefan Steinerberger
Abstract:
If we pick $n$ random points uniformly in $[0,1]^d$ and connect each point to its $k-$nearest neighbors, then it is well known that there exists a giant connected component with high probability. We prove that in $[0,1]^d$ it suffices to connect every point to $ c_{d,1} \log{\log{n}}$ points chosen randomly among its $ c_{d,2} \log{n}-$nearest neighbors to ensure a giant component of size $n - o(n)$ with high probability. This construction yields a much sparser random graph with $\sim n \log\log{n}$ instead of $\sim n \log{n}$ edges that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of picking the $k-$nearest neighbors, one can often pick $k' \ll k$ random points out of the $k-$nearest neighbors without sacrificing efficiency. This can massively simplify and accelerate computation; we illustrate this with several numerical examples.
Submitted 13 November, 2017;
originally announced November 2017.
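The construction described above can be sketched as follows: compute each point's $k$ nearest neighbors, keep $k'$ of them chosen uniformly at random, and measure the largest connected component of the resulting undirected graph with union-find. The constants $c_{d,1}$ and $c_{d,2}$ are not given in the abstract, so $k = \lceil 2 \log n \rceil$ and $k' = 3$ below are hypothetical choices, and the brute-force distance computation is only meant for small illustrations.

```python
import numpy as np
from collections import Counter


def randomized_knn_graph(X, k, k_prime, rng):
    """Connect each point to k_prime points drawn uniformly from its k nearest neighbors."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)                       # exclude self-matches
    knn = np.argsort(d2, axis=1)[:, :k]
    edges = set()
    for i in range(X.shape[0]):
        for j in rng.choice(knn[i], size=k_prime, replace=False):
            edges.add((min(i, int(j)), max(i, int(j))))  # store each undirected edge once
    return knn, edges


def largest_component_size(n, edges):
    """Size of the largest connected component, via union-find with path halving."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in edges:
        parent[find(a)] = find(b)
    return max(Counter(find(a) for a in range(n)).values())


rng = np.random.default_rng(0)
n = 500
X = rng.random((n, 2))                      # uniform points in [0, 1]^2
k = int(np.ceil(2 * np.log(n)))             # hypothetical choice of c_{d,2} log n
k_prime = 3                                 # hypothetical stand-in for c_{d,1} log log n
knn, edges = randomized_knn_graph(X, k, k_prime, rng)
giant = largest_component_size(n, edges)
```

The graph stores at most $n k'$ edges instead of up to $n k$, which is the sparsification the abstract refers to.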
-
Data-Driven Tree Transforms and Metrics
Authors:
Gal Mishne,
Ronen Talmon,
Israel Cohen,
Ronald R. Coifman,
Yuval Kluger
Abstract:
We consider the analysis of high-dimensional data given in the form of a matrix with columns consisting of observations and rows consisting of features. Often the data is such that the observations do not reside on a regular grid, and the given order of the features is arbitrary and does not convey a notion of locality. Therefore, traditional transforms and metrics cannot be used for data organization and analysis. In this paper, our goal is to organize the data by defining an appropriate representation and metric such that they respect the smoothness and structure underlying the data. We also aim to generalize the joint clustering of observations and features to the case where the data does not fall into clear disjoint groups. For this purpose, we propose multiscale data-driven transforms and metrics based on trees. Their construction is implemented in an iterative refinement procedure that exploits the co-dependencies between features and observations. Beyond the organization of a single dataset, our approach enables us to transfer the organization learned from one dataset to another and to integrate several datasets together. We present an application to breast cancer gene expression analysis: learning metrics on the genes to cluster the tumor samples into cancer sub-types and validating the joint organization of both the genes and the samples. We demonstrate that using our approach to combine information from multiple gene expression cohorts, acquired by different profiling technologies, improves the clustering of tumor samples.
Submitted 18 August, 2017;
originally announced August 2017.
-
Mahalanobis Distance Informed by Clustering
Authors:
Almog Lahav,
Ronen Talmon,
Yuval Kluger
Abstract:
A fundamental question in data analysis, machine learning and signal processing is how to compare data points. The choice of the distance metric is especially challenging for high-dimensional data sets, where the problem of meaningfulness is more prominent (e.g., the Euclidean distance between images). In this paper, we propose to exploit a property of high-dimensional data that is usually ignored: the structure stemming from the relationships between the coordinates. Specifically, we show that organizing similar coordinates in clusters can be exploited for the construction of the Mahalanobis distance between samples. When the observable samples are generated by a nonlinear transformation of hidden variables, the Mahalanobis distance allows the recovery of the Euclidean distances in the hidden space. We illustrate the advantage of our approach on a synthetic example where the discovery of clusters of correlated coordinates improves the estimation of the principal directions of the samples. Our method was applied to real data of gene expression for lung adenocarcinomas (lung cancer). By using the proposed metric we found a partition of subjects into risk groups with a good separation between their Kaplan-Meier survival plots.
Submitted 13 August, 2017;
originally announced August 2017.
-
Unsupervised Ensemble Regression
Authors:
Omer Dror,
Boaz Nadler,
Erhan Bilal,
Yuval Kluger
Abstract:
Consider a regression problem where there is no labeled data and the only observations are the predictions $f_i(x_j)$ of $m$ experts $f_{i}$ over many samples $x_j$. With no knowledge of the accuracy of the experts, is it still possible to accurately estimate the unknown responses $y_{j}$? Can one still detect the least or most accurate experts? In this work we propose a framework to study these questions, based on the assumption that the $m$ experts have uncorrelated deviations from the optimal predictor. Assuming the first two moments of the response are known, we develop methods to detect the best and worst regressors, and derive U-PCR, a novel principal components approach for unsupervised ensemble regression. We provide theoretical support for U-PCR and illustrate its improved accuracy over the ensemble mean and median on a variety of regression problems.
Submitted 8 March, 2017;
originally announced March 2017.
-
Randomized algorithms for distributed computation of principal component analysis and singular value decomposition
Authors:
Huamin Li,
Yuval Kluger,
Mark Tygert
Abstract:
Randomized algorithms provide solutions to two ubiquitous problems: (1) the distributed calculation of a principal component analysis or singular value decomposition of a highly rectangular matrix, and (2) the distributed calculation of a low-rank approximation (in the form of a singular value decomposition) to an arbitrary matrix. Carefully honed algorithms yield results that are uniformly superior to those of the stock, deterministic implementations in Spark (the popular platform for distributed computation); in particular, whereas the stock software will without warning return left singular vectors that are far from numerically orthonormal, a significantly burnished randomized implementation generates left singular vectors that are numerically orthonormal to nearly the machine precision.
Submitted 1 January, 2018; v1 submitted 27 December, 2016;
originally announced December 2016.
-
Removal of Batch Effects using Distribution-Matching Residual Networks
Authors:
Uri Shaham,
Kelly P. Stanton,
Jun Zhao,
Huamin Li,
Khadir Raddassi,
Ruth Montgomery,
Yuval Kluger
Abstract:
Sources of variability in experimentally derived data include measurement error in addition to the physical phenomena of interest. This measurement error is a combination of systematic components, originating from the measuring instrument, and random measurement errors. Several novel biological technologies, such as mass cytometry and single-cell RNA-seq, are plagued with systematic errors that may severely affect statistical analysis if the data is not properly calibrated. We propose a novel deep learning approach for removing systematic batch effects. Our method is based on a residual network, trained to minimize the Maximum Mean Discrepancy (MMD) between the multivariate distributions of two replicates, measured in different batches. We apply our method to mass cytometry and single-cell RNA-seq datasets, and demonstrate that it effectively attenuates batch effects.
Submitted 8 January, 2018; v1 submitted 13 October, 2016;
originally announced October 2016.
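The training objective named above, the Maximum Mean Discrepancy (MMD) between two multivariate samples, can be estimated from kernel Gram matrices: $\widehat{\mathrm{MMD}}^2 = \overline{K_{XX}} + \overline{K_{YY}} - 2\,\overline{K_{XY}}$, where the bars denote averages over all entries. The sketch below computes this biased estimate with a single Gaussian kernel; the bandwidth `sigma=1.0` is a hypothetical choice, and the residual network that would be trained to reduce this quantity is not shown.

```python
import numpy as np


def gaussian_gram(A, B, sigma):
    """Gaussian kernel matrix exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def mmd2(X, Y, sigma=1.0):
    """Biased estimate of the squared MMD between samples X and Y under a Gaussian kernel."""
    return (gaussian_gram(X, X, sigma).mean() + gaussian_gram(Y, Y, sigma).mean()
            - 2.0 * gaussian_gram(X, Y, sigma).mean())


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))              # replicate measured in batch 1
Y_same = rng.standard_normal((200, 2))         # same distribution, no batch effect
Y_shift = rng.standard_normal((200, 2)) + 3.0  # same cells with a systematic shift
```

A batch-correction map would be trained so that `mmd2(X, correct(Y_shift))` becomes as small as `mmd2(X, Y_same)`; `correct` here is a hypothetical name for the learned network.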
-
DeepSurv: Personalized Treatment Recommender System Using A Cox Proportional Hazards Deep Neural Network
Authors:
Jared Katzman,
Uri Shaham,
Jonathan Bates,
Alexander Cloninger,
Tingting Jiang,
Yuval Kluger
Abstract:
Medical practitioners use survival models to explore and understand the relationships between patients' covariates (e.g., clinical and genetic features) and the effectiveness of various treatment options. Standard survival models like the linear Cox proportional hazards model require extensive feature engineering or prior medical knowledge to model treatment interaction at an individual level. While nonlinear survival methods, such as neural networks and survival forests, can inherently model these high-level interaction terms, they have yet to be shown to be effective treatment recommender systems. We introduce DeepSurv, a Cox proportional hazards deep neural network and state-of-the-art survival method for modeling interactions between a patient's covariates and treatment effectiveness in order to provide personalized treatment recommendations. We perform a number of experiments training DeepSurv on simulated and real survival data. We demonstrate that DeepSurv performs as well as or better than other state-of-the-art survival models and validate that DeepSurv successfully models increasingly complex relationships between a patient's covariates and their risk of failure. We then show how DeepSurv models the relationship between a patient's features and the effectiveness of different treatment options, and how it can be used to provide individual treatment recommendations. Finally, we train DeepSurv on real clinical studies to demonstrate how its personalized treatment recommendations would increase the survival time of a set of patients. The predictive and modeling capabilities of DeepSurv will enable medical researchers to use deep neural networks as a tool in their exploration, understanding, and prediction of the effects of a patient's characteristics on their risk of failure.
Submitted 8 August, 2017; v1 submitted 2 June, 2016;
originally announced June 2016.
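A Cox proportional hazards network such as the one described above is trained by minimizing the negative log partial likelihood of its predicted log-risks $h_i$: $-\sum_{i:E_i=1} \big(h_i - \log \sum_{j: T_j \ge T_i} e^{h_j}\big)$. The sketch below evaluates this loss in NumPy for given predictions. It assumes no tied event times, sums rather than averages over events, and omits the network and any regularization; those are simplifications, not details taken from the abstract.

```python
import numpy as np


def cox_negative_log_partial_likelihood(h, time, event):
    """Negative log Cox partial likelihood (no tie correction), summed over observed events.

    h: predicted log-risk per patient; time: follow-up times; event: 1 if the event was observed.
    """
    h = np.asarray(h, dtype=float)
    time = np.asarray(time, dtype=float)
    observed = np.asarray(event).astype(bool)
    at_risk = time[None, :] >= time[:, None]      # row i: patients still at risk at time_i
    h_max = h.max()                               # shift for numerical stability
    log_risk_set = h_max + np.log((np.exp(h - h_max)[None, :] * at_risk).sum(axis=1))
    return -np.sum((h - log_risk_set)[observed])


# Three patients with equal predicted risk: risk sets of size 3, 2, 1 give log 3 + log 2 + log 1.
loss = cox_negative_log_partial_likelihood([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [1, 1, 1])
```

Censored patients (event 0) contribute no term of their own but still appear in the risk sets of earlier events.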
-
A Deep Learning Approach to Unsupervised Ensemble Learning
Authors:
Uri Shaham,
Xiuyuan Cheng,
Omer Dror,
Ariel Jaffe,
Boaz Nadler,
Joseph Chang,
Yuval Kluger
Abstract:
We show how deep learning methods can be applied in the context of crowdsourcing and unsupervised ensemble learning. First, we prove that the popular model of Dawid and Skene, which assumes that all classifiers are conditionally independent, is {\em equivalent} to a Restricted Boltzmann Machine (RBM) with a single hidden node. Hence, under this model, the posterior probabilities of the true labels can instead be estimated via a trained RBM. Next, to address the more general case, where classifiers may strongly violate the conditional independence assumption, we propose to apply an RBM-based Deep Neural Net (DNN). Experimental results on various simulated and real-world datasets demonstrate that our proposed DNN approach outperforms other state-of-the-art methods, in particular when the data violates the conditional independence assumption.
Submitted 6 February, 2016;
originally announced February 2016.
-
Unsupervised Ensemble Learning with Dependent Classifiers
Authors:
Ariel Jaffe,
Ethan Fetaya,
Boaz Nadler,
Tingting Jiang,
Yuval Kluger
Abstract:
In unsupervised ensemble learning, one obtains predictions from multiple sources or classifiers, yet without knowing the reliability and expertise of each source, and with no labeled data to assess it. The task is to combine these possibly conflicting predictions into an accurate meta-learner. Most works to date have assumed perfect diversity between the different sources, a property known as conditional independence. In realistic scenarios, however, this assumption is often violated, and ensemble learners based on it can be severely sub-optimal. The key challenges we address in this paper are (i) how to detect, in an unsupervised manner, strong violations of conditional independence; and (ii) how to construct a suitable meta-learner. To this end we introduce a statistical model that allows for dependencies between classifiers. Our main contributions are the development of novel unsupervised methods to detect strongly dependent classifiers, better estimate their accuracies, and construct an improved meta-learner. Using both artificial and real datasets, we showcase the importance of taking classifier dependencies into account and the competitive performance of our approach.
Submitted 23 February, 2016; v1 submitted 20 October, 2015;
originally announced October 2015.
-
An implementation of a randomized algorithm for principal component analysis
Authors:
Arthur Szlam,
Yuval Kluger,
Mark Tygert
Abstract:
Recent years have witnessed intense development of randomized methods for low-rank approximation. These methods target principal component analysis (PCA) and the calculation of truncated singular value decompositions (SVD). This paper presents an essentially black-box, fool-proof implementation for MathWorks' MATLAB, a popular software platform for numerical computation. As illustrated via several tests, the randomized algorithms for low-rank approximation outperform or at least match the classical techniques (such as Lanczos iterations) in basically all respects: accuracy, computational efficiency (both speed and memory usage), ease of use, parallelizability, and reliability. However, the classical procedures remain the methods of choice for estimating spectral norms, and are far superior for calculating the least singular values and corresponding singular vectors (or singular subspaces).
△ Less
Submitted 10 December, 2014;
originally announced December 2014.
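The abstract does not spell out the algorithm. As a hedged illustration only, the following NumPy sketch shows a generic randomized range finder with a few power iterations, the usual template for randomized truncated SVD. It is not the authors' MATLAB implementation, and the helper name `randomized_svd` and its parameters (`oversample`, `n_iter`, `seed`) are assumptions chosen for this example.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, n_iter=2, seed=0):
    """Approximate the top-k singular triplets of A (hypothetical helper, not the paper's code)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sample the range of A with a Gaussian test matrix of k + oversample columns.
    omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(A @ omega)
    # Power iterations sharpen the spectral decay; re-orthonormalize at every
    # step so that rounding errors do not wash out the small singular directions.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Project A onto the captured subspace and take a small dense SVD there.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k]

# Usage: an exactly rank-5 matrix is recovered up to rounding error.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_svd(A, k=5)
print(s)
```

For PCA, center the columns of the data matrix before calling the helper. The QR step after each multiplication keeps the basis orthonormal, which guards against the loss of accuracy that unnormalized power iterations suffer in floating point.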
-
Estimating the Accuracies of Multiple Classifiers Without Labeled Data
Authors:
Ariel Jaffe,
Boaz Nadler,
Yuval Kluger
Abstract:
In various situations one is given only the predictions of multiple classifiers over a large unlabeled test set. This scenario raises the following questions: Without any labeled data and without any a priori knowledge about the reliability of these different classifiers, is it possible to consistently and computationally efficiently estimate their accuracies? Furthermore, can one, also in a completely unsupervised manner, construct a more accurate unsupervised ensemble classifier? In this paper, focusing on the binary case, we present simple, computationally efficient algorithms to address these questions. Moreover, under standard classifier independence assumptions, we prove that our methods are consistent and study their asymptotic error. Our approach is spectral, based on the fact that the off-diagonal entries of the classifiers' covariance matrix and 3-d tensor are rank-one. We illustrate the competitive performance of our algorithms via extensive experiments on both artificial and real datasets.
Submitted 30 October, 2014; v1 submitted 29 July, 2014;
originally announced July 2014.
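To make the rank-one statement concrete: for binary labels y in {-1, +1} with class imbalance b = P(y=1) - P(y=-1), and conditionally independent classifiers with balanced accuracies π_i, the covariance of predictions i ≠ j is Q_ij = (1 - b²)(2π_i - 1)(2π_j - 1). The sketch below is a deliberate simplification under stated assumptions: it assumes balanced classes (b = 0), so it skips the 3-d tensor the paper uses to handle class imbalance; it fits the rank-one structure by least squares on log|Q_ij| rather than with the authors' estimator; and it assumes every classifier beats chance. The name `estimate_balanced_accuracies` is hypothetical.

```python
import itertools
import numpy as np

def estimate_balanced_accuracies(F):
    """F: (m classifiers, n samples) array of +/-1 predictions (hypothetical helper).

    Assumes balanced classes and better-than-random classifiers.
    """
    m = F.shape[0]
    Q = np.cov(F)  # m x m sample covariance of the predictions
    rows, rhs = [], []
    for i, j in itertools.combinations(range(m), 2):
        # Off-diagonal model Q_ij = v_i * v_j  =>  log|Q_ij| = u_i + u_j with u = log|v|.
        row = np.zeros(m)
        row[i] = row[j] = 1.0
        rows.append(row)
        rhs.append(np.log(np.abs(Q[i, j])))
    # Needs m >= 3 so that the pairwise system determines every u_i.
    u, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    v = np.exp(u)  # estimate of 2*pi_i - 1 when b = 0
    return (1.0 + v) / 2.0

# Usage on simulated data: classifier i flips the true label with probability 1 - acc[i].
rng = np.random.default_rng(0)
acc = np.array([0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9])
y = rng.choice([-1, 1], size=20000)
F = np.where(rng.random((acc.size, y.size)) > acc[:, None], -y, y)
print(estimate_balanced_accuracies(F))
```

Only off-diagonal entries are used because the diagonal of Q (the variances) does not follow the rank-one pattern.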
-
Ranking and combining multiple predictors without labeled data
Authors:
Fabio Parisi,
Francesco Strino,
Boaz Nadler,
Yuval Kluger
Abstract:
In a broad range of classification and decision-making problems, one is given the advice or predictions of several classifiers, of unknown reliability, over multiple questions or queries. This scenario differs from the standard supervised setting, where each classifier's accuracy can be assessed using available labeled data, and raises two questions: given only the predictions of several classifiers over a large set of unlabeled test data, is it possible to a) reliably rank them; and b) construct a meta-classifier more accurate than most classifiers in the ensemble? Here we present a novel spectral approach to address these questions. First, assuming conditional independence between classifiers, we show that the off-diagonal entries of their covariance matrix correspond to a rank-one matrix. Moreover, the classifiers can be ranked using the leading eigenvector of this covariance matrix, as its entries are proportional to their balanced accuracies. Second, via a linear approximation to the maximum likelihood estimator, we derive the Spectral Meta-Learner (SML), a novel ensemble classifier whose weights are the entries of this eigenvector. On both simulated and real data, SML typically achieves higher accuracy than most classifiers in the ensemble and can provide a better starting point than majority voting for estimating the maximum likelihood solution. Furthermore, SML is robust to the presence of small malicious groups of classifiers designed to veer the ensemble prediction away from the (unknown) ground truth.
Submitted 24 November, 2013; v1 submitted 13 March, 2013;
originally announced March 2013.
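A minimal sketch of the two steps described above, assuming ±1 predictions stored in an (m, n) array. The unknown diagonal of the rank-one matrix is refit iteratively from the current leading eigenpair; this is one simple heuristic and may differ from how the paper estimates it. The sign of the eigenvector is fixed by assuming that most classifiers beat chance. Under the conditional-independence model the eigenvector entries are proportional to 2π_i - 1 (π_i being the balanced accuracy), which is monotone in π_i, so the same vector serves for ranking and as voting weights. The helper name `spectral_meta_learner` and the `n_iter` parameter are hypothetical.

```python
import numpy as np

def spectral_meta_learner(F, n_iter=50):
    """F: (m, n) array of +/-1 predictions. Returns (weights, labels). Hypothetical helper."""
    R = np.cov(F)
    for _ in range(n_iter):
        # Replace the diagonal by that of the current best rank-one fit,
        # since only the off-diagonal entries follow the rank-one model.
        w, V = np.linalg.eigh(R)          # eigenvalues in ascending order
        lam, v = w[-1], V[:, -1]
        np.fill_diagonal(R, lam * v**2)
    v = np.linalg.eigh(R)[1][:, -1]       # leading eigenvector
    if v.sum() < 0:  # eigenvector sign is arbitrary; assume most classifiers beat chance
        v = -v
    labels = np.sign(v @ F)               # weighted vote with the eigenvector entries
    labels[labels == 0] = 1               # break exact ties arbitrarily
    return v, labels

# Usage on simulated conditionally independent classifiers.
rng = np.random.default_rng(0)
acc = np.linspace(0.55, 0.85, 9)
y = rng.choice([-1, 1], size=20000)
F = np.where(rng.random((acc.size, y.size)) < acc[:, None], y, -y)
v, labels = spectral_meta_learner(F)
print("SML accuracy:", np.mean(labels == y))
print("ranking (worst to best):", np.argsort(v))
```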
-
TrAp: a Tree Approach for Fingerprinting Subclonal Tumor Composition
Authors:
Francesco Strino,
Fabio Parisi,
Mariann Micsinai,
Yuval Kluger
Abstract:
Revealing the clonal composition of a single tumor is essential for identifying cell subpopulations with metastatic potential in primary tumors or with resistance to therapies in metastatic tumors. Sequencing technologies provide an overview of an aggregate of numerous cells, rather than subclone-specific quantification of aberrations such as single nucleotide variants (SNVs). Computational approaches to de-mix a single collective signal from the mixed cell population of a tumor sample into its individual components are currently not available. Herein we propose a framework for deconvolving data from a single genome-wide experiment to infer the composition, abundance and evolutionary paths of the underlying cell subpopulations of a tumor. The method is based on the plausible biological assumption that tumor progression is an evolutionary process in which each individual aberration event stems from a unique subclone and is present in all its descendant subclones. We have developed an efficient algorithm (TrAp) for solving this mixture problem. In silico analyses show that TrAp correctly deconvolves mixed subpopulations when the number of subpopulations and the measurement errors are moderate. We demonstrate the applicability of the method using tumor karyotypes and somatic hypermutation datasets. We applied TrAp to the SNV frequency profile from an Exome-Seq experiment on a renal cell carcinoma tumor sample and compared the mutational profiles of the inferred subpopulations to those of twenty single cells from the same tumor. Despite the large experimental noise, specific co-occurring mutations found in clones inferred by TrAp are also present in some of these single cells. Finally, we deconvolve Exome-Seq data from three distinct metastases in different body compartments of one melanoma patient and exhibit the evolutionary relationships of their subpopulations.
Submitted 9 January, 2013;
originally announced January 2013.
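The tree assumption stated above implies a simple bookkeeping rule: the frequency of an aberration that arose in a subclone equals the total abundance of that subclone and all its descendants, and, given the tree, each subclone's own abundance is its aberration frequency minus the frequencies carried by its children. The toy sketch below shows only this constraint. It assumes the tree is already known and the frequencies are noise-free, whereas TrAp infers the tree and abundances from noisy data; the helper names and subclone labels are hypothetical.

```python
def subtree_frequencies(children, abundance, root):
    """Expected frequency of the aberration founding each subclone (hypothetical helper).

    children: dict mapping a subclone to the list of its child subclones
    abundance: dict mapping a subclone to its fraction of cells in the sample
    """
    freq = {}

    def visit(node):
        # An aberration is carried by its founding subclone and every descendant.
        total = abundance[node] + sum(visit(c) for c in children.get(node, []))
        freq[node] = total
        return total

    visit(root)
    return freq


def abundances_from_frequencies(children, freq):
    """Invert the tree constraint when the tree is known (hypothetical helper)."""
    return {node: f - sum(freq[c] for c in children.get(node, []))
            for node, f in freq.items()}


# Usage with a hypothetical four-subclone tree: S0 -> S1 -> {S2, S3}.
children = {"S0": ["S1"], "S1": ["S2", "S3"]}
abundance = {"S0": 0.3, "S1": 0.2, "S2": 0.35, "S3": 0.15}
freq = subtree_frequencies(children, abundance, "S0")
print(freq)
print(abundances_from_frequencies(children, freq))
```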
-
Pair production in a strong electric field: an initial value problem in quantum field theory
Authors:
Y. Kluger,
J. M. Eisenberg,
B. Svetitsky
Abstract:
We review recent achievements in the solution of the initial-value problem for quantum back-reaction in scalar and spinor QED. The problem is formulated and solved in the semiclassical mean-field approximation for a homogeneous, time-dependent electric field. Our primary motivation in examining back-reaction has to do with applications to theoretical models of quark-gluon plasma production, though here we address practicable solutions for back-reaction in general. We review the application of the method of adiabatic regularization to the Klein-Gordon and Dirac fields in order to renormalize the expectation value of the current and derive a finite, coupled set of ordinary differential equations for the time evolution of the system. Three time scales are involved in the problem, so care is needed to achieve numerical stability for this system. Several physical features, such as plasma oscillations and plateaus in the current, appear in the solution. From the plateau of the electric current one can estimate the number of pairs before the onset of plasma oscillations, while the plasma oscillations themselves yield the number of particles from the plasma frequency.
We compare the field-theory solution to a simple model based on a relativistic Boltzmann-Vlasov equation, with a particle production source term inferred from the Schwinger particle creation rate and a Pauli-blocking (or Bose-enhancement) factor. This model reproduces the time behavior of the electric field and the creation rate of charged pairs of the semiclassical calculation very well. It therefore provides a simple, intuitive understanding of the nature of the solution, since nearly all the physical features can be expressed in terms of the classical distribution function.
Submitted 23 November, 2003;
originally announced November 2003.
-
The Quantum Vlasov Equation and its Markov Limit
Authors:
Yuval Kluger,
Emil Mottola,
Judah M. Eisenberg
Abstract:
The adiabatic particle number in mean field theory obeys a quantum Vlasov equation which is nonlocal in time. For weak, slowly varying electric fields this particle number can be identified with the single particle distribution function in phase space, and its time rate of change is the appropriate effective source term for the Boltzmann-Vlasov equation. By analyzing the evolution of the particle number we exhibit the time structure of the particle creation process in a constant electric field, and derive the local form of the source term due to pair creation. In order to capture the secular Schwinger creation rate, the source term requires an asymptotic expansion which is uniform in time, and whose longitudinal momentum dependence can be approximated by a delta function only on long time scales. The local Vlasov source term amounts to a kind of Markov limit of field theory, where information about quantum phase correlations in the created pairs is ignored and a reversible Hamiltonian evolution is replaced by an irreversible kinetic one. This replacement has a precise counterpart in the density matrix description, where it corresponds to disregarding the rapidly varying off-diagonal terms in the adiabatic number basis and treating the more slowly varying diagonal elements as the probabilities of creating pairs in a stochastic process. A numerical comparison between the quantum and local kinetic approaches to the dynamical backreaction problem shows remarkably good agreement, even in quite strong electric fields, over a large range of times.
Submitted 19 March, 1998; v1 submitted 17 March, 1998;
originally announced March 1998.
-
Dilepton from Disoriented Chiral Condensates
Authors:
V. Koch,
J. Randrup,
X. N. Wang,
Y. Kluger
Abstract:
Disoriented chiral condensates are manifested as long-wavelength pionic oscillations, and their interaction with the thermal environment can be a significant source of dileptons. We calculate the yield of such dilepton production within the linear sigma model and illustrate the basic features of the dilepton spectrum in a schematic model. We find that the dilepton yield with invariant mass near and below $2m_π$ due to the soft pion modes can be up to two orders of magnitude larger than the corresponding equilibrium yield. We conclude with a discussion of how this enhancement can be detected by present dilepton experiments.
Submitted 17 December, 1997;
originally announced December 1997.
-
Dileptons from Disoriented Chiral Condensates
Authors:
Y. Kluger,
V. Koch,
J. Randrup,
X. N. Wang
Abstract:
Disoriented chiral condensates, or long-wavelength pionic oscillations, and their interaction with the thermal environment can be a significant source of dileptons. We calculate the yield of such dilepton production within the linear sigma model, both in a quantal mean-field treatment and in a semi-classical approximation. We then illustrate the basic features of the dilepton spectrum in a schematic model. We find that the dilepton yield with invariant mass near and below $2m_π$ due to the soft pion modes can be up to two orders of magnitude larger than the corresponding equilibrium yield.
Submitted 10 April, 1997;
originally announced April 1997.
-
Nonequilibrium Dynamics of Symmetry Breaking in Lambda Phi^4 Field Theory
Authors:
Fred Cooper,
Salman Habib,
Yuval Kluger,
Emil Mottola
Abstract:
The time evolution of O(N) symmetric lambda Phi^4 scalar field theory is studied in the large N limit. In this limit the <Phi> mean field and the two-point correlation function <Phi Phi> evolve together as a self-consistent closed Hamiltonian system, characterized by a Gaussian density matrix. The static part of the effective Hamiltonian defines the True Effective Potential U_eff for configurations far from thermal equilibrium. Numerically solving the time evolution equations for energy densities corresponding to a quench in the unstable spinodal region, we find results quite different from what might be inferred from the equilibrium free energy "effective" potential F. Typical time evolutions show effectively irreversible energy flow from the coherent mean fields to the quantum fluctuating modes, due to the creation of massless Goldstone bosons near threshold. The plasma frequency and collisionless damping rate of the mean fields are calculated in terms of the particle number density by a linear response analysis and compared with the numerical results. Dephasing of the fluctuations also leads to the growth of an effective entropy and to the transition from quantum to classical behavior of the ensemble. In addition to casting some light on fundamental issues of nonequilibrium quantum statistical mechanics, the general framework presented in this work may be applied to the study of the dynamics of second-order phase transitions in a wide variety of Landau-Ginzburg systems described by a scalar order parameter.
Submitted 14 October, 1996;
originally announced October 1996.
-
Anomalous Transverse Distribution of Pions as a signal for the production of DCC's
Authors:
F. Cooper,
Y. Kluger,
E. Mottola
Abstract:
We give evidence that the production of DCCs during a non-equilibrium phase transition can lead to an anomalous transverse distribution of secondary pions when compared to a more conventional boost-invariant hydrodynamic flow in local thermal equilibrium. Our results pertain to the linear $σ$ model, treated in leading order in large $N$, in a boost-invariant approximation. We also show that the interpolating number density of the field theory calculation plays the role of a classical relativistic phase-space number distribution in determining the momentum distribution of pions in the center-of-mass frame.
Submitted 10 April, 1996;
originally announced April 1996.
-
Dissipation and Decoherence in Mean Field Theory
Authors:
Salman Habib,
Yuval Kluger,
Emil Mottola,
Juan Pablo Paz
Abstract:
The time evolution of a closed system of mean fields and fluctuations is Hamiltonian, with the canonical variables parameterizing the general time-dependent Gaussian density matrix of the system. Yet, the evolution manifests both quantum decoherence and apparent irreversibility of energy flow from the coherent mean fields to fluctuating quantum modes. Using scalar QED as an example we show how this collisionless damping and decoherence may be understood as the result of dephasing of the rapidly varying fluctuations and particle production in the time varying mean field.
Submitted 28 September, 1995;
originally announced September 1995.
-
Quantum evolution of the disoriented chiral condensates
Authors:
Y. Kluger,
F. Cooper,
E. Mottola,
J. P. Paz,
A. Kovner
Abstract:
We study the dynamics of the chiral phase transition expected during the expansion of the quark-gluon plasma produced in a high-energy hadron or heavy-ion collision, using the $O(4)$ linear sigma model in the mean field approximation. Imposing boost-invariant initial conditions at an initial proper time $τ_0$ and starting from an approximate equilibrium configuration, we investigate the possibility of disoriented chiral condensate formation during the expansion. In order to create large domains of disoriented chiral condensates, low-momentum instabilities have to last for sufficiently long periods of time. Our simulations show no instabilities for an initial thermal configuration. For some of the out-of-equilibrium initial states studied, the fluctuations in the number of particles with low transverse momenta become large at late proper times.
Submitted 1 March, 1995;
originally announced March 1995.
-
Semiquantum Chaos and the Large N Expansion
Authors:
Fred Cooper,
John Dawson,
Salman Habib,
Yuval Kluger,
Dawn Meredith,
Harvey Shepard
Abstract:
We consider the dynamical system consisting of a quantum degree of freedom $A$ interacting with $N$ quantum oscillators, described by the Lagrangian $$L = \frac{1}{2}\dot{A}^2 + \sum_{i=1}^{N} \left\{ \frac{1}{2}\dot{x}_i^2 - \frac{1}{2}\left(m^2 + e^2 A^2\right) x_i^2 \right\}.$$ In the limit $N \rightarrow \infty$, with $e^2 N$ fixed, the quantum fluctuations in $A$ are of order $1/N$. In this limit, the $x$ oscillators behave as harmonic oscillators with a time-dependent mass determined by the solution of a semiclassical equation for the expectation value $\langle A(t) \rangle$. When $\langle x(t) \rangle = 0$, this system can be described by a classical Hamiltonian for the variables $G(t) = \langle x^2(t) \rangle$, $\dot{G}(t)$, $A_c(t) = \langle A(t) \rangle$, and $\dot{A}_c(t)$. The dynamics of this latter system turns out to be chaotic. We propose to study the nature of this large-$N$ limit both by considering the exact quantum system and by studying an expansion in powers of $1/N$ for the equations of motion, using the closed time path formalism of quantum dynamics.
Submitted 8 November, 1994;
originally announced November 1994.
-
Evolution of the Disoriented Chiral Condensates in the Mean Field Approximation
Authors:
Yuval Kluger
Abstract:
We study the dynamics of the chiral phase transition expected during the expansion of the quark-gluon plasma produced in a high-energy hadron or heavy-ion collision, using the $O(4)$ linear sigma model in the mean field approximation. Starting from an approximate equilibrium configuration at an initial proper time $τ$ in the disordered phase, we study the transition to the ordered, broken-symmetry phase as the system expands and cools. We give results for the proper-time evolution of the effective pion mass and for the pion two-point correlation function. We investigate the possibility of a disoriented chiral condensate forming during the expansion. In order to create large domains of disoriented chiral condensates, low-momentum instabilities have to last for sufficiently long periods of time. Our simulations show no instabilities for an initial thermal configuration. For the far-from-equilibrium cases studied, the instabilities formed during the initial stages of the expansion survive only for short proper times. For slow expansion rates, even such configurations do not develop instabilities.
Submitted 15 August, 1994; v1 submitted 14 August, 1994;
originally announced August 1994.
-
Non-Equilibrium Quantum Fields in the Large N Expansion
Authors:
Fred Cooper,
Salman Habib,
Yuval Kluger,
Emil Mottola,
Juan Pablo Paz,
Paul R. Anderson
Abstract:
An effective action technique for the time evolution of a closed system consisting of one or more mean fields interacting with their quantum fluctuations is presented. By marrying large-$N$ expansion methods to the Schwinger-Keldysh closed time path (CTP) formulation of the quantum effective action, the causality of the resulting equations of motion is ensured, and a systematic, energy-conserving and gauge-invariant expansion about the quasi-classical mean field(s) in powers of $1/N$ is developed. The general method is illustrated with two specific examples: $O(N)$ symmetric scalar $\lambda\Phi^4$ theory and Quantum Electrodynamics (QED) with $N$ fermion fields. The $\lambda\Phi^4$ case is well suited to the numerical study of the real-time dynamics of phase transitions characterized by a scalar order parameter. In QED the technique may be used to study the quantum non-equilibrium effects of pair creation in strong electric fields and the scattering and transport processes in a relativistic $e^+e^-$ plasma. A simple renormalization scheme that makes the numerical solution of the equations of motion of these and other field theories practical is described.
Submitted 23 May, 1994;
originally announced May 1994.
-
Non-equilibrium evolution of the disoriented chiral condensate in heavy-ion collisions
Authors:
Yuval Kluger
Abstract:
We study the dynamics of the chiral phase transition expected during the expansion of the quark-gluon plasma produced in a high-energy hadron or heavy-ion collision in the $O(4)$ linear sigma model, to leading order in a large-$N$ expansion, for strong coupling constants. Starting from an approximate equilibrium configuration at an initial proper time $τ$ in the disordered phase, we study the transition to the ordered, broken-symmetry phase as the system expands and cools. We give results for the proper-time evolution of the effective pion mass, the order parameter $\langle σ \rangle$, and the pion two-point correlation function, expressed in terms of a time-dependent phase-space number density and pair correlation density. We investigate the possibility of a disoriented chiral condensate forming during the expansion. In order to create large domains of disoriented chiral condensates, low-momentum instabilities have to last for sufficiently long periods of time. Our simulations show that instabilities formed during the initial stages of the expansion survive for proper times of at most $3\,\mathrm{fm}/c$.
Submitted 12 May, 1994;
originally announced May 1994.