Search | arXiv e-print repository

Probabilistic Robust Autoencoders for Outlier Detection

Authors: Ofir Lindenbaum, Yariv Aizenbud, Yuval Kluger

Abstract: Anomalies (or outliers) are prevalent in real-world empirical observations and potentially mask important underlying structures. Accurate identification of anomalous samples is crucial for the success of downstream data analysis tasks. To automatically identify anomalies, we propose Probabilistic Robust AutoEncoder (PRAE). PRAE aims to simultaneously remove outliers and identify a low-dimensional representation for the inlier samples. We first present the Robust AutoEncoder (RAE) objective as a minimization problem for splitting the data into inliers and outliers. Our objective is designed to exclude outliers while including a subset of samples (inliers) that can be effectively reconstructed using an AutoEncoder (AE). RAE minimizes the autoencoder's reconstruction error while incorporating as many samples as possible. This could be formulated via regularization by subtracting an $\ell_0$ norm counting the number of selected samples from the reconstruction term. Unfortunately, this leads to an intractable combinatorial problem. Therefore, we propose two probabilistic relaxations of RAE, which are differentiable and alleviate the need for a combinatorial search. We prove that the solution to the PRAE problem is equivalent to the solution of RAE. We use synthetic data to show that PRAE can accurately remove outliers in a wide range of contamination levels. Finally, we demonstrate that using PRAE for anomaly detection leads to state-of-the-art results on various benchmark datasets.

Submitted 24 August, 2022; v1 submitted 1 October, 2021; originally announced October 2021.
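
One way to read the RAE objective described in the PRAE abstract, written out as a formula. The notation is an assumption made for illustration and is not taken from the paper: $x_1,\dots,x_n$ are the samples, $z \in \{0,1\}^n$ marks the samples selected as inliers, $f_\theta$ is the autoencoder, and $\lambda > 0$ is a regularization weight.

```latex
\min_{\theta,\; z \in \{0,1\}^n} \;
  \sum_{i=1}^{n} z_i \,\lVert x_i - f_\theta(x_i) \rVert_2^2
  \;-\; \lambda \,\lVert z \rVert_0
```

Under this reading, for a fixed $\theta$ a sample is worth selecting only when its reconstruction error is below $\lambda$, and the search over binary $z$ is the combinatorial part that the probabilistic relaxations are meant to avoid.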

arXiv:2107.05852 [pdf, ps, other]

Convergence rates of vector-valued local polynomial regression

Authors: Yariv Aizenbud, Barak Sober

Abstract: Non-parametric estimation of functions as well as their derivatives by means of local-polynomial regression is a subject that has been studied in the literature since the late 1970s. Given a set of noisy samples of a $\mathcal{C}^k$ smooth function, we perform a local polynomial fit, and by taking its $m$-th derivative we obtain an estimate for the $m$-th function derivative. The known optimal rates of convergence for this problem for a $k$-times smooth function $f:\mathbb{R}^d \to \mathbb{R}$ are $n^{-\frac{k-m}{2k + d}}$. However, in modern applications it is often the case that we have to estimate a function mapping into $\mathbb{R}^D$, for $D \gg d$ extremely large. In this work, we prove that these same rates of convergence are also achievable by local-polynomial regression in the case of a high-dimensional target, given some assumptions on the noise distribution. This result is an extension of Stone's seminal work from 1980 to the regime of a high-dimensional target domain. In addition, we unveil a connection between the failure probability $\varepsilon$ and the number of samples required to achieve the optimal rates.

Submitted 14 July, 2021; v1 submitted 13 July, 2021; originally announced July 2021.
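
The local-polynomial derivative estimate described in the abstract of "Convergence rates of vector-valued local polynomial regression" can be sketched for a one-dimensional input as follows. This is a minimal illustration, not the paper's estimator: the function name, the uniform window of half-width `h`, and the default `degree` are all assumptions, and the fit uses NumPy's ordinary least-squares `polyfit`. The target `y` may be a vector per sample, which mirrors the vector-valued setting.

```python
import math
import numpy as np

def local_poly_derivative(x, y, x0, h, degree=2, m=1):
    """Estimate the m-th derivative at x0 from samples (x, y).

    x: shape (n,), sample locations. y: shape (n,) or (n, D), sample values.
    h and degree are illustrative tuning parameters (assumptions).
    """
    mask = np.abs(x - x0) <= h          # keep samples inside the window
    if mask.sum() <= degree:
        raise ValueError("not enough samples in the window")
    # Least-squares polynomial fit in the centered variable (x - x0).
    # Coefficients are returned with the highest power first.
    coeffs = np.polyfit(x[mask] - x0, y[mask], degree)
    # m! times the coefficient of (x - x0)^m is the m-th derivative at x0.
    return math.factorial(m) * coeffs[degree - m]
```

For example, with noiseless samples of $f(x) = x^2$ on a grid, calling `local_poly_derivative(x, x**2, 0.5, 0.1)` recovers the derivative at $0.5$ up to rounding, since a degree-2 fit reproduces a quadratic exactly.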

arXiv:2105.04754 [pdf, other]

Non-Parametric Estimation of Manifolds from Noisy Data

Authors: Yariv Aizenbud, Barak Sober

Abstract: A common observation in data-driven applications is that high dimensional data has a low intrinsic dimension, at least locally. In this work, we consider the problem of estimating a $d$ dimensional sub-manifold of $\mathbb{R}^D$ from a finite set of noisy samples. Assuming that the data was sampled uniformly from a tubular neighborhood of $\mathcal{M}\in \mathcal{C}^k$, a compact manifold without boundary, we present an algorithm that takes a point $r$ from the tubular neighborhood and outputs $\hat p_n\in \mathbb{R}^D$, and $\widehat{T_{\hat p_n}\mathcal{M}}$ an element in the Grassmannian $Gr(d, D)$. We prove that as the number of samples $n\to\infty$ the point $\hat p_n$ converges to $p\in \mathcal{M}$ and $\widehat{T_{\hat p_n}\mathcal{M}}$ converges to $T_p\mathcal{M}$ (the tangent space at that point) with high probability. Furthermore, we show that the estimation yields asymptotic rates of convergence of $n^{-\frac{k}{2k + d}}$ for the point estimation and $n^{-\frac{k-1}{2k + d}}$ for the estimation of the tangent space. These rates are known to be optimal for the case of function estimation.

Submitted 19 July, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

arXiv:2102.13276 [pdf, other]

Spectral Top-Down Recovery of Latent Tree Models

Authors: Yariv Aizenbud, Ariel Jaffe, Meng Wang, Amber Hu, Noah Amsel, Boaz Nadler, Joseph T. Chang, Yuval Kluger

Abstract: Modeling the distribution of high dimensional data by a latent tree graphical model is a prevalent approach in multiple scientific domains. A common task is to infer the underlying tree structure, given only observations of its terminal nodes. Many algorithms for tree recovery are computationally intensive, which limits their applicability to trees of moderate size. For large trees, a common approach, termed divide-and-conquer, is to recover the tree structure in two steps. First, separately recover the structure of multiple, possibly random, subsets of the terminal nodes. Second, merge the resulting subtrees to form a full tree. Here, we develop Spectral Top-Down Recovery (STDR), a deterministic divide-and-conquer approach to infer large latent tree models. Unlike previous methods, STDR partitions the terminal nodes in a non-random way, based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes. We prove that under certain conditions, this partitioning is consistent with the tree structure. This, in turn, leads to a significantly simpler merging procedure of the small subtrees. We prove that STDR is statistically consistent and bound the number of samples required to accurately recover the tree with high probability. Using simulated data from several common tree models in phylogenetics, we demonstrate that STDR has a significant advantage in terms of runtime, with improved or similar accuracy.

Submitted 7 December, 2021; v1 submitted 25 February, 2021; originally announced February 2021.
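
The Fiedler-vector partitioning step mentioned in the STDR abstract can be illustrated with a small sketch. The abstract only says "a suitable Laplacian matrix", so the choice below (the unnormalized graph Laplacian of a symmetric similarity matrix) is an assumption, as are the function name and the sign-based split; the paper's actual construction may differ.

```python
import numpy as np

def fiedler_partition(similarity):
    """Split nodes into two groups by the sign of the Fiedler vector.

    similarity: symmetric (n, n) array of non-negative pairwise similarities.
    Returns a boolean array; True and False mark the two groups.
    """
    W = np.asarray(similarity, dtype=float)
    W = W - np.diag(np.diag(W))            # ignore self-similarities
    L = np.diag(W.sum(axis=1)) - W         # unnormalized Laplacian (assumed choice)
    _, vecs = np.linalg.eigh(L)            # eigenvalues in ascending order
    fiedler = vecs[:, 1]                   # eigenvector of the 2nd smallest eigenvalue
    return fiedler >= 0
```

The overall sign of an eigenvector is arbitrary, so only the grouping matters, not which group is labeled `True`. In a divide-and-conquer scheme the same split would then be applied recursively to each group.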

arXiv:2002.12547 [pdf, ps, other]

Spectral neighbor joining for reconstruction of latent tree models

Authors: Ariel Jaffe, Noah Amsel, Yariv Aizenbud, Boaz Nadler, Joseph T. Chang, Yuval Kluger

Abstract: A common assumption in multiple scientific applications is that the distribution of observed data can be modeled by a latent tree graphical model. An important example is phylogenetics, where the tree models the evolutionary lineages of a set of observed organisms. Given a set of independent realizations of the random variables at the leaves of the tree, a key challenge is to infer the underlying tree topology. In this work we develop Spectral Neighbor Joining (SNJ), a novel method to recover the structure of latent tree graphical models. Given a matrix that contains a measure of similarity between all pairs of observed variables, SNJ computes a spectral measure of cohesion between groups of observed variables. We prove that SNJ is consistent, and derive a sufficient condition for correct tree recovery from an estimated similarity matrix. Combining this condition with a concentration of measure result on the similarity matrix, we bound the number of samples required to recover the tree with high probability. We illustrate via extensive simulations that in comparison to several other reconstruction methods, SNJ requires fewer samples to accurately recover trees with a large number of leaves or long edges.

Submitted 22 September, 2020; v1 submitted 28 February, 2020; originally announced February 2020.

arXiv:1907.12159 [pdf, other]

Approximating the Span of Principal Components via Iterative Least-Squares

Authors: Yariv Aizenbud, Barak Sober

Abstract: In the course of the last century, Principal Component Analysis (PCA) has become one of the pillars of modern scientific methods. Although PCA is normally addressed as a statistical tool aiming at finding orthogonal directions on which the variance is maximized, its first introduction by Pearson in 1901 was done through defining a non-linear least-squares minimization problem of fitting a plane to scattered data points. Thus, it seems natural that PCA and linear least-squares regression are somewhat related, as they both aim at fitting planes to data points. In this paper, we present a connection between the two approaches. Specifically, we present an iterated linear least-squares approach, yielding a sequence of subspaces, which converges to the space spanned by the leading principal components (i.e., principal space).

Submitted 28 July, 2019; originally announced July 2019.

arXiv:1905.12442 [pdf, other]

Rank-one Multi-Reference Factor Analysis

Authors: Yariv Aizenbud, Boris Landa, Yoel Shkolnisky

Abstract: In recent years, there has been a growing need for processing methods aimed at extracting useful information from large datasets. In many cases the challenge is to discover a low-dimensional structure in the data, often concealed by the existence of nuisance parameters and noise. Motivated by such challenges, we consider the problem of estimating a signal from its scaled, cyclically-shifted and noisy observations. We focus on the particularly challenging regime of low signal-to-noise ratio (SNR), where different observations cannot be shift-aligned. We show that an accurate estimation of the signal from its noisy observations is possible, and derive a procedure which is proved to consistently estimate the signal. The asymptotic sample complexity (the number of observations required to recover the signal) of the procedure is $1/\operatorname{SNR}^4$. Additionally, we propose a procedure which is experimentally shown to improve the sample complexity by a factor equal to the signal's length. Finally, we present numerical experiments which demonstrate the performance of our algorithms, and corroborate our theoretical findings.

Submitted 4 June, 2019; v1 submitted 29 May, 2019; originally announced May 2019.

arXiv:1711.00765 [pdf, other]

doi 10.1016/j.cam.2020.113140

Approximation of Functions over Manifolds: A Moving Least-Squares Approach

Authors: Barak Sober, Yariv Aizenbud, David Levin

Abstract: We present an algorithm for approximating a function defined over a $d$-dimensional manifold utilizing only noisy function values at locations sampled from the manifold with noise. To produce the approximation we do not require any knowledge regarding the manifold other than its dimension $d$. We use the Manifold Moving Least-Squares approach of (Sober and Levin 2016) to reconstruct the atlas of charts and the approximation is built on top of those charts. The resulting approximant is shown to be a function defined over a neighborhood of a manifold, approximating the originally sampled manifold. In other words, given a new point, located near the manifold, the approximation can be evaluated directly on that point. We prove that our construction yields a smooth function, and in case of noiseless samples the approximation order is $\mathcal{O}(h^{m+1})$, where $h$ is a parameter measuring the local density of the samples (i.e., the fill distance) and $m$ is the degree of the local polynomial approximation used in our algorithm. In addition, the proposed algorithm has linear time complexity with respect to the ambient space's dimension. Thus, we are able to avoid the computational complexity commonly encountered in high dimensional approximations, without having to perform non-linear dimension reduction, which inevitably introduces distortions to the geometry of the data. Additionally, we show through numerical experiments that the proposed approach compares favorably to statistical approaches for regression over manifolds and show its potential.

Submitted 16 January, 2020; v1 submitted 2 November, 2017; originally announced November 2017.

Comments: arXiv admin note: text overlap with arXiv:1606.07104

arXiv:1707.03311 [pdf, other]

Similarity Search Over Graphs Using Localized Spectral Analysis

Authors: Yariv Aizenbud, Amir Averbuch, Gil Shabat, Guy Ziv

Abstract: This paper provides a new similarity detection algorithm. Given an input set of multi-dimensional data points and an additional reference data point for similarity finding, the algorithm uses a kernel method that embeds the data points into a low dimensional manifold. Unlike other kernel methods, which consider the entire data for the embedding, our method selects a specific set of kernel eigenvectors. The eigenvectors are chosen to separate the data points from the reference data point so that similar data points can be easily identified as being distinct from most of the members in the dataset.

Submitted 11 July, 2017; originally announced July 2017.

Comments: Published in SampTA 2017

arXiv:1609.01100 [pdf, other]

A max-cut approach to heterogeneity in cryo-electron microscopy

Authors: Yariv Aizenbud, Yoel Shkolnisky

Abstract: The field of cryo-electron microscopy has made astounding advancements in the past few years, mainly due to advancements in electron detector technology. Yet, one of the key open challenges of the field remains the processing of heterogeneous data sets, produced from samples containing particles at several different conformational states. For such data sets, the algorithms must include some classification procedure to identify homogeneous groups within the data, so that the images in each group correspond to the same underlying structure. The fundamental importance of the heterogeneity problem in cryo-electron microscopy has drawn many research efforts, and resulted in significant progress in classification algorithms for heterogeneous data sets. While these algorithms are extremely useful and effective in practice, they lack rigorous mathematical analysis and performance guarantees. In this paper, we attempt to make the first steps towards rigorous mathematical analysis of the heterogeneity problem in cryo-electron microscopy. To that end, we present an algorithm for processing heterogeneous data sets, and prove accuracy and stability bounds for it. We also suggest an extension of this algorithm that combines the classification and reconstruction steps. We demonstrate it on simulated data, and compare its performance to the state-of-the-art algorithm in RELION.

Submitted 3 October, 2019; v1 submitted 5 September, 2016; originally announced September 2016.

arXiv:1606.08819 [pdf, other]

Multi-View Kernel Consensus For Data Analysis

Authors: Moshe Salhov, Ofir Lindenbaum, Yariv Aizenbud, Avi Silberschatz, Yoel Shkolnisky, Amir Averbuch

Abstract: The input feature set for many data-driven tasks is high-dimensional while the intrinsic dimension of the data is low. Data analysis methods aim to uncover the underlying low dimensional structure imposed by the low dimensional hidden parameters by utilizing distance metrics that consider the set of attributes as a single monolithic set. However, the transformation of the low dimensional phenomena into the measured high dimensional observations might distort the distance metric. This distortion can affect the desired estimated low dimensional geometric structure. In this paper, we suggest utilizing the redundancy in the attribute domain by partitioning the attributes into multiple subsets we call views. The proposed methods utilize the agreement, also called consensus, between different views to extract valuable geometric information that unifies multiple views about the intrinsic relationships among several different observations. This unification enhances the information that a single view or a simple concatenation of views provides.

Submitted 29 January, 2019; v1 submitted 28 June, 2016; originally announced June 2016.

arXiv:1602.03360 [pdf, other]

Matrix Decompositions using sub-Gaussian Random Matrices

Authors: Yariv Aizenbud, Amir Averbuch

Abstract: In recent years, several algorithms, which approximate matrix decomposition, have been developed. These algorithms are based on metric conservation features for linear spaces of random projection types. We show that an i.i.d. sub-Gaussian matrix whose entries are zero with large probability is metric conserving. We also present a new algorithm, which achieves, with high probability, a rank $r$ decomposition approximation for an $m \times n$ matrix and has an asymptotic complexity like that of state-of-the-art algorithms. We derive an error bound that does not depend on the first $r$ singular values. Although the proven error bound is not as tight as the state-of-the-art bound, experiments show that the proposed algorithm is faster in practice, while achieving the same error rates as the state-of-the-art algorithms.

Submitted 10 February, 2016; originally announced February 2016.
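
The idea in "Matrix Decompositions using sub-Gaussian Random Matrices" of projecting with a random matrix whose entries are mostly zero can be illustrated with a generic sketch. This is not the paper's algorithm: it uses a sparse random sign matrix (bounded entries, hence sub-Gaussian) followed by a QR-based range approximation, and the function names, the `density` parameter, and the sketch size `l` are assumptions chosen for illustration.

```python
import numpy as np

def sparse_sign_matrix(n, l, density, rng):
    """n x l matrix with entries 0 (prob. 1 - density) or +/- 1/sqrt(density)."""
    signs = rng.choice([-1.0, 1.0], size=(n, l))
    mask = rng.random((n, l)) < density
    return signs * mask / np.sqrt(density)

def sketch_low_rank(A, l, density=0.3, seed=0):
    """Return (Q, B) such that A is approximated by Q @ B, with l columns in Q."""
    rng = np.random.default_rng(seed)
    omega = sparse_sign_matrix(A.shape[1], l, density, rng)
    Y = A @ omega                # m x l sketch of the column space of A
    Q, _ = np.linalg.qr(Y)       # orthonormal basis containing the range of Y
    return Q, Q.T @ A            # project A onto that basis
```

When `A` has exact rank `r` and `l` is somewhat larger than `r`, the product `Q @ B` typically reproduces `A` up to rounding; for matrices with decaying singular values the error depends on the tail of the spectrum. A sparse `omega` makes the product `A @ omega` cheaper when sparse matrix storage is used, which is the practical motivation for mostly-zero projections.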

arXiv:1601.04280 [pdf, other]

Randomized LU Decomposition Using Sparse Projections

Authors: Yariv Aizenbud, Gil Shabat, Amir Averbuch

Abstract: A fast algorithm for the approximation of a low rank LU decomposition is presented. In order to achieve a low complexity, the algorithm uses sparse random projections combined with FFT-based random projections. The asymptotic approximation error of the algorithm is analyzed and a theoretical error bound is presented. Finally, numerical examples illustrate that for a similar approximation error, the sparse LU algorithm is faster than recent state-of-the-art methods. The algorithm is completely parallelizable, which enables it to run on a GPU. The performance is tested on a GPU card, showing a significant improvement in the running time in comparison to sequential execution.

Submitted 17 January, 2016; originally announced January 2016.

arXiv:1511.00831 [pdf, other]

PCA-Based Out-of-Sample Extension for Dimensionality Reduction

Authors: Yariv Aizenbud, Amit Bermanis, Amir Averbuch

Abstract: Dimensionality reduction methods are very common in the field of high dimensional data analysis. Typically, algorithms for dimensionality reduction are computationally expensive. Therefore, their applications for the analysis of massive amounts of data are impractical. For example, repeated computations due to accumulated data are computationally prohibitive. In this paper, an out-of-sample extension scheme, which is used as a complementary method for dimensionality reduction, is presented. We describe an algorithm which performs an out-of-sample extension to newly-arrived data points. Unlike other extension algorithms such as the Nyström algorithm, the proposed algorithm uses the intrinsic geometry of the data and properties of the dimensionality reduction map. We prove that the error of the proposed algorithm is bounded. In addition to the out-of-sample extension, the algorithm provides a degree of abnormality for any newly-arrived data point.

Submitted 3 November, 2015; originally announced November 2015.

arXiv:1310.7202 [pdf, other]

Randomized LU Decomposition

Authors: Gil Shabat, Yaniv Shmueli, Yariv Aizenbud, Amir Averbuch

Abstract: We present a fast randomized algorithm that computes a low rank LU decomposition. Our algorithm uses random-projection-type techniques to efficiently compute a low rank approximation of large matrices. The randomized LU algorithm can be parallelized and further accelerated by using sparse random matrices in its projection step. Several different error bounds are proven for the algorithm approximations. To prove these bounds, recent results from random matrix theory related to subgaussian matrices are used. As an application, we also show how the algorithm can be utilized to solve problems such as the rank-deficient least squares problem. Numerical examples, which illustrate the performance of the algorithm and compare it to other decomposition methods, are presented.

Submitted 30 January, 2016; v1 submitted 27 October, 2013; originally announced October 2013.

Showing 1–15 of 15 results for author: Aizenbud, Y