Elena Grigorescu

    We initiate a systematic investigation of distribution learning (density estimation) when the data is distributed across multiple servers. The servers must communicate with a referee and the goal is to estimate the underlying distribution with as few bits of communication as possible. We focus on non-parametric density estimation of discrete distributions with respect to the l1 and l2 norms. We provide the first non-trivial upper and lower bounds on the communication complexity of this basic estimation task in various settings of interest. Specifically, our results include the following: 1. When the unknown discrete distribution is unstructured and each server has only one sample, we show that any blackboard protocol (i.e., any protocol in which servers interact arbitrarily using public messages) that learns the distribution must essentially communicate the entire sample. 2. For the case of structured distributions, such as k-histograms and monotone distributions, we design distribu...
    We investigate the local-list decodability of codes whose codewords are group homomorphisms. The study of such codes was initiated by Goldreich and Levin with the seminal work on decoding the Hadamard code. Many of the recent abstractions of their initial algorithm focus on Locally Decodable Codes (LDCs) over finite fields. We derive our algorithmic approach from the list decoding of the Reed-Muller code over finite fields proposed by Sudan, Trevisan and Vadhan. Given an abelian group G and a fixed abelian group H, we give combinatorial bounds on the number of homomorphisms that have agreement δ with an oracle-access function f : G → H. Our bounds are polynomial in !, where the degree of the polynomial depends on |H|. Also, δ depends on the distance parameter of the code, namely we consider δ to be slightly greater than 1 − (minimum distance). Furthermore, we give a local-list decoding algorithm for the homomorphisms that agree on a δ fraction of the domain with a function f, the ...
    Properties of Boolean functions on the hypercube that are invariant with respect to linear transformations of the domain are among the most well-studied properties in the context of property testing. In this paper, we study a particular natural class of linear-invariant ...
    Lattices are discrete mathematical objects with widespread applications to integer programs as well as modern cryptography. A fundamental problem in both domains is the Closest Vector Problem (popularly known as CVP). It is well known that CVP can be easily solved in lattices that have an orthogonal basis if the orthogonal basis is specified. This motivates the orthogonality decision problem: verify whether a given lattice has an orthogonal basis. Surprisingly, the orthogonality decision problem is not known to be either NP-complete or in P. In this paper, we focus on the orthogonality decision problem for a well-known family of lattices, namely Construction-A lattices. These are lattices of the form C + qZ^n, where C is an error-correcting q-ary code, and are studied in communication settings. We provide a complete characterization of lattices obtained from binary and ternary codes using Construction-A that have an orthogonal basis. This characterization leads to an efficient alg...
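To make the form C + qZ^n concrete, here is a minimal Python sketch of a membership test for a Construction-A lattice. It assumes C is a linear code given by a small generator matrix (the helper names `codewords` and `in_construction_a` are hypothetical, not from the paper), and it relies on the standard fact that an integer vector lies in C + qZ^n exactly when its reduction mod q is a codeword. It does not address the orthogonality decision problem itself.

```python
from itertools import product

def codewords(generator, q):
    """Enumerate the q-ary linear code spanned by the rows of `generator`.

    Brute force over all coefficient vectors; only sensible for tiny codes.
    """
    k = len(generator)
    n = len(generator[0])
    words = set()
    for coeffs in product(range(q), repeat=k):
        word = tuple(
            sum(c * row[j] for c, row in zip(coeffs, generator)) % q
            for j in range(n)
        )
        words.add(word)
    return words

def in_construction_a(x, code, q):
    """Return True if the integer vector x lies in C + qZ^n."""
    # Python's % always returns a value in [0, q), also for negative entries.
    return tuple(xi % q for xi in x) in code

# Illustrative example: the binary repetition code of length 3.
repetition = codewords([[1, 1, 1]], 2)
```

For instance, with the repetition code above, `(3, -1, 5)` reduces to `(1, 1, 1)` mod 2 and is therefore a lattice point, while `(1, 0, 0)` is not.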
    Locality sensitive hashing (LSH) was introduced by Indyk and Motwani (STOC '98) to give the first sublinear time algorithm for the c-approximate nearest neighbor (ANN) problem using only polynomial space. At a high level, an LSH family hashes "nearby" points to the same bucket and "far away" points to different buckets. The measure of quality of an LSH family is its LSH exponent, which helps determine both query time and space usage. In a seminal work, Andoni and Indyk (FOCS '06) constructed an LSH family based on random ball partitioning of space that achieves an LSH exponent of 1/c^2 for the l_2 norm, which was later shown to be optimal by Motwani, Naor and Panigrahy (SIDMA '07) and O'Donnell, Wu and Zhou (TOCT '14). Although optimal in the LSH exponent, the ball partitioning approach is computationally expensive. So, in the same work, Andoni and Indyk proposed a simpler and more practical hashing scheme based on Euclidean lattices and provided computationa...
    A palindrome is a string that reads the same as its reverse, such as "aibohphobia" (fear of palindromes). Given an integer $d>0$, a $d$-near-palindrome is a string of Hamming distance at most $d$ from its reverse. We study the natural problem of identifying a longest $d$-near-palindrome in data streams. The problem is relevant to the analysis of DNA databases, and to the task of repairing recursive structures in documents such as XML and JSON. We present an algorithm that returns a $d$-near-palindrome whose length is within a multiplicative $(1+\epsilon)$-factor of the longest $d$-near-palindrome. Our algorithm also returns the set of mismatched indices of the $d$-near-palindrome, using $\mathcal{O}\left(\frac{d\log^7 n}{\epsilon\log(1+\epsilon)}\right)$ bits of space, and $\mathcal{O}\left(\frac{d\log^6 n}{\epsilon\log(1+\epsilon)}\right)$ update time per arriving symbol. We show that $\Omega(d\log n)$ space is necessary for estimating the length of longest $d$-near-pa...
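The definition of a $d$-near-palindrome can be checked directly. The sketch below is an offline brute-force illustration of the definition only, not the streaming algorithm of the paper; the function names are hypothetical. Note that the Hamming distance between a string and its reverse counts each mismatched pair of positions twice.

```python
def hamming_to_reverse(s):
    """Hamming distance between s and its reverse."""
    return sum(a != b for a, b in zip(s, reversed(s)))

def is_d_near_palindrome(s, d):
    """True if s is within Hamming distance d of its reverse."""
    return hamming_to_reverse(s) <= d

def longest_d_near_palindrome(s, d):
    """Longest substring of s that is a d-near-palindrome.

    Cubic-time brute force over all substrings, longest first;
    it stores the whole input, unlike a streaming algorithm.
    """
    n = len(s)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            candidate = s[start:start + length]
            if is_d_near_palindrome(candidate, d):
                return candidate
    return ""
```

For example, `"aibohphobia"` is a $0$-near-palindrome, and `"abca"` is a $2$-near-palindrome because only the middle pair of symbols disagrees.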
    A Boolean k-monotone function defined over a finite poset domain D alternates between the values 0 and 1 at most k times on any ascending chain in D. Therefore, k-monotone functions are natural generalizations of the classical monotone functions, which are the 1-monotone functions. Motivated by the recent interest in k-monotone functions in the context of circuit complexity and learning theory, and by the central role that monotonicity testing plays in the context of property testing, we initiate a systematic study of k-monotone functions, in the property testing model. In this model, the goal is to distinguish functions that are k-monotone (or are close to being k-monotone) from functions that are far from being k-monotone. Our results include the following: 1. We demonstrate a separation between testing k-monotonicity and testing monotonicity, on the hypercube domain {0, 1}^d, for k ≥ 3; 2. We demonstrate a separation between testing and learning on {0, 1}^d, for k = ω(log d): testi...
    AC^0 o MOD_2 circuits are AC^0 circuits augmented with a layer of parity gates just above the input layer. We study AC^0 o MOD_2 circuit lower bounds for computing the Boolean Inner Product function. Recent works by Servedio and Viola (ECCC TR12-144) and Akavia et al. (ITCS 2014) have highlighted this problem as a frontier problem in circuit complexity that arose both as a first step towards solving natural special cases of the matrix rigidity problem and as a candidate for constructing pseudorandom generators of minimal complexity. We give the first superlinear lower bound for the Boolean Inner Product function against AC^0 o MOD_2 of depth four or greater. Specifically, we prove a superlinear lower bound for circuits of arbitrary constant depth, and an ~Omega(n^2) lower bound for the special case of depth-4 AC^0 o MOD_2. Our proof of the depth-4 lower bound employs a new "moment-matching" inequality for bounded, nonnegative integer-valued random variables that may be of i...
    We are interested in constructing efficient data structures that still work (most of the time) when hit by a constant fraction of adversarial noise. Roughly speaking, by “efficient” we mean constructions that are simultaneously close to the optimal time and space for the noiseless case. Recently, de Wolf [20] introduced a model for this, called “error-correcting data structures,” and studied the tradeoff between data structure length and efficiency of query answering (as measured by the number of bit-probes). Unfortunately, this tradeoff is quite bad in that model, and it is unlikely that one could construct error-correcting data structures that are simultaneously efficient in time and space, unless significant progress is made in improving this tradeoff for “locally decodable codes.” In this paper we relax the requirements on error-correcting data structures: our model only requires that most queries are answered correctly, while for the remaining queries the decoder is allowed t...
    A Boolean $k$-monotone function defined over a finite poset domain ${\cal D}$ alternates between the values $0$ and $1$ at most $k$ times on any ascending chain in ${\cal D}$. Therefore, $k$-monotone functions are natural generalizations of the classical monotone functions, which are the $1$-monotone functions. Motivated by the recent interest in $k$-monotone functions in the context of circuit complexity and learning theory, and by the central role that monotonicity testing plays in the context of property testing, we initiate a systematic study of $k$-monotone functions, in the property testing model. In this model, the goal is to distinguish functions that are $k$-monotone (or are close to being $k$-monotone) from functions that are far from being $k$-monotone. Our results include the following: - We demonstrate a separation between testing $k$-monotonicity and testing monotonicity, on the hypercube domain $\{0,1\}^d$, for $k\geq 3$; - We demonstrate a separation between testing ...
    We study the problem of finding all $k$-periods of a length-$n$ string $S$, presented as a data stream. $S$ is said to have $k$-period $p$ if its prefix of length $n-p$ differs from its suffix of length $n-p$ in at most $k$ locations. We give a one-pass streaming algorithm that computes the $k$-periods of a string $S$ using $\text{poly}(k, \log n)$ bits of space, for $k$-periods of length at most $\frac{n}{2}$. We also present a two-pass streaming algorithm that computes $k$-periods of $S$ using $\text{poly}(k, \log n)$ bits of space, regardless of period length. We complement these results with comparable lower bounds.
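The definition of a $k$-period translates into a short offline check. The sketch below stores the whole string and takes quadratic time, so it illustrates the definition only and is not the $\text{poly}(k, \log n)$-space streaming algorithm of the paper; the name `k_periods` is hypothetical.

```python
def k_periods(s, k):
    """Return all p in [1, n-1] such that s has k-period p.

    s has k-period p if its prefix of length n-p and its suffix of
    length n-p differ in at most k positions, i.e. s[i] != s[i+p]
    for at most k indices i.
    """
    n = len(s)
    periods = []
    for p in range(1, n):
        mismatches = sum(s[i] != s[i + p] for i in range(n - p))
        if mismatches <= k:
            periods.append(p)
    return periods
```

For example, `"abcabcabc"` has exact periods 3 and 6 (the case $k = 0$); allowing one mismatch additionally admits $p = 8$, where the length-1 prefix and suffix differ.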
    We investigate the problem of detecting periodic trends within a string S of length n, arriving in the streaming model, containing at most k wildcard characters, where k = o(n). A wildcard character is a special character that can be assigned any other character. We say S has wildcard-period p if there exists an assignment to each of the wildcard characters so that in the resulting stream the length n−p prefix equals the length n−p suffix. We present a two-pass streaming algorithm that computes wildcard-periods of S using O(k polylog n) bits of space, while we also show that this problem cannot be solved in sublinear space in one pass. We then give a one-pass randomized streaming algorithm that computes all wildcard-periods p of S with p < n/2 and no wildcard characters appearing in the last p symbols of S, using O(k log n)
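The wildcard-period definition can be illustrated offline. The length n−p prefix equals the length n−p suffix exactly when all positions in the same residue class mod p carry the same character, so a consistent wildcard assignment exists if and only if each residue class contains at most one distinct non-wildcard character. The sketch below uses that observation; it is not the streaming algorithm of the paper, and the wildcard symbol `?` and the function names are assumptions made for illustration.

```python
WILDCARD = "?"  # hypothetical wildcard symbol chosen for this sketch

def has_wildcard_period(s, p):
    """True if the wildcards in s can be assigned so that s has period p."""
    n = len(s)
    for r in range(min(p, n)):
        # Characters at positions r, r+p, r+2p, ... must all agree.
        letters = {s[i] for i in range(r, n, p) if s[i] != WILDCARD}
        if len(letters) > 1:
            return False
    return True

def wildcard_periods(s):
    """All wildcard-periods p with 1 <= p < len(s), by brute force."""
    return [p for p in range(1, len(s)) if has_wildcard_period(s, p)]
```

For example, `"ab?ab"` has wildcard-period 3 (assign the wildcard any character), but not 2, since positions 0 and 4 would need to hold both `a` and `b`.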
    We introduce and study the model of list learning with attribute noise. Learning with attribute noise was introduced by Shackelford and Volper (COLT 1988) as a variant of PAC learning, in which the algorithm has access to noisy examples and uncorrupted labels, and the goal is to recover an accurate hypothesis. Sloan (COLT 1988) and Goldman and Sloan (Algorithmica 1995) discovered information-theoretic limits to learning in this model, which have impeded further progress. In this article we extend the model to that of list learning, drawing inspiration from the list-decoding model in coding theory, and its recent variant studied in the context of learning. On the positive side, we show that sparse conjunctions can be efficiently list learned under some assumptions on the underlying ground-truth distribution. On the negative side, our results show that even in the list-learning model, efficient learning of parities and majorities is not possible regardless of the representation used.
    Motivated by the structural analogies between point lattices and linear error-correcting codes, and by the mature theory on locally testable codes, we initiate a systematic study of local testing for membership in lattices. Testing membership in lattices is also motivated in practice, by applications to integer programming, error detection in lattice-based communication, and cryptography. Apart from establishing the conceptual foundations of lattice testing, our results include the following: 1. We demonstrate upper and lower bounds on the query complexity of local testing for the well-known family of code formula lattices. Furthermore, we instantiate our results with code formula lattices constructed from Reed-Muller codes, and obtain nearly-tight bounds. 2. We show that in order to achieve low query complexity, it is sufficient to design one-sided non-adaptive canonical tests. This result is akin to, and based on an analogous result for error-correcting codes due to Ben-Sasson et ...
    Recent efforts in coding theory have focused on building codes for insertions and deletions, called insdel codes, with optimal trade-offs between their redundancy and their error-correction capabilities, as well as efficient encoding and decoding algorithms. In many applications, polynomial running time may still be prohibitively expensive, which has motivated the study of codes with super-efficient decoding algorithms. These have led to the well-studied notions of Locally Decodable Codes (LDCs) and Locally Correctable Codes (LCCs). Inspired by these notions, Ostrovsky and Paskin-Cherniavsky (Information Theoretic Security, 2015) generalized Hamming LDCs to insertions and deletions. To the best of our knowledge, these are the only known results that study the analogues of Hamming LDCs in channels performing insertions and deletions. Here we continue the study of insdel codes that admit local algorithms. Specifically, we reprove the results of Ostrovsky and Paskin-Cherniavsky for ins...
    We study the distinct elements and $\ell_p$-heavy hitters problems in the sliding window model, where only the most recent $n$ elements in the data stream form the underlying set. We first introduce the composable histogram, a simple twist on the exponential (Datar et al., SODA 2002) and smooth histograms (Braverman and Ostrovsky, FOCS 2007) that may be of independent interest. We then show that the composable histogram along with a careful combination of existing techniques to track either the identity or frequency of a few specific items suffices to obtain algorithms for both distinct elements and $\ell_p$-heavy hitters that are nearly optimal in both $n$ and $\epsilon$. Applying our new composable histogram framework, we provide an algorithm that outputs a $(1+\epsilon)$-approximation to the number of distinct elements in the sliding window model and uses $\mathcal{O}\left(\frac{1}{\epsilon^2}\log n\log\frac{1}{\epsilon}\log\log n+\frac{1}{\epsilon}\log^2 n\right)$ bits of space. For $\ell_p$-heavy ...
    A heapable sequence is a sequence of numbers that can be arranged in a min-heap data structure. Finding a longest heapable subsequence of a given sequence was proposed by Byers, Heeringa, Mitzenmacher, and Zervas (ANALCO 2011) as a generalization of the well-studied longest increasing subsequence problem and its complexity still remains open. An equivalent formulation of the longest heapable subsequence problem is that of finding a maximum-sized binary tree in a given permutation directed acyclic graph (permutation DAG). In this work, we study parameterized algorithms for both longest heapable subsequence and maximum-sized binary tree. We introduce alphabet size as a new parameter in the study of computational problems in permutation DAGs and show that this parameter with respect to a fixed topological ordering admits a complete characterization and a polynomial time algorithm. We believe that this parameter is likely to be useful in the context of optimization problems defined over...
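Deciding whether a whole sequence is heapable is easier than finding a longest heapable subsequence. The sketch below assumes the usual formalization from the literature on heapable sequences: elements are inserted in order as leaves of a binary tree (not necessarily complete), and each element must be at least as large as its parent. Under that assumption, a greedy rule that attaches each element to the free slot whose parent value is the largest one not exceeding it decides heapability. The function name is hypothetical, and this does not solve the longest heapable subsequence problem studied in the paper.

```python
import bisect

def is_heapable(seq):
    """Greedy test: can seq be inserted, in order, into a binary min-heap tree?

    `slots` is a sorted list holding, for every free child position,
    the value of the node that owns it.
    """
    slots = []
    for index, x in enumerate(seq):
        if index == 0:
            slots = [x, x]  # the root offers two free child slots
            continue
        # Rightmost slot whose parent value is <= x.
        j = bisect.bisect_right(slots, x) - 1
        if j < 0:
            return False  # every available parent is larger than x
        slots.pop(j)
        # The new node contributes two free slots of its own.
        bisect.insort(slots, x)
        bisect.insort(slots, x)
    return True
```

Any increasing sequence is heapable (it forms a path), whereas `[3, 2, 1]` is not, because 2 cannot be placed below the root 3.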
    We continue the study of k-monotone Boolean functions in the property testing model, initiated by Canonne et al. (ITCS 2017). A function f : {0, 1}^n → {0, 1} is said to be k-monotone if it alternates between 0 and 1 at most k times on every ascending chain. Such functions represent a natural generalization of (1-)monotone functions, and have been recently studied in circuit complexity, PAC learning, and cryptography. In property testing, the fact that 1-monotonicity can be locally tested with poly(n) queries led to a previous conjecture that k-monotonicity can be tested with poly(n) queries. In this work we disprove the conjecture, and show that even 2-monotonicity requires a number of queries exponential in √n. Furthermore, even the apparently easier task of distinguishing 2-monotone functions from functions that are far from being n^{.01}-monotone also requires an exponential number of queries. Our results follow from constructions of families that are hard for a canonical tester that ...
