The problem of nearest-neighbor classification is a fundamental technique in machine-learning. Gi... more The problem of nearest-neighbor classification is a fundamental technique in machine-learning. Given a training set P of n labeled points in ℝ^d, and an approximation parameter 0 < ε ≤ 1/2, any unlabeled query point should be classified with the class of any of its ε-approximate nearest-neighbors in P. Answering these queries efficiently has been the focus of extensive research, proposing techniques that are mainly tailored towards resolving the more general problem of ε-approximate nearest-neighbor search. While the latest can only hope to provide query time and space complexities dependent on n, the problem of nearest-neighbor classification accepts other parameters more suitable to its analysis. Such is the number k_ε of ε-border points, which describes the complexity of boundaries between sets of points of different classes. This paper presents a new data structure called Chromatic AVD. This is the first approach for ε-approximate nearest-neighbor classification whose space a...
We present space-time tradeoffs for approximate spherical range counting queries. Given a set S o... more We present space-time tradeoffs for approximate spherical range counting queries. Given a set S of n data points in Rd along with a positive approximation factor ε, the goal is to preprocess the points so that, given any Euclidean ball B, we can return the number of points of any subset of S that contains all the points within a (1 − ε)-factor contraction of B, but contains no points that lie outside a (1 + ε)-factor expansion of B. In many applications of range searching it is desirable to offer a tradeoff between space and query time. We present here the first such tradeoffs for approximate range counting queries. Given 0 < ε ≤ 1/2 and a parameter γ, where 2 ≤ γ ≤ 1/ε, we show how to construct a data structure of space O(nγd log(1/ε)) that allows us to answer ε-approximate spherical range counting queries in time O(log(nγ) + 1/(εγ)d−1). The data structure can be built in time O(nγd log(n/ε) log(1/ε)). Here n, ε, and γ are asymptotic quantities, and the dimension d is assumed to...
Approximation problems involving a single convex body in $d$-dimensional space have received a gr... more Approximation problems involving a single convex body in $d$-dimensional space have received a great deal of attention in the computational geometry community. In contrast, works involving multiple convex bodies are generally limited to dimensions $d \leq 3$ and/or do not consider approximation. In this paper, we consider approximations to two natural problems involving multiple convex bodies: detecting whether two polytopes intersect and computing their Minkowski sum. Given an approximation parameter $\varepsilon > 0$, we show how to independently preprocess two polytopes $A,B$ into data structures of size $O(1/\varepsilon^{(d-1)/2})$ such that we can answer in polylogarithmic time whether $A$ and $B$ intersect approximately. More generally, we can answer this for the images of $A$ and $B$ under affine transformations. Next, we show how to $\varepsilon$-approximate the Minkowski sum of two given polytopes defined as the intersection of $n$ halfspaces in $O(n \log(1/\varepsilon) ...
Range searching is a widely-used method in computational geometry for efficiently accessing local... more Range searching is a widely-used method in computational geometry for efficiently accessing local regions of a large data set. Typically, range searching involves either counting or reporting the points lying within a given query region, but it is often desirable to compute statistics that better describe the structure of the point set lying within the region, not just the count. In this paper we consider the geometric minimum spanning tree (MST) problem in the context of range searching where approximation is allowed. We are given a set P of n points in R^d. The objective is to preprocess P so that given an admissible query region Q, it is possible to efficiently approximate the weight of the minimum spanning tree of the subset of P lying within Q. There are two natural sources of approximation error, first by treating Q as a fuzzy object and second by approximating the MST weight itself. To model this, we assume that we are given two positive real approximation parameters eps_q an...
Given a training set $P$ of labeled points, the nearest-neighbor rule predicts the class of an un... more Given a training set $P$ of labeled points, the nearest-neighbor rule predicts the class of an unlabeled query point as the label of its closest point in the set. To improve the time and space complexity of classification, a natural question is how to reduce the training set without significantly affecting the accuracy of the nearest-neighbor rule. Nearest-neighbor condensation deals with finding a subset $R \subseteq P$ such that for every point $p \in P$, $p$'s nearest-neighbor in $R$ has the same label as $p$. This relates to the concept of coresets, which can be broadly defined as subsets of the set, such that an exact result on the coreset corresponds to an approximate result on the original set. However, the guarantees of a coreset hold for any query point, and not only for the points of the training set. This paper introduces the concept of coresets for nearest-neighbor classification. We extend existing criteria used for condensation, and prove sufficient conditions to c...
The problem of nearest-neighbor classification is a fundamental technique in machine-learning. Gi... more The problem of nearest-neighbor classification is a fundamental technique in machine-learning. Given a training set P of n labeled points in ℝ^d, and an approximation parameter 0 < ε ≤ 1/2, any unlabeled query point should be classified with the class of any of its ε-approximate nearest-neighbors in P. Answering these queries efficiently has been the focus of extensive research, proposing techniques that are mainly tailored towards resolving the more general problem of ε-approximate nearest-neighbor search. While the latest can only hope to provide query time and space complexities dependent on n, the problem of nearest-neighbor classification accepts other parameters more suitable to its analysis. Such is the number k_ε of ε-border points, which describes the complexity of boundaries between sets of points of different classes. This paper presents a new data structure called Chromatic AVD. This is the first approach for ε-approximate nearest-neighbor classification whose space a...
We present space-time tradeoffs for approximate spherical range counting queries. Given a set S o... more We present space-time tradeoffs for approximate spherical range counting queries. Given a set S of n data points in Rd along with a positive approximation factor ε, the goal is to preprocess the points so that, given any Euclidean ball B, we can return the number of points of any subset of S that contains all the points within a (1 − ε)-factor contraction of B, but contains no points that lie outside a (1 + ε)-factor expansion of B. In many applications of range searching it is desirable to offer a tradeoff between space and query time. We present here the first such tradeoffs for approximate range counting queries. Given 0 < ε ≤ 1/2 and a parameter γ, where 2 ≤ γ ≤ 1/ε, we show how to construct a data structure of space O(nγd log(1/ε)) that allows us to answer ε-approximate spherical range counting queries in time O(log(nγ) + 1/(εγ)d−1). The data structure can be built in time O(nγd log(n/ε) log(1/ε)). Here n, ε, and γ are asymptotic quantities, and the dimension d is assumed to...
Approximation problems involving a single convex body in $d$-dimensional space have received a gr... more Approximation problems involving a single convex body in $d$-dimensional space have received a great deal of attention in the computational geometry community. In contrast, works involving multiple convex bodies are generally limited to dimensions $d \leq 3$ and/or do not consider approximation. In this paper, we consider approximations to two natural problems involving multiple convex bodies: detecting whether two polytopes intersect and computing their Minkowski sum. Given an approximation parameter $\varepsilon > 0$, we show how to independently preprocess two polytopes $A,B$ into data structures of size $O(1/\varepsilon^{(d-1)/2})$ such that we can answer in polylogarithmic time whether $A$ and $B$ intersect approximately. More generally, we can answer this for the images of $A$ and $B$ under affine transformations. Next, we show how to $\varepsilon$-approximate the Minkowski sum of two given polytopes defined as the intersection of $n$ halfspaces in $O(n \log(1/\varepsilon) ...
Range searching is a widely-used method in computational geometry for efficiently accessing local... more Range searching is a widely-used method in computational geometry for efficiently accessing local regions of a large data set. Typically, range searching involves either counting or reporting the points lying within a given query region, but it is often desirable to compute statistics that better describe the structure of the point set lying within the region, not just the count. In this paper we consider the geometric minimum spanning tree (MST) problem in the context of range searching where approximation is allowed. We are given a set P of n points in R^d. The objective is to preprocess P so that given an admissible query region Q, it is possible to efficiently approximate the weight of the minimum spanning tree of the subset of P lying within Q. There are two natural sources of approximation error, first by treating Q as a fuzzy object and second by approximating the MST weight itself. To model this, we assume that we are given two positive real approximation parameters eps_q an...
Given a training set $P$ of labeled points, the nearest-neighbor rule predicts the class of an un... more Given a training set $P$ of labeled points, the nearest-neighbor rule predicts the class of an unlabeled query point as the label of its closest point in the set. To improve the time and space complexity of classification, a natural question is how to reduce the training set without significantly affecting the accuracy of the nearest-neighbor rule. Nearest-neighbor condensation deals with finding a subset $R \subseteq P$ such that for every point $p \in P$, $p$'s nearest-neighbor in $R$ has the same label as $p$. This relates to the concept of coresets, which can be broadly defined as subsets of the set, such that an exact result on the coreset corresponds to an approximate result on the original set. However, the guarantees of a coreset hold for any query point, and not only for the points of the training set. This paper introduces the concept of coresets for nearest-neighbor classification. We extend existing criteria used for condensation, and prove sufficient conditions to c...
Uploads
Papers by David Mount