DOI: 10.5555/1873601.1873695
research-article

A locality-sensitive hash for real vectors

Published: 17 January 2010

Abstract

We present a simple and practical algorithm for the c-approximate near neighbor problem (c-NN): given n points P ⊂ R^d and a radius R, build a data structure which, given q ∈ R^d, can with probability 1 − δ return a point p ∈ P with dist(p, q) ≤ cR if there is any p* ∈ P with dist(p*, q) ≤ R. For c = d + 1, our algorithm deterministically (δ = 0) preprocesses in time O(nd log d), space O(dn), and answers queries in expected time O(d^2); this is the first known algorithm to deterministically guarantee an O(d)-NN solution in constant time with respect to n for all l_p metrics. A probabilistic version empirically achieves useful c values (c < 2) where c appears to grow minimally as d → ∞. A query time of O(d log d) is available, providing slightly less accuracy. These techniques can also be used to approximately find (pointers between) all pairs x, y ∈ P with dist(x, y) ≤ R in time O(nd log d).
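The c-NN contract stated above can be made concrete with a small brute-force checker. This is only an illustrative sketch: the function name `satisfies_c_nn` is hypothetical, and Euclidean distance is assumed as one instance of dist.

```python
import math

def satisfies_c_nn(points, q, R, c, answer):
    """Check one query answer against the c-NN contract.

    Assumption: dist is the Euclidean distance (math.dist, Python 3.8+).
    `answer` is the point returned by some data structure, or None.
    """
    # The contract only binds when some p* lies within distance R of q.
    has_near = any(math.dist(p, q) <= R for p in points)
    if not has_near:
        return True
    # Otherwise the returned point must lie within distance c*R of q.
    return answer is not None and math.dist(answer, q) <= c * R
```

For example, with points (0, 0) and (3, 0), query (1, 0), R = 1 and c = 2, returning (3, 0) is acceptable because it lies at distance 2 = cR, while returning nothing is not.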
The key to the algorithm is a locality-sensitive hash: a mapping h: R^d → U with the property that h(x) = h(y) is much more likely for nearby x, y. We introduce a somewhat regular simplex which tessellates R^d, and efficiently hash each point in any simplex of this tessellation to all d + 1 corners; any points in neighboring cells will be hashed to a shared corner and noticed as nearby points. This method is completely independent of dimension reduction, so that additional space and time savings are available by first reducing all input vectors.
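The idea of hashing a point to all d + 1 corners of its enclosing simplex can be sketched as follows. This is not the paper's construction: the abstract does not specify the "somewhat regular" simplex, so the sketch swaps in the standard Kuhn (Freudenthal) triangulation of the integer grid, which also tessellates R^d with simplices and finds the enclosing simplex in O(d log d) by sorting fractional parts. The names `simplex_corners`, `build_index`, `query`, and the `scale` parameter are hypothetical.

```python
import math
from collections import defaultdict

def simplex_corners(x, scale=1.0):
    """Return the d + 1 corners of the Kuhn-triangulation simplex containing x.

    Stand-in tessellation (an assumption, not the paper's simplex):
    each grid cube of side `scale` is split by the ordering of the
    fractional parts of the coordinates.
    """
    y = [xi / scale for xi in x]
    base = [math.floor(v) for v in y]
    frac = [v - b for v, b in zip(y, base)]
    # Walk from the cube's lower corner, incrementing coordinates in
    # order of decreasing fractional part; each step yields one corner.
    order = sorted(range(len(y)), key=lambda i: -frac[i])
    corner = list(base)
    corners = [tuple(corner)]
    for i in order:
        corner[i] += 1
        corners.append(tuple(corner))
    return corners

def build_index(points, scale=1.0):
    """Hash every point to all corners of its enclosing simplex."""
    table = defaultdict(list)
    for idx, p in enumerate(points):
        for c in simplex_corners(p, scale):
            table[c].append(idx)
    return table

def query(table, q, scale=1.0):
    """Return indices of points sharing at least one corner with q."""
    found = set()
    for c in simplex_corners(q, scale):
        found.update(table.get(c, ()))
    return found
```

In this sketch, `scale` plays the role of the radius-dependent cell size; candidates returned by `query` would still need a distance check against q, and the approximation guarantees claimed in the abstract apply to the authors' simplex, not to this stand-in.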


Cited By

  • (2014) Optimal Lower Bounds for Locality-Sensitive Hashing (Except When q is Tiny). ACM Transactions on Computation Theory 6:1 (1-13). DOI: 10.1145/2578221. Online publication date: 1-Mar-2014
  • (2010) Efficient SINR queries for CSMA/CA simulation. Proceedings of the 13th ACM international conference on Modeling, analysis, and simulation of wireless and mobile systems (59-62). DOI: 10.1145/1868521.1868532. Online publication date: 17-Oct-2010
      Published In

      SODA '10: Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete algorithms
      January 2010
      1690 pages
      ISBN:9780898716986

      Publisher

      Society for Industrial and Applied Mathematics

      United States

      Qualifiers

      • Research-article

      Acceptance Rates

      SODA '10 Paper Acceptance Rate 135 of 445 submissions, 30%;
      Overall Acceptance Rate 411 of 1,322 submissions, 31%

      Article Metrics

      • Downloads (Last 12 months): 2
      • Downloads (Last 6 weeks): 0
      Reflects downloads up to 15 Jan 2025
