DOI: 10.1145/1862344.1862355
research-article

On locality sensitive hashing in metric spaces

Published: 18 September 2010

Abstract

Modeling proximity search problems as a metric space provides a general framework usable in many areas, such as pattern recognition, web search, clustering, data mining, knowledge management, and textual and multimedia information retrieval, to name a few. Metric indexes have been improved over the years, and many instances of the problem can be solved efficiently. However, when very large or high-dimensional metric databases are indexed, exact approaches are not yet capable of solving the problem efficiently; performance in these circumstances degrades to almost that of a sequential search.
To overcome this limitation, non-exact proximity searching algorithms can be used to give answers that are close to the exact result, either in probability or within an approximation factor. Approximation is acceptable in many contexts, especially when human judgement about closeness is involved.
In vector spaces, on the other hand, there is a very successful approach dubbed Locality Sensitive Hashing, which consists of making a succinct representation of the objects. This succinct representation is relatively insensitive to small variations of the locality. Unfortunately, the hashing functions have to be carefully designed, very close to the data model, and different functions are used when objects come from different domains.
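To make the vector-space idea concrete, the following is a minimal sketch of one well-known hash family, sign random projections (random hyperplanes). It is an illustration chosen here, not a construction taken from this paper, and the names `make_hyperplanes`, `signature`, and `hamming` are hypothetical. Note that it relies on dot products, so it only applies to objects that are numeric vectors.

```python
import random

def make_hyperplanes(dim, bits, seed=0):
    """Draw `bits` random Gaussian directions in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits)]

def signature(vec, planes):
    """One bit per hyperplane: the side of the plane on which `vec` falls."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0.0 else 0
        for plane in planes
    )

def hamming(a, b):
    """Number of positions where two signatures disagree."""
    return sum(x != y for x, y in zip(a, b))

# Assumed toy data: a vector, a slight perturbation of it, and its opposite.
planes = make_hyperplanes(dim=3, bits=16, seed=42)
base = [1.0, 2.0, 3.0]
near = [1.01, 2.0, 2.99]
far = [-1.0, -2.0, -3.0]

sig_base, sig_near, sig_far = (signature(v, planes) for v in (base, near, far))
print(hamming(sig_base, sig_near), hamming(sig_base, sig_far))
```

Vectors separated by a small angle disagree on few bits, while opposite vectors disagree on every bit, so the short signature stands in for the original object when filtering candidates.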
In this paper we give a new scheme to encode objects in a general metric space with a uniform framework, independent of the data model. Finally, we provide experimental support for our claims using several real-life databases with different data models and distance functions, obtaining excellent results in both speed and recall, especially for large databases.
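The abstract does not spell out the encoding itself, so the sketch below is only a generic illustration of what "independent of the data model" can mean: an encoder that touches objects solely through a distance function. It assumes a simple scheme in which each bit records which of two reference objects is closer; this is not claimed to be the paper's scheme, and `encode`, the reference pairs, and the sample objects are all hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def levenshtein(s, t):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute or match
        prev = cur
    return prev[-1]

def encode(obj, ref_pairs, dist):
    """One bit per reference pair (p, q): is obj at least as close to p as to q?

    Only `dist` is used, so the same encoder serves any metric space.
    """
    return tuple(1 if dist(obj, p) <= dist(obj, q) else 0 for p, q in ref_pairs)

# Assumed toy reference pairs for two different data models.
word_pairs = [("cat", "house"), ("tree", "at"), ("mouse", "car")]
point_pairs = [((0.0, 0.0), (5.0, 5.0)), ((0.0, 5.0), (5.0, 0.0))]

code_cart = encode("cart", word_pairs, levenshtein)
code_card = encode("card", word_pairs, levenshtein)
code_point = encode((1.0, 2.0), point_pairs, euclidean)
print(code_cart, code_card, code_point)
```

In this toy run the similar strings "cart" and "card" receive the same code, and the identical `encode` function handles both strings and points; only the distance function changes.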

Published In

SISAP '10: Proceedings of the Third International Conference on SImilarity Search and APplications
September 2010
130 pages
ISBN:9781450304207
DOI:10.1145/1862344

Sponsors

  • Bilkent University
  • Mexican Computer Science Society

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article

Conference

SISAP '10
Sponsor:
  • Bilkent University
