A metric cache for similarity search

Authors: Fabrizio Falchi, Claudio Lucchese, Salvatore Orlando, Raffaele Perego, Fausto Rabitti

LSDS-IR '08: Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval

Pages 43 - 50

https://doi.org/10.1145/1458469.1458473

Published: 30 October 2008

Abstract

Similarity search in metric spaces is a general paradigm that applies to many application fields. It can also be exploited effectively in content-based image retrieval systems, which are increasingly targeting Web scale. In this context, the design of scalable solutions that combine parallel and distributed architectures with caching at several levels becomes an important issue.

To this end, we investigate the design of a similarity cache that works in metric spaces. The cache can return both exact and approximate results: even when an exact match is not present, it may return an approximate result set with quality guarantees. Through tests on a collection of one million high-quality digital photos, we show that the proposed caching techniques can have a significant impact on performance, much as caching of text queries has proved effective for traditional Web search engines.
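The abstract does not spell out how a quality guarantee can be obtained on a cache miss. One way such a guarantee can arise in any metric space is through the triangle inequality, and the following Python sketch illustrates that idea only; it is not taken from the paper and may differ from the authors' design. Assume the cache stores a past query c with its exact k nearest neighbours and the radius r of that result. For a new query q with d(q, c) < r, every object strictly closer to q than r - d(q, c) must already be in the cached result, so those objects are exactly the nearest neighbours of q. The names `MetricCache`, `insert`, `lookup`, and `brute_force_knn` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def brute_force_knn(data: Sequence[Point], q: Point, k: int, dist: Distance) -> List[Point]:
    """Exact k nearest neighbours by linear scan (stands in for the real index)."""
    return sorted(data, key=lambda o: dist(q, o))[:k]


class MetricCache:
    """Illustrative cache of exact kNN results (hypothetical, not the paper's code)."""

    def __init__(self, dist: Distance) -> None:
        self.dist = dist
        # Each entry: (cached query, radius of its kNN result, the kNN result).
        self.entries: List[Tuple[Point, float, List[Point]]] = []

    def insert(self, query: Point, knn: Sequence[Point]) -> None:
        """Store an exact kNN result; its radius is the distance to the farthest hit."""
        radius = max(self.dist(query, o) for o in knn)
        self.entries.append((query, radius, list(knn)))

    def lookup(self, q: Point, k: int) -> List[Point]:
        """Return up to k objects guaranteed to be the nearest neighbours of q."""
        best: List[Point] = []
        for cached_q, radius, results in self.entries:
            gap = self.dist(q, cached_q)
            ranked = sorted(results, key=lambda o: self.dist(q, o))
            if gap == 0:
                # Exact hit: the cached result answers the query directly.
                return ranked[:k]
            safe = radius - gap
            if safe <= 0:
                continue  # q is too far from this cached query to say anything
            # Strict comparison avoids relying on ties at the cached radius.
            guaranteed = [o for o in ranked if self.dist(q, o) < safe]
            if len(guaranteed) > len(best):
                best = guaranteed
        return best[:k]
```

A caller would fall back to the underlying index when `lookup` returns fewer objects than requested, or accept the shorter list as an approximate answer. An eviction policy and an index over the cached queries are left out to keep the sketch short.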

Index Terms

A metric cache for similarity search
1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Published In

LSDS-IR '08: Proceedings of the 2008 ACM workshop on Large-Scale distributed systems for information retrieval

October 2008

90 pages

ISBN:9781605582542

DOI:10.1145/1458469

Program Chairs: Sebastian Michel (Ecole Polytechnique Fédérale de Lausanne, Switzerland), Gleb Skobeltsyn (Ecole Polytechnique Fédérale de Lausanne, Switzerland), Wai Gen Yee (Illinois Institute of Technology, Chicago, USA)

Copyright © 2008 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

CIKM08

Sponsor:

CIKM08: Conference on Information and Knowledge Management

October 30, 2008

Napa Valley, California, USA

Acceptance Rates

Overall Acceptance Rate 3 of 5 submissions, 60%
