DOI: 10.1007/978-3-031-46994-7_26
Article

CRANBERRY: Memory-Effective Search in 100M High-Dimensional CLIP Vectors

Published: 27 October 2023

Abstract

Recent advances in cross-modal multimedia data analysis require efficient similarity search at the scale of hundreds of millions of high-dimensional vectors. We address this task by proposing the CRANBERRY algorithm, which combines and tunes several existing similarity search strategies. In particular, the algorithm (1) employs Voronoi partitioning to obtain a query-relevant candidate set in constant time, (2) applies filtering techniques to prune the obtained candidates significantly, and (3) re-ranks the retained candidate vectors with respect to the query vector. Applied to a dataset of 100 million 768-dimensional vectors, the algorithm evaluates 10NN queries with 90% recall and an average query latency of 1.2 s, with a throughput of 15 queries per second on a server with a 56-core CPU and 4.7 queries per second on a PC.
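The three stages named in the abstract can be illustrated with a small sketch. This is not the CRANBERRY implementation: the abstract does not say how pivots are chosen, which filtering technique is used, or how many cells are probed. The random pivot selection, the random-hyperplane binary sketches compared by Hamming distance, the squared Euclidean distance, and the parameters `n_cells` and `keep` are all assumptions made for illustration, and this toy version makes no attempt at constant-time candidate lookup.

```python
import numpy as np


def sq_dists(a, b):
    # Squared Euclidean distances between rows of a (n, d) and rows of b (m, d).
    return (a ** 2).sum(1)[:, None] - 2.0 * a @ b.T + (b ** 2).sum(1)[None, :]


def build_index(data, n_pivots=16, n_bits=64, seed=0):
    rng = np.random.default_rng(seed)
    # Assumption: pivots are simply a random sample of the data.
    pivots = data[rng.choice(len(data), size=n_pivots, replace=False)]
    # Voronoi partitioning: each vector goes to the cell of its closest pivot.
    cell_of = np.argmin(sq_dists(data, pivots), axis=1)
    cells = [np.flatnonzero(cell_of == c) for c in range(n_pivots)]
    # Assumption: binary sketches from random hyperplanes stand in for the
    # unspecified "filtering techniques".
    planes = rng.standard_normal((data.shape[1], n_bits))
    sketches = (data @ planes) > 0
    return {"data": data, "pivots": pivots, "cells": cells,
            "planes": planes, "sketches": sketches}


def search(index, q, k=10, n_cells=4, keep=0.25):
    # (1) Candidate set: union of the cells of the n_cells pivots closest to q.
    d_piv = sq_dists(q[None, :], index["pivots"])[0]
    nearest = np.argsort(d_piv)[:n_cells]
    cand = np.concatenate([index["cells"][c] for c in nearest])
    # (2) Filtering: keep the fraction of candidates whose sketch is closest
    #     to the query sketch in Hamming distance (but never fewer than k).
    q_sketch = (q @ index["planes"]) > 0
    hamming = (index["sketches"][cand] != q_sketch).sum(axis=1)
    n_keep = min(len(cand), max(k, int(keep * len(cand))))
    cand = cand[np.argsort(hamming, kind="stable")[:n_keep]]
    # (3) Re-ranking: order the survivors by exact distance to the query.
    exact = sq_dists(q[None, :], index["data"][cand])[0]
    return cand[np.argsort(exact)[:k]]
```

Calling `search(build_index(data), data[i])` returns the indices of up to `k` vectors, with the approximation coming from stages (1) and (2): a true neighbour that falls outside the probed cells or is pruned by the sketch filter cannot be recovered by the re-ranking step.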


Published In

Similarity Search and Applications: 16th International Conference, SISAP 2023, A Coruña, Spain, October 9–11, 2023, Proceedings
Oct 2023
324 pages
ISBN: 978-3-031-46993-0
DOI: 10.1007/978-3-031-46994-7

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. approximate similarity searching
  2. high-dimensional data
  3. indexing
  4. filtering
  5. LAION dataset
