Front Matter
Finding HSP Neighbors via an Exact, Hierarchical Approach
The Half Space Proximal (HSP) graph is a low out-degree monotonic graph with wide applications in various domains, including combinatorial optimization in strings, enhancing kNN classification, simplifying chemical networks, estimating local ...
Approximate Similarity Search for Time Series Data Enhanced by Section Min-Hash
Dynamic Time Warping (DTW) is a well-known similarity measure between time series data. Although DTW can calculate the similarity between time series with different lengths, it is computationally expensive. Therefore, fast algorithms that ...
An Alternating Optimization Scheme for Binary Sketches for Cosine Similarity Search
Searching for similar objects in intrinsically high-dimensional data sets is a challenging task. Sketches have been proposed for faster similarity search using linear scans. Binary sketches are one such approach to find a good mapping from the ...
Unbiased Similarity Estimators Using Samples
Computing a similarity measure (or a distance) between two complex objects is a fundamental building block for a huge number of applications in a wide variety of domains. Since many tasks involve computing such similarities among many pairs of ...
Retrieve-and-Rank End-to-End Summarization of Biomedical Studies
An arduous biomedical task involves condensing evidence derived from multiple interrelated studies, given a context as input, to generate reviews or provide answers autonomously. We named this task context-aware multi-document summarization (CA-...
Fine-Grained Categorization of Mobile Applications Through Semantic Similarity Techniques for Apps Classification
The number of Android apps is constantly on the rise. Existing stores allow selecting apps from general named categories. To prevent miscategorization and facilitate user selection of the appropriate app, a closer examination of the categories’ ...
Turbo Scan: Fast Sequential Nearest Neighbor Search in High Dimensions
This paper introduces Turbo Scan (TS), a novel k-nearest neighbor search solution tailored for high-dimensional data and specific workloads where indexing can’t be efficiently amortized over time. There exist situations where the overhead of index ...
The Dataset-Similarity-Based Approach to Select Datasets for Evaluation in Similarity Retrieval
Most papers on similarity retrieval present experiments executed on an assortment of complex datasets. However, no work focuses on analyzing the selection of datasets to evaluate the techniques proposed in the related literature. Ideally, the ...
Suitability of Nearest Neighbour Indexes for Multimedia Relevance Feedback
User relevance feedback (URF) is emerging as an important component of the multimedia analytics toolbox. State-of-the-art URF systems employ high-dimensional vectors of semantic features and train linear-SVM classifiers in each round of ...
Accelerating k-Means Clustering with Cover Trees
The k-means clustering algorithm is a popular algorithm that partitions data into k clusters. There are many improvements to accelerate the standard algorithm. Most current research employs upper and lower bounds on point-to-cluster distances and ...
Is Quantized ANN Search Cursed? Case Study of Quantifying Search and Index Quality
Traditional evaluation of an approximate high-dimensional index typically consists of running a benchmark with known ground truth, analyzing the performance in terms of traditional result quality and latency measures, and then comparing those ...
Minwise-Independent Permutations with Insertion and Deletion of Features
The seminal work of Broder et al. [5] introduces the algorithm that computes a low-dimensional sketch of high-dimensional binary data that closely approximates pairwise Jaccard similarity. Since its invention, it has been commonly ...
SDOclust: Clustering with Sparse Data Observers
Sparse Data Observers (SDO) is an unsupervised learning approach developed to cover the need for fast, highly interpretable and intuitively parameterizable anomaly detection. We present SDOclust, an extension that performs clustering while ...
Vec2Doc: Transforming Dense Vectors into Sparse Representations for Efficient Information Retrieval
The rapid development of deep learning and artificial intelligence has transformed our approach to solving scientific problems across various domains, including computer vision, natural language processing, and automatic content generation. ...
Similarity Search with Multiple-Object Queries
Within the topic of similarity search, all work we know assumes that search is based on a dissimilarity space, where a query comprises a single object in the space. Here, we examine the possibility of a multiple-object query. There are at least ...
Diversity Similarity Join for Big Data
The Similarity Join (SJ) has become one of the most popular and valuable data processing operators in analyzing large amounts of data. Various types of similarity join operators have been effectively used in multiple scenarios. However, these ...
Front Matter
Overview of the SISAP 2023 Indexing Challenge
This manuscript presents the premiere SISAP 2023 Indexing Challenge, which seeks replicable and competitive solutions in the realm of approximate similarity search algorithms. Our aim is recall, all while optimizing build time, search time, and ...
Enhancing Approximate Nearest Neighbor Search: Binary-Indexed LSH-Tries, Trie Rebuilding, and Batch Extraction
Locality-Sensitive-Hashing (LSH) plays a crucial role in approximate nearest neighbour search and similarity-based queries. In this paper, we present a study on the performance of LSH for indexing and searching high-dimensional binary vectors ...
General and Practical Tuning Method for Off-the-Shelf Graph-Based Index: SISAP Indexing Challenge Report by Team UTokyo
Despite the efficacy of graph-based algorithms for Approximate Nearest Neighbor (ANN) searches, the optimal tuning of such systems remains unclear. This study introduces a method to tune the performance of off-the-shelf graph-based indexes, ...
SISAP 2023 Indexing Challenge – Learned Metric Index
This submission into the SISAP Indexing Challenge examines the experimental setup and performance of the Learned Metric Index, which uses an architecture of interconnected learned models to answer similarity queries. An inherent part of this ...
Computational Enhancements of HNSW Targeted to Very Large Datasets
The Hierarchical Navigable Small World (HNSW) Graph is a graph-based approximate similarity search algorithm that achieves fast and accurate search through a hierarchical structure providing long-range and short-range links. The HNSW remains as a ...
CRANBERRY: Memory-Effective Search in 100M High-Dimensional CLIP Vectors
Recent advances in cross-modal multimedia data analysis necessarily require efficient similarity search on the scales of hundreds of millions of high-dimensional vectors. We address this task by proposing the CRANBERRY algorithm that specifically ...