Vlastislav Dohnal

Masaryk University, Department of Computer Systems and Communications, Faculty Member

Address: Brno, Jihomoravsky kraj, Czech Republic

Papers by Vlastislav Dohnal

Indexing Metric Spaces

Springer eBooks, 2014

The Encyclopedia of Databases, a comprehensive work, provides easy access to relevant information on all aspects of very large databases. This encyclopedia features alphabetical organization of concepts covering main areas of very large databases. These 1000 entries offer convenient access to information in the field of databases with definitions and illustrations of basic terminology, concepts, methods, and algorithms, references to literature, and cross-references to other entries and journal articles. Topics for the encyclopedia were selected by a distinguished international advisory board, and written by world-class experts in the field. The Encyclopedia of Databases is designed to meet the needs of research scientists, professors and graduate-level students in computer science and engineering. This encyclopedia is also suitable for practitioners in industry.

Data-Driven Learned Metric Index: An Unsupervised Approach

Lecture Notes in Computer Science, 2021

Reproducible experiments with Learned Metric Index Framework

Information Systems, Sep 1, 2023

Learned Metric Index — Proposition of learned indexing for unstructured data

Information Systems, Sep 1, 2021

The main paradigm of similarity searching in metric spaces has remained mostly unchanged for decades: data objects are organized into a hierarchical structure according to their mutual distances, using representative pivots to reduce the number of distance computations needed to efficiently search the data. We propose an alternative to this paradigm, using machine learning models to replace pivots, thus posing similarity search as a classification problem, which stands in for numerous expensive distance computations. Even a relatively naive implementation of this idea is more than competitive with state-of-the-art methods in terms of speed and recall, proving the concept viable and showing great potential for its future development.

Learned Indexing in Proteins: Substituting Complex Distance Calculations with Embedding and Clustering Techniques

Lecture Notes in Computer Science, 2022

Separable splits in metric data sets

Proceedings of the 9th Italian Symposium on Advanced Database Systems, Venice, Italy, Jun 27, 2001

Popularity-Based Ranking for Fast Approximate kNN Search

Informatica (Lithuanian Academy of Sciences), 2017

Learned Indexing in Proteins: Extended Work on Substituting Complex Distance Calculations with Embedding and Clustering Techniques

arXiv (Cornell University), Aug 18, 2022

EuDML Assessment and Evaluation — Final Report

HAL (Le Centre pour la Communication Scientifique Directe), May 13, 2013

Developing Similarity Search

Similarity searching has become increasingly popular, stimulated by the growth of diverse data archives available online that offer search services to users, and by the increasing complexity of the data that must be searched. This has also been recognized by major Internet search engines, exemplified by Google, which recently enriched their image search services by allowing users to search for images by similarity. They usually apply the following procedure. First, a candidate set of images is obtained by a regular text search in the images' file names and associated textual tags. Then this set is reordered by the images' content, expressed as color histograms, for example. Finally, the result is presented to the user. In this thesis, we focus on similarity searching, that is, content-based retrieval. In this area, data items are retrieved by their content rather than by textual information associated with them. For example, images are searched by comparing their color histogram...

Motion Words: A Text-Like Representation of 3D Skeleton Sequences

Lecture Notes in Computer Science, 2020

Similarity Search: The Metric Space Approach

Advances in Database Systems, 2006

Indexing Structures for Searching in Metric Spaces

A nozzle shut-off valve for injection molding machine for plastic material, especially thermoplastic material, has two pneumatic cylinder-and-plunger units. One such unit has its plunger mounted to reciprocate so as to block the passageway of the plastics through the nozzle. The first such plunger and cylinder unit is provided with a pilot passage in the valve nozzle so that the pressure of molten plastic can be used to open it. The second such unit is much smaller diameter and has its plunger mounted to block the pilot passage so that the second unit in effect becomes a pilot valve and controls the application of fluid pressure through the pilot passage to the first unit plunger or blocking plunger.

Building Self-Organized Image Retrieval Network

We propose a self-organized content-based Image Retrieval Network (IRN) that is inspired by a Metric Social Network (MSN) search system. The proposed network model is strictly data-owner oriented, so no data redistribution among peers is needed in order to process queries efficiently. Thus, a shared database is created in which each peer is fully in charge of its data. The self-organization of the network is obtained by exploiting the social-network approach of the MSN: the connections between peers in the network are created as social-network relationships formed on the basis of a query-answer principle. The knowledge of answers to previous queries is used to navigate quickly to peers that possibly contain the best answers to new queries. Additionally, the network uses a randomized mechanism to explore new and unvisited parts of the network. In this way, the self-adaptability and robustness of the system are achieved. The proposed concepts are verified using a real network consisting of 2,...

Metric Similarity Search Implementation Framework (MESSIF)

The implementation framework, called MESSIF, eases the task of building metric-based similarity-search prototypes. It provides a number of modules, from storage management to automatic collection of performance statistics. Due to its open and modular design, it is also easy to implement additional modules if necessary. MESSIF also offers several ready-to-use generic clients that allow users to control and test the index structures and to measure their performance.

Towards Scalability of Similarity Searching

With the increasing number of applications that base searching on similarity rather than on exact matching, novel index structures are needed to speed up the execution of similarity queries. An important stream of research in this direction uses the metric space as a model of similarity. We explain the principles and survey the most important representatives of index structures. We put most emphasis on distributed similarity search architectures, which try to solve the difficult problem of scalability of similarity searching. The actual achievements are demonstrated by practical experiments. Future research directions are outlined in the conclusions.

Person Tagging in Still Images by Fusing Face and Full-body Detections

We address the problem of organizing personal photo albums by assigning tags/names to people present in photographs. Our proposed framework improves on similar systems such as Google+ Photos (Picasa) by incorporating not only a face detector but also a full-body detector. The two modalities are combined to provide the user with tags of people whose face has not been detected or is not even present in the photograph. An implementation of the proposed framework is evaluated on a sample of real-life photographs.

Towards Artificial Priority Queues for Similarity Query Execution

2018 IEEE 34th International Conference on Data Engineering Workshops (ICDEW), 2018

Content-based retrieval in large collections of unstructured data is challenging not only because of the difficulty of defining similarity between data items, where the phenomenon of the semantic gap appears, but also because of the efficiency of executing similarity queries. Search engines providing similarity search typically organize various multimedia data, e.g. images of a photo stock, and support the k-nearest neighbor query. Users accessing such systems then look for data items similar to their specific query object and refine results by re-running the search with an object from the previous query results. This paper is motivated by the unsatisfactory query execution performance of indexing structures that use the metric space as a convenient data model. We present the performance behavior of two state-of-the-art representatives and propose a new universal technique for ordering the priority queue of data partitions to be accessed during kNN query evaluation. We verify it in experiments on real-life data sets.

D-index: Distance Index

The D-index is an indexing structure that organizes data modeled as a metric space, which enables similarity search. The structure is static with respect to the number of buckets and levels, which are defined by hashing functions. These functions must be designed before the index is created and filled with data. The D-index is then able to store an arbitrary number of objects, since the capacity of the buckets is unlimited.

Organizing Similarity Spaces Using Metric Hulls

Similarity Search and Applications
