The Encyclopedia of Databases, a comprehensive work, provides easy access to relevant information on all aspects of very large databases. This encyclopedia features alphabetical organization of concepts covering the main areas of very large databases. Its 1000 entries offer convenient access to information in the field of databases, with definitions and illustrations of basic terminology, concepts, methods, and algorithms, references to literature, and cross-references to other entries and journal articles. Topics for the encyclopedia were selected by a distinguished international advisory board and written by world-class experts in the field. The Encyclopedia of Databases is designed to meet the needs of research scientists, professors, and graduate-level students in computer science and engineering. This encyclopedia is also suitable for practitioners in industry.
The main paradigm of similarity searching in metric spaces has remained mostly unchanged for decades: data objects are organized into a hierarchical structure according to their mutual distances, using representative pivots to reduce the number of distance computations needed to search the data efficiently. We propose an alternative to this paradigm, using machine learning models to replace pivots, thus posing similarity search as a classification problem, which stands in for numerous expensive distance computations. Even a relatively naive implementation of this idea is more than competitive with state-of-the-art methods in terms of speed and recall, demonstrating that the concept is viable and showing great potential for its future development.
Similarity searching has become increasingly popular, stimulated by the growth of diverse data archives available online that offer search services to users, and by the increasing complexity of the data that must be searched. This issue has also been recognized by major Internet search engines, exemplified by Google, which recently enriched their image search services by allowing users to search for images by similarity. They usually apply the following procedure. First, a candidate set of images is obtained by a regular text search in the images' file names and associated textual tags. Then this set is reordered by the images' content, expressed as color histograms, for example. Finally, the result is presented to the user. In this thesis, we focus on similarity searching, that is, content-based retrieval. In this area, data items are retrieved by their content rather than by textual information associated with them. For example, images are searched by comparing their color histogram...
A nozzle shut-off valve for an injection molding machine for plastic material, especially thermoplastic material, has two pneumatic cylinder-and-plunger units. The first unit has its plunger mounted to reciprocate so as to block the passageway of the plastic through the nozzle. This unit is provided with a pilot passage in the valve nozzle so that the pressure of the molten plastic can be used to open it. The second unit has a much smaller diameter and has its plunger mounted to block the pilot passage, so that the second unit in effect becomes a pilot valve and controls the application of fluid pressure through the pilot passage to the plunger of the first unit, the blocking plunger.
We propose a self-organized content-based Image Retrieval Network (IRN) that is inspired by a Metric Social Network (MSN) search system. The proposed network model is strictly data-owner oriented, so no data redistribution among peers is needed in order to process queries efficiently. Thus, a shared database is created in which each peer is fully in charge of its data. The self-organization of the network is obtained by exploiting the social-network approach of the MSN: the connections between peers in the network are created as social-network relationships formed on the basis of a query-answer principle. The knowledge of answers to previous queries is used to navigate quickly to peers that possibly contain the best answers to new queries. Additionally, the network uses a randomized mechanism to explore new and unvisited parts of the network. In this way, the self-adaptability and robustness of the system are achieved. The proposed concepts are verified using a real network consisting of 2,...
This implementation framework, called MESSIF, eases the task of building metric-based similarity-searching prototypes. It provides a number of modules, from storage management to automatic collection of performance statistics. Due to its open and modular design, it is also easy to implement additional modules if necessary. MESSIF also offers several ready-to-use generic clients that allow users to control and test index structures and measure their performance.
With the increasing number of applications that base searching on similarity rather than on exact matching, novel index structures are needed to speed up the execution of similarity queries. An important stream of research in this direction uses the metric space as a model of similarity. We explain the principles and survey the most important representatives of index structures. We put most emphasis on distributed similarity search architectures, which try to solve the difficult problem of scalability of similarity searching. The actual achievements are demonstrated by practical experiments. Future research directions are outlined in the conclusions.
We address the problem of organizing personal photo albums by assigning tags/names to people present in photographs. Our proposed framework improves on similar systems such as Google+ Photos (Picasa) by incorporating not only a face detector but also a full-body detector. These two modalities are combined to provide the user with tags of people whose face has not been detected or is not even present in the photograph. An implementation of the proposed framework is evaluated on a sample of real-life photographs.
2018 IEEE 34th International Conference on Data Engineering Workshops (ICDEW), 2018
Content-based retrieval in large collections of unstructured data is challenging not only because of the difficulty of defining similarity between data images, where the phenomenon of the semantic gap appears, but also because of the efficiency of executing similarity queries. Search engines providing similarity search typically organize various multimedia data, e.g. images of a photo stock, and support k-nearest neighbor (kNN) queries. Users accessing such systems then look for data items similar to their specific query object and refine results by re-running the search with an object from the previous query results. This paper is motivated by the unsatisfactory query execution performance of indexing structures that use the metric space as a convenient data model. We present the performance behavior of two state-of-the-art representatives and propose a new universal technique for ordering the priority queue of data partitions to be accessed during kNN query evaluation. We verify it in experiments on real-life data sets.
The D-index is an index structure that organizes data modeled as a metric space, which enables similarity searching. The structure is static with respect to the number of buckets and levels defined by hashing functions. These functions must be designed before the index is created and populated with data. The D-index is then able to store an arbitrary number of objects, so that the capacity of the buckets is unlimited.
Papers by Vlastislav Dohnal