Proceedings of the 10th International Workshop on Multimedia Information Systems (MIS 2004), College Park, MD, 2004
When similarity queries over multimedia databases are processed by splitting the overall query condition into a set of sub-queries, the problem of how to efficiently and effectively integrate the sub-queries' results arises. The common approach is to use a (monotone) scoring function, like min and average, to compute an overall similarity score by aggregating the partial scores an object obtains on the sub-queries. In order to minimize the number of database accesses, a “middleware” algorithm, like TA by Fagin, Lotem and ...
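The aggregation step described in this abstract can be illustrated with a minimal sketch. It assumes every object's partial scores are already available in memory and scans them exhaustively, so it is not the TA middleware algorithm and makes no attempt to reduce database accesses. The object names and score values are hypothetical.

```python
from statistics import mean


def aggregate(partial_scores, scoring_fn):
    """Combine each object's per-sub-query scores into one overall score."""
    return {obj: scoring_fn(scores) for obj, scores in partial_scores.items()}


def top_k(partial_scores, scoring_fn, k):
    """Return the k objects with the highest overall score (exhaustive scan)."""
    overall = aggregate(partial_scores, scoring_fn)
    # Sort by descending score; ties are broken by object id for determinism.
    return sorted(overall.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical partial scores: one score per sub-query, each in [0, 1].
partial_scores = {
    "img1": [0.9, 0.4],
    "img2": [0.7, 0.7],
    "img3": [0.5, 0.95],
}

best_by_min = top_k(partial_scores, min, 1)
best_by_avg = top_k(partial_scores, mean, 1)
```

With these values, `min` favors the object that is balanced across sub-queries (`img2`), while the average favors `img3`, which shows how the choice of scoring function changes the result.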
The M-tree is a dynamic paged structure that can be effectively used to index multimedia databases, where objects are represented by means of complex features and similarity queries require the computation of time-consuming distance functions. The initial loading of the M-tree, however, can be very expensive. In this paper we propose a fast (bulk) loading algorithm to speed up the creation of the tree on a given dataset. Experimental results show that our BulkLoading algorithm can significantly improve the index's performance with respect to M-tree insertion methods, and its performance is comparable to that of static metric trees.
A new access method, called M-tree, is proposed to organize and search large data sets from a generic “metric space”, i.e., where object proximity is only defined by a distance function satisfying the positivity, symmetry, and triangle inequality postulates. We detail algorithms for insertion of objects and split management, which keep the M-tree always balanced; several heuristic split alternatives are considered and experimentally evaluated. Algorithms for similarity (range and k-nearest neighbors) queries are also described. Results from extensive experimentation with a prototype system are reported, considering as the performance criteria the number of page I/Os and the number of distance computations. The results demonstrate that the M-tree indeed extends the domain of applicability beyond the traditional vector spaces, performs reasonably well in high-dimensional data spaces, and scales well in case of growing files.
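To make the three metric postulates and the notion of a range query concrete, here is a small sketch. It checks the postulates on a finite sample and answers a range query by linear scan; it is not an M-tree, which would instead use the triangle inequality to avoid many of these distance computations. The function names and the choice of edit distance as the example metric are assumptions made for illustration.

```python
from itertools import product


def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def satisfies_metric_postulates(objects, dist):
    """Check positivity, symmetry, and triangle inequality on a finite sample."""
    for x, y in product(objects, repeat=2):
        d = dist(x, y)
        if d < 0 or (d == 0) != (x == y):  # positivity
            return False
        if d != dist(y, x):                # symmetry
            return False
    for x, y, z in product(objects, repeat=3):
        if dist(x, z) > dist(x, y) + dist(y, z):  # triangle inequality
            return False
    return True


def range_query(objects, dist, query, radius):
    """Return all objects within `radius` of `query` (linear scan)."""
    return [o for o in objects if dist(query, o) <= radius]


words = ["tree", "free", "three", "metric"]
is_metric = satisfies_metric_postulates(words, edit_distance)
near_tree = range_query(words, edit_distance, "tree", 1)
```

Passing the check on a sample does not prove that a function is a metric in general; it only fails to find a counterexample among the sampled objects.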
2009 Second International Workshop on Similarity Search and Applications (SISAP 2009)
We review the major paradigms for similarity queries, in particular those that allow approximate results. We propose an original classification schema which easily allows existing approaches to be compared along several independent coordinates, such as quality of results, error metrics, and user interaction.
Dealing with user preferences is becoming a widespread issue in novel data-intensive application domains, such as electronic catalogs, e-commerce, multimedia databases, and real estate. Although some recent works have studied the problem under specific assumptions, an understanding of the issues involved in the more general case is still missing. In this paper we assume that user preferences are expressed in a qualitative way over the tuples of a relation schema (e.g., I prefer product A to product B), which is quite natural from the user's point of view and also includes, as a proper sub-case, quantitative preferences defined by means of a scoring function. Starting from an analysis of basic properties of (qualitative) preferences, we consider the Best operator, which can be used to smoothly embed preferences in queries. We study general properties of this operator and present a practical algorithm for its computation. We also describe a special data structure, called β-tree, that c...
Preference queries aim to retrieve from large databases (DBs) those objects that better match users' requirements. With the aim of supporting modern DB applications, such as context-aware ones, in which conditional preferences are the rule, in this paper we investigate the possibility of adopting conditional preference networks (CP-nets) for DB querying. To this end, we also consider the relevant case in which CP-nets are not completely specified, a likely case for complex DB scenarios. We first show that the ceteris paribus (all else being equal) semantics, commonly associated with CP-nets, can lead to counterintuitive results if the CP-net is incomplete and the DB is incomplete as well. Then, we introduce a new totalitarian (i.e., not ceteris paribus) semantics and, rather surprisingly, prove that our semantics is equivalent to ceteris paribus for complete acyclic CP-nets and that it yields the same optimal results if the DB is complete. Finally, we show that when both the CP-net an...
Ranking objects according to different criteria is a central issue in many data-intensive applications. Yet, no existing solution deals with the case of partially specified score aggregation functions (e.g., a weighted sum with no precisely known weight values). We address multi-source top-k queries with constraints (rather than precise values) on the weights. Our solution is instance optimal and provides increased flexibility with negligible overhead with respect to classical top-k queries.
The management of user preferences is becoming a fundamental ingredient of modern Web-based data-intensive applications, in which information filtering is crucial to reduce the volume of data presented to the user. However, though deriving and modeling user preferences has been widely studied in recent years, there is still a need for practical methods to efficiently incorporate preferences in actual systems. In this paper we consider the qualitative approach to user preferences in which a binary preference relation is defined among objects and a special operator (called Best) is used to extract relevant data according to the preference relation. In this framework, we propose and study a special index structure, called β-tree, which can be used for a rapid evaluation of the Best operator. We then present a number of practical algorithms for the efficient maintenance of β-trees in the face of database updates and discuss some relevant implementation issues.
Dealing with user preferences is becoming a widespread issue in novel data-intensive application domains, such as electronic catalogs, e-commerce, multimedia databases, and real estate. Given a set of preferences, an important problem is to efficiently determine which are the “best” objects, according to such preferences. In this paper we assume that preferences are expressed in a qualitative way over the tuples of a relation schema (e.g., I prefer product A to product B), which is quite natural from the user's point of view and also includes, as a proper subcase, quantitative preferences defined by means of a scoring function. Starting from an analysis of basic properties of (qualitative) preferences, we consider the Best operator, which can be used to smoothly embed preferences in queries of relational algebra. We study general properties of this operator and present a practical algorithm for its computation. We show how the algorithm improves the simple nested-loops approach and c...
Traditionally, skyline and ranking queries have been treated separately as alternative ways of discovering interesting data in potentially large datasets. While ranking queries adopt a specific scoring function to rank tuples, skyline queries return the set of non-dominated tuples and are independent of attribute scales and scoring functions. Ranking queries are thus less general, but cheaper to compute and widely used. In this paper, we integrate these two approaches under the unifying framework of restricted skylines by applying the notion of dominance to a set of scoring functions of interest.
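The contrast between the two classical query types can be sketched as follows. This is only an illustration of plain skyline and ranking queries, not of the restricted skyline framework the paper introduces. It assumes lower attribute values are better, uses a quadratic nested-loops skyline, and the hotel data and weights are hypothetical.

```python
def dominates(a, b):
    """a dominates b: no worse on every attribute, strictly better on one.

    Assumption for this sketch: lower values are better.
    """
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(tuples):
    """Return the non-dominated tuples (simple nested-loops scan)."""
    return [t for t in tuples if not any(dominates(o, t) for o in tuples)]


def top_k(tuples, weights, k):
    """Rank tuples by a weighted sum (lower is better) and keep the first k."""
    def score(t):
        return sum(w * x for w, x in zip(weights, t))
    return sorted(tuples, key=score)[:k]


# Hypothetical hotels described as (price, distance from the beach).
hotels = [(50, 8), (80, 2), (60, 9), (90, 3), (70, 5)]

sky = skyline(hotels)
cheap_first = top_k(hotels, (0.5, 0.5), 1)
close_first = top_k(hotels, (0.1, 0.9), 1)
```

The skyline does not depend on any weights, whereas the top-1 result changes from `(50, 8)` to `(80, 2)` when the weights shift toward distance; both top-1 answers belong to the skyline.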
Preference queries aim to retrieve from large databases those objects that better match users' requirements. Approaches proposed so far in the DB field for specifying preferences are limited when one needs to consider conditional, rather than absolute, preferences (e.g., I prefer driving by car in winter, and by motorbike in summer), which are common in context-aware applications. CP-nets are a powerful formalism for concisely representing such preferences, which has its roots in decision-making problems. However, CP-nets, being based on a ceteris paribus (all else being equal) interpretation, are hardly applicable in complex DB scenarios. In this paper we introduce a new totalitarian (i.e., not ceteris paribus) semantics for CP-nets. We prove that our semantics is equivalent to ceteris paribus for complete acyclic CP-nets, whereas it avoids some counterintuitive effects of ceteris paribus when the CP-net is partially specified.
M-tree is a dynamic access method suitable for indexing generic “metric spaces”, where the function used to compute the distance between any two objects satisfies the positivity, symmetry, and triangle inequality postulates. The M-tree design fulfills typical requirements of multimedia applications, where objects are indexed using complex features, and similarity queries can require application of time-consuming distance functions. In this paper we describe the basic search and management algorithms of M-tree, introduce several heuristic split policies, and experimentally evaluate them, considering both I/O and CPU costs. Results also show that M-tree performs better than R*-tree on high-dimensional vector spaces.
We introduce a cost model for the M-tree access method [Ciaccia et al., 1997] which provides estimates of CPU (distance computations) and I/O costs for the execution of similarity queries as a function of each single query. This model is said to be query-sensitive, since it takes into account, by relying on the novel notion of “witness”, the “position” of the query point inside the metric space indexed by the M-tree. We describe the basic concepts underlying the model along with different methods which can be used for its implementation; finally, we experimentally validate the model over both real and synthetic datasets.
When composing multiple preferences characterizing the most suitable results for a user, several issues may arise. Indeed, preferences can be partially contradictory, suffer from a mismatch with the level of detail of the actual data, and even lack natural properties such as transitivity. In this paper we formally investigate the problem of retrieving the best results complying with multiple preferences expressed in a logic-based language. Data are stored in relational tables with taxonomic domains, which allow the specification of preferences also over values that are more generic than those in the database. In this framework, we introduce two operators that rewrite preferences for enforcing the important properties of transitivity, which guarantees soundness of the result, and specificity, which solves all conflicts among preferences. Although, as we show, these two properties cannot be fully achieved together, we use our operators to identify the only two alternatives that ensure...
International Journal of Software Engineering and Knowledge Engineering
The use of formal methods early in the development process has been advocated as a way of improving the quality of software products and their production process. Here we study the influence of a formal requirements document on the next phase in the software process, that is, design. We suggest that formal design should coherently follow from formal requirements. We show that two different formal notations can be effectively used, one for writing the requirements specification and one for the design specification. We also consider how a design specification can be formally checked with respect to the requirements specification. The notations we choose are well known: the Z notation for requirements and the Larch two-tiered language for design. We show how a number of tools based on these notations can be used to improve the quality of the documents produced during the development process.
Papers by Paolo Ciaccia