A framework for computing consistent answers to boolean aggregate queries in numerical databases violating a given set of aggregate constraints is introduced. Both aggregate constraints and queries are aggregation expressions consisting of linear inequalities on aggregate-sum functions. In particular, our approach works for a specific but expressive form of aggregation expressions (called steady aggregation expressions) and computes consistent answers by solving Integer Linear Programming (ILP) problem instances.
A ‘functional’ query is a query whose answer is always defined and unique, i.e., it is either true or false in all models. It has been shown that the expressive powers of the various types of stable models, when restricted to the class of DATALOG¬ functional queries, do not in practice go beyond those of well-founded semantics, except for the least undefined stable models which, instead, capture the whole boolean hierarchy BH. In this paper we present a ‘functional’ language which, by means of a disciplined use of negation, achieves the desired level of expressiveness up to BH. Although the semantics of the new language is partial, all atoms in the source program are defined, and possibly undefined atoms are introduced in a rewriting phase to increase the expressive power. We show that the language satisfies ‘desirable’ properties better than classical languages with (unstratified) negation and stable model semantics. We present an algorithm for the evaluation of functional queries and we show that exponential time resolution is required for hard problems only. Finally, we present the architecture of a prototype of the language which has been developed.
The proliferation of information available on the World Wide Web, together with emerging technologies that have reduced the barriers to organizing and publishing documents, has made support for navigation and personalization of Web sites an appealing and promising task for the Web community. One of the most challenging activities in the design of modern sites, regardless of the particular domain, is making the retrieval of relevant documents easier. This paper proposes a new technique for Web navigation based on current algorithms used in recommendation systems. Our approach identifies relevant documents by adopting methodologies similar to those successfully used in current search engines. This approach has been effectively used for the implementation of a lightweight Web site personalization tool that lets users navigate towards relevant Web pages regardless of the original Web site structure.
... Sergio Flesca DEIS-UNICAL Via P. Bucci 87036 Rende (CS) Italy flesca@deis.unical.it Filippo Furfaro DEIS-UNICAL Via P. Bucci 87036 Rende (CS) Italy furfaro@deis.unical.it Elio Masciari ICAR-CNR Institute of Italian National Research Council masciari@icar.cnr.it ...
IEEE Transactions on Knowledge and Data Engineering, 2011
The PDF format represents the de facto standard for print-oriented documents. In this paper, we address the problem of wrapping PDF documents, which raises new challenges in several contexts of text data management. Our proposal is based on a novel bottom-up hierarchical wrapping approach that exploits fuzzy logic to handle the “uncertainty” which is intrinsic to the structure and presentation of PDF documents. A PDF wrapper is defined by specifying a set of group type definitions that impose a target structure on groups of tokens containing the required information. Constraints on token groupings are formulated as fuzzy conditions, which are defined on spatial and content predicates of tokens. We define a formal semantics for PDF wrappers and propose an algorithm for wrapper evaluation working in polynomial time with respect to the size of a PDF document. The proposed approach has been implemented in a wrapper generation system that offers visual capabilities to assist the designer in specifying and evaluating a PDF wrapper. Experimental results have shown good accuracy and applicability of our system to PDF documents of various domains.
IEEE Transactions on Knowledge and Data Engineering, 2011
Query relaxation is the process of weakening a query to a more general one, and it is frequently employed to support approximate query answering. In this paper, rewriting systems for a wide fragment of XPath are investigated, which accomplish query relaxation through the application of simple rewriting rules transforming navigational axes and node tests into relaxed ones. Specifically, a general yet simple form of rewriting rules is considered, which subsumes the forms adopted in several rewriting systems for approximate XPath query answering. The expressiveness of rewriting systems based on this form of rules is characterized in terms of their capability of transforming a query into every more general formulation. It is shown that traditional rewriting systems are incomplete not only w.r.t. containment, but also w.r.t. the stricter form known as containment by homomorphism. This limitation is overcome by defining a set R* of rewriting rules which are still of the same simple form as traditional ones, but are expressive enough to capture at least containment by homomorphism. Then, an algorithm is proposed which exploits R* to provide approximate answers to queries along with a measure of their approximation degree.
A framework for the partial evaluation of SPARQL queries on multiple RDF data sources, both at a local and global level, is proposed. According to the proposed approach, global evaluation of queries is accomplished by first performing local evaluation on each data source, then merging the obtained results. When merging the results, term equivalence across different sources is evaluated by looking at the context of each term. Moreover, the framework allows scoring partial answers by evaluating how much a partial answer is able to capture each concept expressed in the query. Finally, a distributed index structure is proposed that supports early pruning of useless intermediate results.
Web users often want to be notified when specific information contained in a Web page has been modified. The problem of detecting Web document changes has been extensively investigated, and several systems providing notification of Web page changes are available. However, these systems do not provide notification of changes to specific information contained in a Web page. In this work we present a system called CDWeb that performs this task. It allows users to monitor a whole document or specific portions of it. Users can also specify what kinds of changes they are interested in, such as structural changes or semantic changes. The system provides a flexible and adaptive view of the Web: it tracks user queries and creates user profiles in order to associate a personalized view with each user.
Papers by Sergio Flesca