  • Electronic copies of my publications are found -locally- at http://www.cs.uoi.gr/~pvassil/publications/publications.h...
  • Yannis Vassiliou, Timoleon K. Sellis
Assessment is the process of comparing the actual to the expected behavior of a business phenomenon and judging the outcome of the comparison. In this paper we propose assess, a novel querying operator that supports assessment based on the results of a query on a data cube. This operator requires (1) the specification of an OLAP query over a measure of a data cube, to define the target cube to be assessed; (2) the specification of a reference cube of comparison (benchmark), which represents the expected performance of the measure; (3) the specification of how to perform the comparison between the target cube and the benchmark, and (4) a labeling function that classifies the result of this comparison using a set of labels. After introducing an SQL-like syntax for our operator, we formally define its semantics in terms of a set of logical operators. To support the computation of assess we propose a basic plan as well as some optimization strategies, then we experimentally evaluate the...
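As a rough, purely illustrative rendition of the four ingredients the operator combines (target query, benchmark, comparison, labeling), the following Python sketch shows one simplified way they could fit together; the toy cube cells, the ratio comparison, and the label thresholds are invented for the example and do not reflect the paper's actual syntax or semantics.

    # Hypothetical sketch of the assess idea: compare a target cube to a
    # benchmark cube cell by cell, then label the outcome of the comparison.

    def assess(target, benchmark, compare, label):
        """target, benchmark: dicts mapping cell coordinates to measure values."""
        result = {}
        for cell, actual in target.items():
            expected = benchmark.get(cell)
            if expected is None:
                continue  # no reference value for this cell
            result[cell] = label(compare(actual, expected))
        return result

    # Assumed comparison: ratio of actual to expected measure value.
    ratio = lambda actual, expected: actual / expected

    # Assumed labeling function with illustrative thresholds.
    def three_way_label(r):
        if r < 0.9:
            return "below expectations"
        if r > 1.1:
            return "above expectations"
        return "as expected"

    # Toy cubes: sales per (month, region), purely illustrative.
    target = {("2024-01", "EU"): 95, ("2024-01", "US"): 130}
    benchmark = {("2024-01", "EU"): 100, ("2024-01", "US"): 100}

    print(assess(target, benchmark, ratio, three_way_label))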
The software processes that facilitate the original loading and the periodic refreshment of the data warehouse contents are commonly known as Extraction-Transformation-Loading (ETL) processes. The intention of this survey is to present the research work in the field of ETL technology in a structured way. To this end, we organize the coverage of the field as follows: (a) first, we cover the conceptual and logical modeling of ETL processes, along with some design methods, (b) we visit each stage of the E-T-L triplet, and examine problems that fall within each of these stages, (c) we discuss problems that pertain to the entirety of an ETL process, and, (d) we review some research prototypes of academic origin.
During data warehouse design, the designer frequently encounters the problem of choosing among different alternatives for the same design construct. The behavior of the chosen design in the presence of evolution events is an important parameter for this choice. This paper proposes metrics to assess the quality of the warehouse design from the viewpoint of evolution. We employ a graph-based model to uniformly abstract relations and software modules, like queries, views, reports, and ETL activities. We annotate the warehouse graph with policies for the management of evolution events. The proposed metrics are based on graph-theoretic properties of the warehouse graph to assess the sensitivity of the graph to a set of possible events. We evaluate our metrics with experiments over alternative configurations of the same warehouse schema.
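The following Python sketch is a minimal, hypothetical illustration of the graph-based idea: relations and software modules form a dependency graph, and a naive sensitivity score counts how many modules are transitively affected by an evolution event on a node. The toy graph and the counting score are assumptions, not the metrics proposed in the paper.

    # Illustrative warehouse graph: queries/views/ETL activities depend on
    # relations; sensitivity to an event is approximated by the number of
    # modules transitively affected by a change to a node.
    from collections import defaultdict, deque

    def transitive_consumers(depends_on, changed_node):
        """depends_on maps each module to the nodes it uses; return all modules
        that directly or transitively depend on changed_node."""
        # Invert the dependency edges: provider -> consumers.
        consumers = defaultdict(set)
        for module, providers in depends_on.items():
            for p in providers:
                consumers[p].add(module)
        affected, frontier = set(), deque([changed_node])
        while frontier:
            node = frontier.popleft()
            for c in consumers[node]:
                if c not in affected:
                    affected.add(c)
                    frontier.append(c)
        return affected

    # Toy warehouse graph: a view over a relation, a report over the view.
    depends_on = {
        "view_sales": {"rel_sales"},
        "report_q1": {"view_sales"},
        "etl_load_sales": {"rel_sales"},
    }
    # Sensitivity to a change of rel_sales = number of affected modules.
    print(len(transitive_consumers(depends_on, "rel_sales")))  # 3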
Data visualization is one of the major issues of database research. OLAP, a decision support technology, is clearly at the center of this effort. Thus far, visualization has not been incorporated in the abstraction levels of DBMS architecture (conceptual, logical, physical); neither has it been formally treated in this context. In this paper we start by reconsidering the separation of the aforementioned abstraction levels to take visualization into consideration. Then, we present the Cube Presentation Model (CPM), a novel presentational model for OLAP screens. The proposal rests on the fundamental idea of separating the logical part of a data cube computation from the presentational part of the client tool. Then, CPM can be naturally mapped onto the Table Lens, which is an advanced visualization technique from the Human-Computer Interaction area, particularly tailored for cross-tab reports. Based on the particularities of Table Lens, we propose automated proactive support to the user fo...
Service-oriented computing is now acknowledged as a central paradigm for Internet computing, supported by tremendous research and technology development over the last 10 years. However, the evolution of the Internet, and in particular, the latest Future Internet vision, challenges the paradigm. Indeed, service-oriented computing has to face the ultra large scale and heterogeneity of the Future Internet, which are orders of magnitude higher than those of today’s service-oriented systems. This article aims at contributing to meeting this challenge by identifying the key research directions to be followed in light of the latest state of the art. This article more specifically focuses on research challenges for service-oriented middleware design, therefore investigating service description, discovery, access, and composition in the Future Internet of services.
Developing a tool that provides support for different refactorings, through a set of refactoring detectors which identify opportunities for source code improvements, is not easy. Our experience in developing such a tool for refactoring object-oriented software revealed the Three-Step Refactoring Detector pattern. The main idea behind the pattern is to develop an extensible hierarchy of refactoring detectors, with respect to a general three-step refactoring detection process. The proposed pattern facilitates the expansion of the hierarchy with new refactoring detectors and enables the reuse of existing refactoring detectors, provided by third-party developers. Concerning maintainability, the pattern promotes the development of simple, clean and technology-independent refactoring detectors. We have used the pattern for the development of 11 different refactoring detectors in the context of our tool. The pattern has not been observed in other contexts. However, the usage of the pattern...
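A minimal Python sketch of the template-method shape such a three-step detection process could take is given below; the step names, the LongMethodDetector example, and the line-count threshold are hypothetical and only illustrate the extensible-hierarchy idea, not the pattern as defined in the paper.

    # Hypothetical three-step template: (1) collect candidate code elements,
    # (2) filter them with refactoring-specific checks, (3) report opportunities.
    from abc import ABC, abstractmethod

    class RefactoringDetector(ABC):
        def detect(self, codebase):
            candidates = self.collect_candidates(codebase)                     # step 1
            confirmed = [c for c in candidates if self.is_opportunity(c)]      # step 2
            return self.report(confirmed)                                      # step 3

        @abstractmethod
        def collect_candidates(self, codebase): ...

        @abstractmethod
        def is_opportunity(self, candidate): ...

        def report(self, confirmed):
            return [f"refactoring opportunity: {c}" for c in confirmed]

    class LongMethodDetector(RefactoringDetector):
        """Illustrative detector: flags methods longer than a threshold."""
        def collect_candidates(self, codebase):
            return codebase.items()          # (method name, line count) pairs

        def is_opportunity(self, candidate):
            _, lines = candidate
            return lines > 30

    print(LongMethodDetector().detect({"parse": 120, "render": 12}))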
In many modern applications we have to deal with huge volumes of data. Many techniques have been developed on how to extract knowledge, usually statistical, from them, especially in the context of data mining. The results of such operations are abstract and compact representations of the original data, which we name patterns. Still, these patterns have to be further elaborated to be used in an effective way. In this paper we present the architecture of a Pattern Base Management System that can be used to efficiently store and query patterns. We present its logical structure and we comment on the criteria of whether the existing systems for storing and manipulating data can cover the special user requirements that patterns impose.
This deliverable specifies the main concepts of the CHOReOS middleware architecture. Starting from the Future Internet (FI) challenges for scalability, heterogeneity, mobility, awareness, and adaptation that have been investigated in prior work done in WP1, we introduce the aforementioned concepts to deal with the requirements derived from the FI challenges.
The Future Internet envisions a ubiquitous world where available services can be easily discovered and coordinated so as to fit users' needs. Service choreographies will play a central role in this vision as an effective means to allow heterogeneous services to suitably collaborate. This deliverable defines the CHOReOS Dynamic Development Process Model by refining and completing the artefacts/activities and the relationships among them already sketched into the DoW (under the WP2 description), and conceptualized by the ...
The D1.1 deliverable clarifies the baseline, the progress, and the state of the art that CHOReOS will address. For each of the first four CHOReOS work packages, WP1 to WP4, this deliverable gives a precise definition of the state of the art, an indication of the envisaged progress beyond the state of the art by CHOReOS, and the baseline for its research.
Characteristic sets (CS) organize RDF triples based on the set of properties associated with their subject nodes. This concept was recently used in indexing techniques, as it can capture the implicit schema of RDF data. While most CS-based approaches yield significant improvements in space and query performance, they fail to perform well when answering complex query workloads in the presence of schema heterogeneity, i.e., when the number of CSs becomes very large, resulting in a highly partitioned data organization. In this paper, we address this problem by introducing a novel technique for merging CSs based on their hierarchical structure. Our method employs a lattice to capture the hierarchical relationships between CSs, identifies dense CSs and merges dense CSs with their ancestors. We have implemented our algorithm on top of a relational backbone, where each merged CS is stored in a relational table, and therefore, CS merging results in a smaller number of required tables to ho...
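The following Python sketch illustrates, under simplifying assumptions, what characteristic sets and a subset-based hierarchy between them look like on toy triples; the proper-subset ancestor test and the toy data are invented for the example, and the paper's actual density-based merging rule is not reproduced.

    # Characteristic set of a subject = the set of its properties; CSs are
    # related hierarchically here by (proper) subset inclusion.
    from collections import defaultdict

    triples = [          # toy RDF-like (subject, property, object) triples
        ("s1", "name", "A"), ("s1", "age", 3),
        ("s2", "name", "B"), ("s2", "age", 5), ("s2", "email", "b@x"),
        ("s3", "name", "C"),
    ]

    props = defaultdict(set)
    for s, p, _ in triples:
        props[s].add(p)

    cs_support = defaultdict(int)        # how many subjects share each CS
    for s, ps in props.items():
        cs_support[frozenset(ps)] += 1

    # Ancestors of a CS = CSs whose property set is a proper subset of it.
    def ancestors(cs, all_cs):
        return [other for other in all_cs if other < cs]

    for cs in cs_support:
        print(sorted(cs), "support:", cs_support[cs],
              "ancestors:", [sorted(a) for a in ancestors(cs, cs_support)])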
Modern data analysis applications require the ability to provide on-demand integration of data sources while offering a flexible and user-friendly query interface. Traditional techniques for answering queries using views, focused on a rather static setting, fail to address such requirements. To overcome these issues, we propose a fully-fledged data integration approach based on graph-based constructs. The extensibility of graphs allows us to extend the traditional framework for data integration with view definitions. Furthermore, we also propose a query language based on subgraphs. We tackle query answering via a query rewriting algorithm based on well-known algorithms for answering queries using views. We experimentally show that the proposed method yields good performance and does not introduce a significant overhead.
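As a rough, bucket-algorithm-flavored illustration (not the rewriting algorithm of the paper), the Python sketch below models views and a query as sets of graph-pattern subgoals and searches for the smallest set of views that covers the query; the view names and subgoals are assumptions.

    # Toy answering-queries-using-views: a rewriting is a set of views whose
    # subgoals together cover all subgoals of the query.
    from itertools import combinations

    views = {
        "v_customers": {("customer", "id"), ("customer", "name")},
        "v_orders":    {("order", "id"), ("order", "customer_id")},
        "v_full":      {("customer", "id"), ("order", "customer_id")},
    }
    query = {("customer", "id"), ("order", "customer_id")}

    def smallest_rewriting(query, views):
        """Return the smallest set of view names whose subgoals cover the query."""
        names = list(views)
        for k in range(1, len(names) + 1):
            for combo in combinations(names, k):
                if query <= set().union(*(views[v] for v in combo)):
                    return combo
        return None

    print(smallest_rewriting(query, views))  # ('v_full',)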
In this paper, we discuss the problem of organizing the different ways of computing the interestingness of a particular cell derived from a cube in the context of a hierarchical, multidimensional space. We start from an in-depth study of interestingness aspects in research on human behavior and include in our survey the approaches taken by computer-science efforts in the area of data mining and user recommendations. We move on to structure interestingness along different fundamental, high-level aspects, and, due to their high-level nature, we also move towards much more concrete, data-oriented definitions of interestingness aspects.
Characteristic sets (CS) organize RDF triples based on the set of properties associated with their subject nodes. This concept was recently used in indexing techniques, as it can capture the implicit schema of RDF data. While most CS-based approaches yield significant improvements in space and query performance, they fail to perform well when answering complex query workloads in the presence of schema heterogeneity, i.e., when the number of CSs becomes very large, resulting in a highly partitioned data organization. In this paper, we address this problem by introducing a novel technique for merging CSs based on their hierarchical structure. Our method employs a lattice to capture the hierarchical relationships between CSs, identifies dense CSs and merges dense CSs with their ancestors, thus reducing the size of the CSs as well as the links between them. We implemented our algorithm on top of a relational backbone, where each merged CS is stored in a relational table, and we performe...
In this vision paper we structure a vision for the Business Intelligence of the near future in terms of a model with novel concepts and operators. We envision systems where the end-user requests information at a very high level, expressed as their intention to discover information, and the system transforms this request into the concrete execution of algorithms that compute, visualize, and comment on the data and the important highlights among them, as an answer to the information request made by the end-user.
Extraction-Transformation-Loading (ETL) tools are pieces of software responsible for the extraction of data from several sources, their cleansing, customization and insertion into a data warehouse. In this paper, we propose a methodology for the earliest stages of the data warehouse design, with the goal of tracing the analysis of the structure and content of the existing data sources and their intentional mapping to the common conceptual data warehouse model. The methodology comprises a set of steps that can be summarized as follows: (a) identification of the proper data stores; (b) candidates and active candidates for the involved data stores; (c) attribute mapping between the providers and the consumers, and (d) annotation of the diagram with runtime constraints.
Data narration has received increasing interest in several communities while lacking models and tools for handling, building and structuring data narratives. We present a simple prototype for supporting data narration, based on a conceptual model defined in [4]. It guides a data narrator from scratch: fetch and explore data, abstract important messages based on an intentional goal, structure the contents of the data story, and render it in a visual manner. This prototype is implemented in Java as a web application using Spring, d3.js, JFreeChart and Apache PDFBox.
This paper introduces the Traveling Analyst Problem (TAP), an original strongly NP-hard problem where an automated algorithm assists an analyst in exploring a dataset, by suggesting the most interesting and coherent set of queries that are estimated to complete under a time constraint. We motivate the problem, study its complexity, propose a simple heuristic for approximating it under simplifying assumptions, and run preliminary tests to observe the behavior of this heuristic.
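A minimal greedy heuristic in the spirit of the problem statement is sketched below in Python: it picks queries by interestingness per estimated second until a time budget runs out. The query tuples and the scoring are assumptions, and the coherence aspect of the actual problem is omitted here.

    # Toy budget-constrained selection: greedily take the queries with the best
    # interestingness-to-cost ratio that still fit in the remaining time budget.

    def greedy_plan(queries, budget):
        """queries: list of (name, interestingness, estimated_seconds)."""
        chosen, remaining = [], budget
        for name, interest, cost in sorted(queries, key=lambda q: q[1] / q[2], reverse=True):
            if cost <= remaining:
                chosen.append(name)
                remaining -= cost
        return chosen

    queries = [("q1", 0.9, 30), ("q2", 0.5, 5), ("q3", 0.7, 40), ("q4", 0.2, 2)]
    print(greedy_plan(queries, budget=45))  # ['q2', 'q4', 'q1'] for these toy numbers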
Evolving dependency magnets, i.e., software modules upon which a large number of other modules depend, is always a hard task. As Robert C. Martin has nicely summarized it (see http://www.oodesign.com/design-principles.html), fundamental problems of bad design that hinder evolution include immobility, i.e., difficulty in reuse; rigidity, i.e., the tendency for software to be difficult to change; and fragility, i.e., the tendency of the software to break in many places every time it is changed. In such cases, developers are reluctant to evolve the software to avoid facing the impact of change. How are these fundamentals related to schema evolution? We know that changes in the schema of a database affect a large (and not necessarily traced) number of surrounding applications, without explicit identification of the impact. These affected applications can then suffer from syntactic and semantic inconsistencies – with syntactic inconsistency leading to application crashes and semantic inco...
Data warehouse metadata are pieces of information stored in one or more special-purpose metadata repositories that include (a) information on the contents of the data warehouse, their location and their structure, (b) information on the processes that take place in the data warehouse back-stage, concerning the refreshment of the warehouse with clean, up-to-date, semantically and structurally reconciled data, (c) information on the implicit semantics of data (with respect to a common enterprise model), along with any other kind of data that helps the end-user exploit the information of the warehouse, (d) information on the infrastructure and physical characteristics of components and the sources of the data warehouse, and, (e) information including security, authentication, and usage statistics that helps the administrator tune the operation of the data warehouse as appropriate.
Multidimensional data are published in the web of data under common directives, such as the Resource Description Framework (RDF). The increasing volume and diversity of these data pose the challenge of finding relations between them in an efficient and accurate way, by taking advantage of their overlapping schemas. In this paper we define two types of relationships between multidimensional RDF data, and we propose algorithms for efficient and scalable computation of these relationships. Specifically, we define the notions of containment and complementarity between points in multidimensional dataspaces, as different aspects of relatedness, and we propose a baseline method for computing them, as well as two alternative methods that target speed and scalability. We provide an experimental evaluation over real-world and synthetic datasets and we compare our approach to a SPARQL-based and a rule-based alternative, which prove to be inefficient for increasing input sizes.
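The Python sketch below gives a hypothetical reading of containment between points whose coordinates are paths in dimension hierarchies (an ancestor coordinate contains its descendants); the prefix-based test and the example dimensions are assumptions, not the paper's definitions.

    # Toy containment check: point A contains point B if, on every dimension,
    # A's coordinate is an ancestor of (a prefix of) or equal to B's coordinate.

    def is_ancestor_or_equal(coarse, fine):
        """A coordinate contains another if it is a prefix of it in the hierarchy."""
        return fine[:len(coarse)] == coarse

    def contains(point_a, point_b):
        """point_a contains point_b if it is at least as coarse on every dimension."""
        return all(is_ancestor_or_equal(point_a[d], point_b[d]) for d in point_a)

    a = {"geo": ("Europe",),          "time": ("2023",)}
    b = {"geo": ("Europe", "Greece"), "time": ("2023", "Q1")}
    print(contains(a, b), contains(b, a))  # True False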
