Data warehouses provide sophisticated tools for analyzing complex data online, in particular by aggregating data along dimensions spanned by master data. Changes to these master data are a frequent threat to the correctness of OLAP results, in particular for multi-period data analysis, trend calculations, etc. As dimension data might change in underlying data sources without notifying the data warehouse, we are exploring the application of data mining techniques for detecting such changes, thereby helping to avoid incorrect results of OLAP queries.
To be useful, a data warehouse must be correct and consistent. Hence, when selecting the data sources for building the data warehouse, it is essential to know exactly the concept and structure of all possible data sources and the dependencies between them. In a perfect world, this knowledge stems from an integrated, enterprise-wide data model. In reality, however, an explicit model is often not available. This paper proposes an approach for identifying data sources for a data warehouse, even without detailed knowledge about the interdependencies of data sources. Furthermore, we are able to confine the number of potential data sources. Hence, our approach reduces the time needed to build and maintain a data warehouse and increases its data quality.
Medical research is a highly collaborative process in an interdisciplinary environment that may be effectively supported by a Computer Supported Cooperative Work (CSCW) system. Research activities should be traceable in order to allow verification of results, repeatability of experiments, and documentation as learning processes. By recording the provenance of data together with the collaborative context it is embedded in, novel types of provenance queries may be answered. We designed and implemented a next-generation CSCW system providing both collaborative functionality and the definition and execution of structured processes. We integrated a data provenance model that automatically records process- and collaboration-related operations, and we demonstrate the capabilities of the model by answering specific data provenance queries from the biomedical domain.
Ontologies are shared conceptualizations of certain domains. Especially in legal and regulatory ontologies, modifications such as the passing of a new law, decisions by high courts, new insights by scholars, etc. have to be considered. Otherwise, we would not be able to identify which knowledge (which ontology) was valid at an arbitrary timepoint in the past, and without this knowledge we would, for instance, not be able to identify why a user came to a specific decision. In this paper we show how a simple ontology description formalism, namely a directed graph, has to be extended to represent changing knowledge. Furthermore, we present the operations that are necessary to manipulate such an ontology. Finally, we discuss different implementation approaches.
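As a rough illustration of a directed graph extended to represent changing knowledge, the following minimal sketch attaches a validity interval to each edge and reconstructs the graph that was valid at a given timepoint. This is an assumption for illustration only, not the formalism of the paper: the names `TemporalEdge` and `snapshot`, the use of years as timepoints, and the sample legal concepts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemporalEdge:
    """A directed edge that is only valid during a time interval (hypothetical model)."""
    source: str
    target: str
    valid_from: int          # first year in which the relation is valid
    valid_to: Optional[int]  # last valid year; None means "still valid"


def snapshot(edges, timepoint):
    """Return the set of (source, target) pairs valid at the given timepoint."""
    return {
        (e.source, e.target)
        for e in edges
        if e.valid_from <= timepoint
        and (e.valid_to is None or timepoint <= e.valid_to)
    }


# Hypothetical example: a regulation is replaced by a newer one in 2005.
edges = [
    TemporalEdge("DataProcessing", "RegulationOld", 1995, 2004),
    TemporalEdge("DataProcessing", "RegulationNew", 2005, None),
]

graph_2000 = snapshot(edges, 2000)
graph_2010 = snapshot(edges, 2010)
```

Keeping superseded edges with a closed interval, instead of deleting them, is what allows a query about a past decision to be answered against the knowledge that was valid at that time.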
Abstract. Multi-dimensional analysis is one of the most important applications of data warehouses, giving the possibility to aggregate and compare data along dimensions relevant in the application domain. Typically time is one of the dimensions we find in data ...
Time is one of the dimensions we frequently find in data warehouses, allowing comparisons of data in different periods. In current multi-dimensional data warehouse technology, changes of dimension data cannot be represented adequately since all dimensions are (implicitly) considered orthogonal. We propose an extension of the multi-dimensional data model employed in data warehouses that copes correctly with changes in dimension data: a temporal multi-dimensional data model allows the registration of temporal versions of dimension data. Mappings are provided to transfer data between different temporal versions of the instances of dimensions; they enable the system to correctly answer queries spanning multiple periods and thus different versions of dimension data.
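The idea of mapping data between temporal versions of a dimension can be sketched as follows. This is a minimal illustration under stated assumptions, not the model defined in the paper: the `Mapping` class, the `transform` function, the weighted-split semantics, and the department names and figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """Maps a member of an older dimension version to a member of a newer one."""
    source: str    # member in the older structure version
    target: str    # member in the newer structure version
    weight: float  # share of the source value assigned to the target


def transform(values, mappings):
    """Re-express measure values of one structure version in another version."""
    result = {}
    mapped_sources = {m.source for m in mappings}
    for m in mappings:
        if m.source in values:
            result[m.target] = result.get(m.target, 0.0) + values[m.source] * m.weight
    # Assumption: members without an explicit mapping are unchanged between versions.
    for member, value in values.items():
        if member not in mapped_sources:
            result[member] = result.get(member, 0.0) + value
    return result


# Hypothetical example: Dept_A is split into Dept_A1 and Dept_A2 in the newer version.
sales_old_version = {"Dept_A": 100.0, "Dept_B": 50.0}
split = [
    Mapping("Dept_A", "Dept_A1", 0.25),
    Mapping("Dept_A", "Dept_A2", 0.75),
]

sales_in_new_structure = transform(sales_old_version, split)
```

With the older figures expressed in the newer structure, a query spanning both periods compares like with like. The weights in a real system would have to come from domain knowledge; here they are made up.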
A Model for a Temporal Data Warehouse. Johann Eder, University of Klagenfurt, Dep. of Informatics-Systems, eder@isys.uni-klu.ac.at; Christian Koncilia, University of Klagenfurt, Dep. of Informatics-Systems, koncilia@isys.uni-klu.ac.at ...
We present a technique for discovering and representing changes between versions of data warehouse structures. We select a tree comparison algorithm, adapt it to the particularities of multidimensional data structures, and extend it with a module for detecting node renamings. The results of these algorithms are so-called edit scripts consisting of transformation operations which, when executed in sequence, transform the earlier version into the later one and thus show the relationships between the elements of different versions of data warehouse structures. This procedure helps data warehouse administrators to register changes. We describe a prototypical implementation of the concept which imports multidimensional structures from Hyperion Essbase data warehouses, compares these versions, and generates a list of differences.
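To make the notion of an edit script concrete, here is a deliberately simplified sketch. It is not the tree comparison algorithm of the paper: it matches members by name only, so it cannot detect renamings, and the child-to-parent dictionary representation, the function names `diff_hierarchies` and `apply_script`, and the sample hierarchy are all assumptions for illustration.

```python
def diff_hierarchies(old, new):
    """Return an edit script (list of operations) that turns `old` into `new`.

    Each hierarchy maps a member name to its parent name (None for the root).
    Members are matched by name only; renamings are NOT detected in this sketch.
    """
    script = []
    for member in sorted(old.keys() - new.keys()):
        script.append(("delete", member))
    for member in sorted(new.keys() - old.keys()):
        script.append(("insert", member, new[member]))
    for member in sorted(old.keys() & new.keys()):
        if old[member] != new[member]:
            script.append(("move", member, old[member], new[member]))
    return script


def apply_script(hierarchy, script):
    """Execute the operations in sequence on a copy of the hierarchy."""
    result = dict(hierarchy)
    for op in script:
        if op[0] == "delete":
            del result[op[1]]
        elif op[0] == "insert":
            result[op[1]] = op[2]
        elif op[0] == "move":
            result[op[1]] = op[3]
    return result


# Hypothetical dimension hierarchy in two versions.
old_version = {"All": None, "Europe": "All", "Austria": "Europe",
               "Vienna": "Austria", "Asia": "All"}
new_version = {"All": None, "Europe": "All", "Austria": "Europe",
               "Vienna": "Europe", "Americas": "All"}

script = diff_hierarchies(old_version, new_version)
```

Applying the script to the earlier version with `apply_script(old_version, script)` reproduces the later version, which is the defining property of an edit script described above.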
Papers by Christian Koncilia