Characteristics of A Data Warehouse

Characteristics of a data warehouse

• Subject oriented. Data are organized based on how the users refer to them.

• Integrated. All inconsistencies regarding naming convention and value representations are
removed.

• Nonvolatile. Data are stored in read-only format and do not change over time.

• Time variant. Data are not current but normally time series.

• Summarized. Operational data are mapped into a decision-usable format.

• Large volume. Time series data sets are normally quite large.

• Not normalized. DW data can be, and often are, redundant.

• Metadata. Data about data are stored.

• Data sources. Data come from internal and external unintegrated operational systems.
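
The "integrated" characteristic above can be illustrated with a minimal sketch. It assumes two hypothetical source systems that name and encode the same attribute differently; the field names, the code values, and the `integrate` helper are all invented for illustration and do not come from any particular product.

```python
# Hypothetical records from two unintegrated operational systems.
# Field names and code values are assumptions made for this illustration.
billing_rows = [{"cust_id": 1, "sex": "M"}, {"cust_id": 2, "sex": "F"}]
crm_rows = [{"customer": 3, "gender": 1}, {"customer": 4, "gender": 0}]

# One agreed warehouse representation for every source-specific code.
GENDER_CODES = {"M": "male", "F": "female", 1: "male", 0: "female"}


def integrate(billing, crm):
    """Map both sources onto one naming convention and one value encoding."""
    unified = []
    for row in billing:
        unified.append({"customer_id": row["cust_id"],
                        "gender": GENDER_CODES[row["sex"]]})
    for row in crm:
        unified.append({"customer_id": row["customer"],
                        "gender": GENDER_CODES[row["gender"]]})
    return unified


warehouse_rows = integrate(billing_rows, crm_rows)
for row in warehouse_rows:
    print(row)
```

After this step every row uses the same column names and the same value representation, whichever system it came from.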

Data warehouse models and topologies

• Enterprise warehouse

• collects all of the information about subjects spanning the entire organization

• Data Mart

• a subset of corporate-wide data that is of value to a specific group of users. Its
scope is confined to specific, selected groups, such as a marketing data mart

• Independent vs. dependent data marts (a dependent data mart is sourced directly from the warehouse)

• The central data warehouse – a single physical database contains all of the data for a
specific functional area

• The distributed data warehouse – the components are distributed across several physical
databases

• Virtual warehouse – the end users have direct access to the data stores, using tools enabled
at the data access layer

• A set of views over operational databases

• Only some of the possible summary views may be materialized

• A Virtual Data Warehouse approach is often chosen when there are infrequent demands for
data and management wants to determine if/how users will use operational data.

• One of the weaknesses of a Virtual Data Warehouse approach is that user queries are made
directly against the operational DBs.

• One way to minimize this problem is to build a “Query Monitor” to check the performance
characteristics of a query before executing it.
• A Coarse Data Warehouse is often chosen when the organization has a relatively clean/new
operational system and management wants to make the operational data more easily
available for just that system.

• A Central Data Warehouse is often chosen when the organization has a clear understanding
of its information access needs and wants to provide “quality”, “integrated” information
to its knowledge workers

• A Distributed Data Warehouse is similar in most respects to a Central Data Warehouse,
except that the data is distributed to separate mini data warehouses (data marts) on local
or specialized servers
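
The virtual warehouse idea (a set of views over operational databases) and the “Query Monitor” mentioned above can be sketched as follows. This is only an illustration using Python's built-in `sqlite3` module: the `orders` table, the `sales_by_region` view, and the `monitored_query` helper are hypothetical, and the rule of rejecting any query whose plan contains a full scan is an assumed policy, not something the notes prescribe.

```python
import sqlite3

# In-memory database standing in for an operational system (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("north", 10.0), ("north", 15.0), ("south", 7.5)],
)

# Virtual warehouse: a view defined directly over the operational table.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)


def monitored_query(connection, sql, params=()):
    """Inspect the query plan first; refuse queries that scan a whole table."""
    plan = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    details = [row[3] for row in plan]  # fourth column holds the plan text
    if any(text.startswith("SCAN") for text in details):
        raise RuntimeError("rejected: full scan of an operational table")
    return connection.execute(sql, params).fetchall()


# The view is computed from the operational data at query time.
print(conn.execute("SELECT region, total FROM sales_by_region ORDER BY region").fetchall())

# A primary-key lookup passes the monitor; an unrestricted scan does not.
print(monitored_query(conn, "SELECT amount FROM orders WHERE id = ?", (1,)))
try:
    monitored_query(conn, "SELECT * FROM orders")
except RuntimeError as exc:
    print(exc)
```

A real monitor would more likely estimate cost or row counts than apply a blanket rule, but the shape is the same: check first, execute only if the check passes.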

METADATA REPOSITORY

• Metadata is the data that defines warehouse objects. It comes in the following kinds:

• Description of the structure of the warehouse

• schema, views, dimensions, hierarchies, derived data definitions, data mart
locations and contents

• Operational metadata

• data lineage (history of migrated data and transformation path), currency of
data (active, archived, or purged), monitoring information (warehouse usage
statistics, error reports, audit trails)

• The algorithms used for summarization

• The mapping from operational environment to the data warehouse

• Data related to system performance

• Business data
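
One way to picture a repository entry covering several of the kinds listed above is the sketch below. The `TableMetadata` class, its field names, and the sample values are hypothetical; an actual metadata repository would hold far more detail.

```python
from dataclasses import dataclass, field
from enum import Enum


class Currency(Enum):
    """Currency of data, using the three states named above."""
    ACTIVE = "active"
    ARCHIVED = "archived"
    PURGED = "purged"


@dataclass
class TableMetadata:
    """One hypothetical repository entry describing a warehouse table."""
    name: str
    source_system: str   # mapping from the operational environment
    summarization: str   # algorithm used for summarization
    currency: Currency = Currency.ACTIVE
    lineage: list = field(default_factory=list)  # transformation path, in order

    def record_step(self, step: str) -> None:
        """Append one transformation to the data lineage."""
        self.lineage.append(step)


entry = TableMetadata(name="monthly_sales", source_system="orders_db",
                      summarization="SUM(amount) per month")
entry.record_step("extracted from orders_db.orders")
entry.record_step("dates normalized to ISO 8601")
print(entry)
```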

DATA WAREHOUSE BACK END TOOLS AND UTILITIES

• Data extraction:

• get data from multiple, heterogeneous, and external sources

• Data cleaning:

• detect errors in the data and rectify them when possible

• Data transformation:

• convert data from legacy or host format to warehouse format

• Load:

• sort, summarize, consolidate, compute views, check integrity, and build indices and
partitions

• Refresh

• propagate the updates from the data sources to the warehouse
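
The extraction, cleaning, transformation and load steps above can be sketched end to end. This is a toy pipeline under stated assumptions: the two sources, their formats, and every function and table name are invented for illustration, and the refresh step is left out.

```python
import sqlite3

# Hypothetical raw rows from two heterogeneous sources (assumed formats).
source_a = [("2024-01-05", "north", "10.50"), ("2024-01-06", "south", "n/a")]
source_b = [("06/01/2024", "NORTH", "4.50")]  # day/month/year, upper-case region


def extract():
    """Extraction: gather rows from every source, tagged with their origin."""
    return [("a", r) for r in source_a] + [("b", r) for r in source_b]


def clean(rows):
    """Cleaning: drop rows whose amount cannot be parsed as a number."""
    kept = []
    for origin, (day, region, amount) in rows:
        try:
            kept.append((origin, (day, region, float(amount))))
        except ValueError:
            pass  # an error was detected that cannot be rectified here
    return kept


def transform(rows):
    """Transformation: convert each source format to the warehouse format."""
    out = []
    for origin, (day, region, amount) in rows:
        if origin == "b":  # source b uses DD/MM/YYYY
            d, m, y = day.split("/")
            day = f"{y}-{m}-{d}"
        out.append((day, region.lower(), amount))
    return out


def load(conn, rows):
    """Load: store detail rows, build an index, and compute a summary table."""
    conn.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_sales_region ON sales (region)")
    conn.execute(
        "CREATE TABLE sales_summary AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )


conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract())))
summary = conn.execute(
    "SELECT region, total FROM sales_summary ORDER BY region"
).fetchall()
print(summary)
```

Keeping each stage as its own function mirrors the list above and makes it easy to swap in a different cleaning rule or target format.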


THE DATA WAREHOUSE ARCHITECTURE

The architecture consists of various interconnected elements:

• Operational and external database layer – the source data for the DW

• Information access layer – the tools the end user uses to extract and analyze the
data

• Data access layer – the interface between the operational and information access
layers

• Metadata layer – the data directory or repository of metadata information

Additional layers are:

• Process management layer – the scheduler or job controller

• Application messaging layer – the “middleware” that transports information around
the firm

• Physical data warehouse layer – where the actual data used in the DSS are located

• Data staging layer – all of the processes necessary to select, edit, summarize and
load warehouse data from the operational and external data bases

COMPONENTS OF METADATA

Metadata is “data about data”. Its components include:

• Transformation maps – records that show what transformations were applied

• Extraction & relationship history – records that show what data was analyzed

• Algorithms for summarization – methods available for aggregating and summarizing

• Data ownership – records that show origin

• Patterns of access – records that show what data are accessed and how often
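
As a small illustration of the “patterns of access” component, the sketch below counts how often each table is read. The access log and its table names are invented; a real warehouse would derive this from its own usage statistics.

```python
from collections import Counter

# Hypothetical access log: one entry per query, naming the table it read.
access_log = ["sales", "customers", "sales", "sales", "inventory"]

access_counts = Counter(access_log)
for table, hits in access_counts.most_common():
    print(table, hits)
```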

DATA WAREHOUSE: A subject-oriented, integrated, time-variant, and nonvolatile collection
of data in support of management’s decision-making process

COMPARISON WITH TRADITIONAL DATABASES

• By comparison, an OLTP (on-line transaction processing) or operational database system is
used to deal with the everyday running of one aspect of an enterprise.

• OLTP systems are usually designed independently of each other and it is difficult for them to
share information

• Data Warehouses are mainly optimized for appropriate data access.

– Traditional databases are transactional and are optimized for both transaction
processing and integrity assurance.

• Data warehouses place more emphasis on historical data, as their main purpose is to support
time-series and trend analysis.

• In transactional databases, the transaction is the mechanism of change to the database. By
contrast, information in a data warehouse is relatively coarse grained and DWs are regarded
as non-real time. The periodic refresh policy is carefully chosen and is usually incremental.
Compared with transactional databases, data warehouses are nonvolatile.
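
An incremental periodic refresh can be sketched as below. The sketch assumes an append-only operational table whose ids only grow, so the highest id already loaded acts as a high-water mark; rows updated in place would need a different change-detection scheme. Table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical operational table and its warehouse copy.
conn.execute("CREATE TABLE op_orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE dw_orders (id INTEGER PRIMARY KEY, amount REAL)")


def incremental_refresh(connection):
    """Copy only rows added since the last refresh; return the warehouse row count."""
    (high_water,) = connection.execute(
        "SELECT COALESCE(MAX(id), 0) FROM dw_orders"
    ).fetchone()
    connection.execute(
        "INSERT INTO dw_orders SELECT id, amount FROM op_orders WHERE id > ?",
        (high_water,),
    )
    return connection.execute("SELECT COUNT(*) FROM dw_orders").fetchone()[0]


# Two orders arrive, then a batch refresh runs.
conn.executemany("INSERT INTO op_orders (amount) VALUES (?)", [(5.0,), (7.0,)])
after_first = incremental_refresh(conn)

# One more order arrives; the next refresh copies only that row.
conn.execute("INSERT INTO op_orders (amount) VALUES (?)", (9.0,))
after_second = incremental_refresh(conn)
print(after_first, after_second)
```

Between refreshes the warehouse copy does not change, which is what makes it nonvolatile from the analyst's point of view.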

Data warehouse                                                     | Operational system

Subject oriented                                                   | Transaction oriented
Large (hundreds of GB up to several TB)                            | Small (MB up to several GB)
Historic data                                                      | Current data
De-normalized table structure (few tables, many columns per table) | Normalized table structure (many tables, few columns per table)
Batch updates                                                      | Continuous updates
Usually very complex queries                                       | Simple to complex queries
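
The difference between the normalized and de-normalized table structures compared above can be made concrete with a small sketch. The schema and sample rows are invented for illustration: two normalized operational tables are flattened into one wide warehouse table in which customer attributes are repeated on every order row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized operational schema (hypothetical): customer attributes live in one place.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(1, "Ada", "Leeds"), (2, "Bo", "York")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 1, 20.0), (2, 1, 5.0), (3, 2, 8.0)])

# De-normalized warehouse table: one wide table, customer columns repeated per order.
conn.execute(
    "CREATE TABLE dw_orders_wide AS "
    "SELECT o.id AS order_id, o.amount, c.name AS customer_name, c.city AS customer_city "
    "FROM orders o JOIN customers c ON c.id = o.customer_id"
)

wide_rows = conn.execute(
    "SELECT order_id, amount, customer_name, customer_city "
    "FROM dw_orders_wide ORDER BY order_id"
).fetchall()
for row in wide_rows:
    print(row)
```

The redundancy costs storage but lets analytical queries read a single table without joins.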

IMPORTANCE/NEED FOR DATA WAREHOUSE

• Consolidation of information resources

• Improved query performance

• Separate research and decision support functions from the operational systems

• Foundation for data mining, data visualization, advanced reporting and OLAP tools

DEFINITIONS

• Data Warehouse – The queryable source of data in the enterprise. It comprises the
union of all of its constituent data marts.

• Data Mart – A logical subset of the complete data warehouse. Often viewed as a restriction
of the data warehouse to a single business process or to a group of related business
processes targeted toward a particular business group.

• Operational Data Store (ODS) – A point of integration for operational systems that were
developed independently of each other. Since an ODS supports day-to-day operations, it
needs to be continually updated.

• Data mining – the process of discovering meaningful new correlations, patterns, and trends
by sifting through large amounts of stored data, using pattern recognition technologies and
statistical and mathematical techniques
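
As a very small instance of “discovering correlations” with a statistical technique, the sketch below computes a Pearson correlation coefficient by hand. The figures are invented, and real data mining relies on far richer methods than a single coefficient; this only shows the kind of question being asked of stored data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly figures pulled from a warehouse (invented numbers).
ad_spend = [10, 20, 30, 40, 50]
units_sold = [12, 25, 31, 48, 55]
r = pearson(ad_spend, units_sold)
print(f"correlation: {r:.3f}")
```

A value close to 1 suggests the two series rise together, which would then be a candidate pattern to investigate further.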

Business Intelligence (BI) technologies: an environment that includes a data warehouse (or,
more commonly, one or more data marts) together with tools such as OLAP and/or data mining
