Characteristics of A Data Warehouse
• Subject oriented. Data are organized based on how the users refer to them.
• Integrated. All inconsistencies regarding naming convention and value representations are
removed.
• Nonvolatile. Data are stored in read-only format and do not change over time.
• Time variant. Data are historical rather than current, and are normally stored as time series.
• Large volume. Time series data sets are normally quite large.
• Data sources. Data come from internal and external unintegrated operational systems.
• Enterprise warehouse
• collects all of the information about subjects spanning the entire organization
• Data Mart
• The central data warehouse – a single physical database contains all of the data for a
specific functional area
• The distributed data warehouse – the components are distributed across several physical
databases
• Virtual warehouse – the end users have direct access to the data stores, using tools enabled
at the data access layer
• A Virtual Data Warehouse approach is often chosen when there are infrequent demands for
data and management wants to determine if/how users will use operational data.
• One of the weaknesses of a Virtual Data Warehouse approach is that user queries are made
against operational DBs.
• One way to minimize this problem is to build a “Query Monitor” to check the performance
characteristics of a query before executing it.
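A minimal sketch of the "Query Monitor" idea, assuming an SQLite database stands in for the operational DB. The table `orders`, the index `idx_orders_customer`, and the helpers `is_full_scan` and `monitored_query` are hypothetical names chosen for illustration. The monitor asks the database for its query plan and rejects any query that would scan a whole table; a real monitor would likely use richer cost estimates.

```python
import sqlite3

def is_full_scan(conn, sql, params=()):
    """Return True if SQLite's plan for `sql` includes a full scan."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is a readable description:
    # "SCAN ..." means a pass over a whole table or index,
    # "SEARCH ..." means an indexed lookup.
    return any(row[-1].startswith("SCAN") for row in plan)

def monitored_query(conn, sql, params=()):
    """Run `sql` only if the plan check passes; otherwise refuse."""
    if is_full_scan(conn, sql, params):
        raise RuntimeError("query rejected: would scan an entire table")
    return conn.execute(sql, params).fetchall()

# Hypothetical operational table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 10.0), (2, 25.5), (1, 7.25)],
)

# An indexed lookup passes the monitor.
cheap = monitored_query(
    conn, "SELECT * FROM orders WHERE customer_id = ?", (1,)
)
```

A query that filters on the unindexed `amount` column would be refused by `monitored_query`, which protects the operational system from an expensive scan.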
• A Coarse Data Warehouse is often chosen when the organization has a relatively clean/new
operational system and management wants to make the operational data more easily
available for just that system.
• A Central Data Warehouse is often chosen when the organization has a clear understanding
of its information access needs and wants to provide “quality”, “integrated” information
to its knowledge workers.
• Metadata is the data defining warehouse objects. It comes in the following kinds:
• Operational meta-data
• Business data
• Data extraction:
• Data cleaning:
• Data transformation:
• Load:
• sort, summarize, consolidate, compute views, check integrity, and build indices and
partitions
• Refresh
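The extraction, cleaning, transformation, load and refresh steps listed above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the source rows, the `sales` and `sales_by_region` tables, and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical rows already extracted from an operational system.
source_rows = [
    {"order_id": 1, "region": " north ", "amount": "10.50"},
    {"order_id": 2, "region": "NORTH", "amount": "4.50"},
    {"order_id": 3, "region": "South", "amount": None},
    {"order_id": 4, "region": "south", "amount": "20.00"},
]

def clean(rows):
    """Cleaning: drop rows with a missing amount."""
    return [r for r in rows if r["amount"] is not None]

def transform(rows):
    """Transformation: unify region spelling, convert amounts to numbers."""
    return [
        (r["order_id"], r["region"].strip().lower(), float(r["amount"]))
        for r in rows
    ]

def load(conn, rows):
    """Load: store rows, build an index, and recompute a summary table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_sales_region ON sales (region)"
    )
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(clean(source_rows)))
summary = dict(warehouse.execute("SELECT region, total FROM sales_by_region"))
```

In this sketch a refresh is simply another call to `load` with newly extracted rows: existing orders are replaced, new ones are added, and the summary is rebuilt.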
• Operational and external database layer – the source data for the DW
• Information access layer – the tools the end user uses to extract and analyze the
data
• Data access layer – the interface between the operational and information access
layers
• Physical data warehouse layer – where the actual data used in the DSS are located
• Data staging layer – all of the processes necessary to select, edit, summarize and
load warehouse data from the operational and external databases
COMPONENTS
Metadata is “data about data”.
COMPONENTS OF METADATA
• Extraction & relationship history – records that show what data was analyzed
• Patterns of access – records that show what data are accessed and how often
• OLTP systems are usually designed independently of each other and it is difficult for them to
share information
– Traditional databases are transactional and are optimized for both transaction
processing and integrity assurance.
• Data warehouses place more emphasis on historical data, as their main purpose is to support
time-series and trend analysis.
• De-normalized table structure (few tables, many columns per table)
• Normalized table structure (many tables, few columns per table)
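To make the two table structures concrete, the sketch below builds a small normalized schema and then derives one wide, de-normalized table from it with a join. All table and column names (`customer`, `product`, `orders`, `order_wide`) are hypothetical, and SQLite is used only because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized: many tables, few columns per table.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE product  (product_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE orders   (order_id INTEGER PRIMARY KEY,
                       customer_id INTEGER, product_id INTEGER, qty INTEGER);

INSERT INTO customer VALUES (1, 'Ada', 'Leeds'), (2, 'Bo', 'York');
INSERT INTO product  VALUES (10, 'Widget'), (11, 'Gadget');
INSERT INTO orders   VALUES (100, 1, 10, 2), (101, 2, 11, 1), (102, 1, 11, 5);

-- De-normalized: one table, many columns, customer and product data repeated.
CREATE TABLE order_wide AS
SELECT o.order_id,
       c.name  AS customer_name,
       c.city  AS customer_city,
       p.title AS product_title,
       o.qty
FROM orders o
JOIN customer c ON c.customer_id = o.customer_id
JOIN product  p ON p.product_id  = o.product_id;
""")

# Column names of the wide table, read from SQLite's table metadata.
wide_columns = [row[1] for row in conn.execute("PRAGMA table_info(order_wide)")]

# Analytical queries can now read one table without any joins.
rows = conn.execute(
    "SELECT customer_name, product_title, qty FROM order_wide ORDER BY order_id"
).fetchall()
```

The wide table trades storage and update simplicity for simpler, join-free reads, which is why this shape is commonly used for analytical queries.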
• Separate research and decision support functions from the operational systems
• Foundation for data mining, data visualization, advanced reporting and OLAP tools
DEFINITIONS
• Data Warehouse – The queryable source of data in the enterprise. It is comprised of the
union of all of its constituent data marts.
• Data Mart – A logical subset of the complete data warehouse. Often viewed as a restriction
of the data warehouse to a single business process or to a group of related business
processes targeted toward a particular business group.
• Operational Data Store (ODS) – A point of integration for operational systems that were
developed independently of each other. Since an ODS supports day-to-day operations, it
needs to be continually updated.
• Data mining – The process of discovering meaningful new correlations, patterns, and trends
by sifting through large amounts of stored data, using pattern recognition technologies and
statistical and mathematical techniques.
• Business Intelligence (BI) technologies – An environment that includes a data warehouse (or
more commonly one or more data marts) together with tools such as OLAP and/or data mining