Unit 1

Uploaded by

Astha Shukla

UNIT-1

DATA WAREHOUSING

Overview: The term "Data Warehouse" was first coined by Bill Inmon in
1990. According to Inmon, a data warehouse is a subject-oriented,
integrated, time-variant, and non-volatile collection of data. This data
helps analysts make informed decisions in an organization.

An operational database undergoes frequent changes on a daily basis
on account of the transactions that take place. Suppose a business
executive wants to analyze past feedback on a product, a supplier, or
a consumer. The executive will have no data available to analyze,
because the earlier data has been overwritten by later transactions.

A data warehouse provides generalized and consolidated data in a
multidimensional view. Along with this generalized and consolidated
view of data, a data warehouse also provides Online Analytical Processing
(OLAP) tools. These tools help in the interactive and effective analysis of
data in a multidimensional space. This analysis results in data
generalization and data mining.

Data mining functions such as association, clustering, classification,
and prediction can be integrated with OLAP operations to enhance the
interactive mining of knowledge at multiple levels of abstraction. That’s
why the data warehouse has become an important platform for data
analysis and online analytical processing.

Understanding a Data Warehouse:

● A data warehouse is a database, which is kept separate from the
organization's operational database.

● There is no frequent updating done in a data warehouse.

● It possesses consolidated historical data, which helps the
organization to analyze its business.

● A data warehouse helps executives to organize, understand, and
use their data to make strategic decisions.

● Data warehouse systems help in the integration of a diversity of
application systems.

● A data warehouse system helps in consolidated historical data
analysis.

Why a Data Warehouse is Separated from Operational Databases

A data warehouse is kept separate from operational databases for
the following reasons:

● An operational database is constructed for well-known tasks and
workloads such as searching particular records, indexing, etc. In
contrast, data warehouse queries are often complex and they
present a general form of data.

● Operational databases support concurrent processing of multiple
transactions. Concurrency control and recovery mechanisms are
required for operational databases to ensure robustness and
consistency of the database.

● An operational database query both reads and modifies data,
while an OLAP query needs only read-only access to stored data.

● An operational database maintains current data. On the other
hand, a data warehouse maintains historical data.

Data Warehouse Features:

The key features of a data warehouse are discussed below-

● Subject Oriented: A data warehouse is subject oriented
because it provides information around a subject rather than the
organization's ongoing operations. These subjects can be product,
customers, suppliers, sales, revenue, etc. A data warehouse does
not focus on the ongoing operations, rather it focuses on modelling
and analysis of data for decision making.

● Integrated: A data warehouse is constructed by integrating data
from heterogeneous sources such as relational databases, flat
files, etc. This integration enhances the effective analysis of data.

● Time Variant: The data collected in a data warehouse is
identified with a particular time period. The data in a data
warehouse provides information from the historical point of view.

● Non-volatile: Non-volatile means the previous data is not
erased when new data is added to it. A data warehouse is kept
separate from the operational database and therefore frequent
changes in the operational database are not reflected in the data
warehouse.

Note- A data warehouse does not require transaction processing,
recovery, and concurrency controls, because it is physically stored
separately from the operational database.

Data Warehouse Applications:

As discussed before, a data warehouse helps business executives to
organize, analyze, and use their data for decision making. A data
warehouse serves as an integral part of a plan-execute-assess "closed-loop"
feedback system for enterprise management. Data warehouses are
widely used in the following fields -

● Financial services

● Banking services

● Consumer goods

● Retail sectors

● Controlled manufacturing

Types of Data Warehouse Applications

Information processing, analytical processing, and data mining are the
three types of data warehouse applications that are discussed below -

● Information Processing: A data warehouse allows us to process
the data stored in it. The data can be processed by means of
querying, basic statistical analysis, and reporting using crosstabs,
tables, charts, or graphs.

● Analytical Processing: A data warehouse
supports analytical processing of the information stored in it. The
data can be analyzed by means of basic OLAP operations,
including slice-and-dice, drill down, drill up, and pivoting.

● Data Mining: Data mining supports knowledge discovery by
finding hidden patterns and associations, constructing analytical
models, and performing classification and prediction. These mining
results can be presented using visualization tools.
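
The basic OLAP operations named above can be sketched in a few lines of Python. This is only an illustrative sketch, not how an OLAP server is implemented: the sales rows, the dimension names, and the helper functions (aggregate, slice_, dice, pivot) are all hypothetical and were chosen for this example.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, city, item, units_sold).
SALES = [
    ("2023", "Q1", "Delhi", "phone", 10),
    ("2023", "Q1", "Mumbai", "phone", 7),
    ("2023", "Q2", "Delhi", "laptop", 4),
    ("2023", "Q2", "Mumbai", "laptop", 6),
    ("2024", "Q1", "Delhi", "phone", 12),
]
DIMS = ("year", "quarter", "city", "item")


def aggregate(rows, keep):
    """Sum units grouped by the dimensions named in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose value on each named dimension is allowed."""
    return [
        r for r in rows
        if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())
    ]


def pivot(totals):
    """Pivot: swap the two axes of a 2-D aggregate, (a, b) -> (b, a)."""
    return {(b, a): v for (a, b), v in totals.items()}


roll_up = aggregate(SALES, ["year"])                # drill up to year level
drill_down = aggregate(SALES, ["year", "quarter"])  # finer level of detail
by_item_city = pivot(aggregate(SALES, ["city", "item"]))
```

Drilling up simply aggregates over fewer dimensions, and drilling down aggregates over more of them. Slice and dice only filter the rows before any aggregation takes place.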

Data Warehousing and Architecture:

Data warehousing is the process of constructing and using a data
warehouse. A data warehouse is constructed by integrating data from
multiple heterogeneous sources, and it supports analytical reporting,
structured and/or ad hoc queries, and decision making. Data
warehousing involves data cleaning, data integration, and data
consolidation.

Using Data Warehouse Information:

There are decision support technologies that help utilize the data
available in a data warehouse. These technologies help executives to
use the warehouse quickly and effectively. They can gather data,
analyze it, and take decisions based on the information present in the
warehouse. The information gathered in a warehouse can be used in
any of the following domains -

● Tuning Production Strategies: Product strategies can
be fine-tuned by repositioning the products and managing the
product portfolios by comparing the sales quarterly or yearly.

● Customer Analysis: Customer analysis is done by analyzing
the customer's buying preferences, buying time, budget cycles,
etc.

● Operations Analysis: Data warehousing also helps in
customer relationship management and in making environmental
corrections. The information also allows us to analyze business
operations.

Integrating Heterogeneous Databases

To integrate heterogeneous databases, we have two approaches -

● Query-driven Approach

● Update-driven Approach

Query-Driven Approach:
This is the traditional approach to integrating heterogeneous databases.
It builds wrappers and integrators on top of
multiple heterogeneous databases. These integrators are also known as
mediators.

Process of Query-Driven Approach

● When a query is issued on the client side, a metadata dictionary
translates the query into an appropriate form for the individual
heterogeneous sites involved.

● These queries are then mapped and sent to the local query
processor at each site.

● The results from the heterogeneous sites are integrated into a global
answer set.

Disadvantages

● The query-driven approach needs complex integration and filtering
processes.

● This approach is very inefficient.

● It is very expensive for frequent queries.

● This approach is also very expensive for queries that require
aggregations.

Update-Driven Approach:
This is an alternative to the traditional approach. Today's data
warehouse systems follow the update-driven approach rather than the
traditional approach discussed earlier. In the update-driven approach, the
information from multiple heterogeneous sources is integrated in
advance and stored in a warehouse. This information is available for
direct querying and analysis.

Advantages:

This approach has the following advantages -

● This approach provides high performance.

● The data is copied, processed, integrated, annotated, summarized,
and restructured in a semantic data store in advance.

● Query processing does not require an interface to process data at
local sources.
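
The difference between the two approaches can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the two source lists, their field names, and the functions below are hypothetical and exist only for this example. The query-driven function integrates the sources again on every query, while the update-driven function reads a copy that was integrated in advance.

```python
# Two hypothetical source systems with different field names and units.
SOURCE_A = [{"cust": "C1", "amount_rs": 500}, {"cust": "C2", "amount_rs": 300}]
SOURCE_B = [{"customer_id": "C1", "amount_paise": 25000}]


def integrate():
    """Map both sources to one common format: (customer, amount in rupees)."""
    rows = [(r["cust"], r["amount_rs"]) for r in SOURCE_A]
    rows += [(r["customer_id"], r["amount_paise"] // 100) for r in SOURCE_B]
    return rows


def query_driven_total(customer):
    # Query-driven: translation and integration happen inside every query.
    return sum(amount for cust, amount in integrate() if cust == customer)


# Update-driven: integrate once, in advance, and store the result.
WAREHOUSE = integrate()


def update_driven_total(customer):
    # Queries read the stored copy and never touch the local sources.
    return sum(amount for cust, amount in WAREHOUSE if cust == customer)
```

Both functions return the same answer. The update-driven version pays the integration cost once instead of on each query, which is why it suits frequent queries and aggregations.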

Functions of Data Warehouse Tools and Utilities

The following are the functions of data warehouse tools and utilities -

● Data Extraction: Involves gathering data from multiple
heterogeneous sources.

● Data Cleaning: Involves finding and correcting the errors in data.

● Data Transformation: Involves converting the data from legacy
format to warehouse format.

● Data Loading: Involves sorting, summarizing, consolidating,
checking integrity, and building indices and partitions.

● Refreshing: Involves propagating updates from the data sources to
the warehouse.

Note- Data cleaning and data transformation are important steps in
improving the quality of data and data mining results.
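
The functions listed above can be sketched as a small pipeline in Python. This is a minimal sketch, not a real ETL tool: the record layout, the two sample sources (crm and legacy), and the cleaning rules are assumptions made for illustration.

```python
def extract(sources):
    """Data extraction: gather raw records from several sources."""
    return [rec for src in sources for rec in src]


def clean(records):
    """Data cleaning: drop records without an id, trim stray whitespace."""
    cleaned = []
    for rec in records:
        if rec.get("id") is None:
            continue
        cleaned.append({**rec, "name": rec["name"].strip()})
    return cleaned


def transform(records):
    """Data transformation: convert the legacy string price to a float."""
    return [
        {"id": r["id"], "name": r["name"], "price": float(r["price"])}
        for r in records
    ]


def load(records, warehouse):
    """Data loading: sort by id and build a simple index keyed by id."""
    for rec in sorted(records, key=lambda r: r["id"]):
        warehouse[rec["id"]] = rec
    return warehouse


# Hypothetical sources; the second contains one record with no id.
crm = [{"id": 2, "name": " Pen ", "price": "10.50"}]
legacy = [
    {"id": 1, "name": "Ink", "price": "99"},
    {"id": None, "name": "??", "price": "0"},
]

warehouse = load(transform(clean(extract([crm, legacy]))), {})
```

In this sketch, refreshing would amount to running the same pipeline again whenever the sources change.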

Business Analysis Framework:

Business analysts get information from the data warehouse to
measure performance and make critical adjustments in order to stay
ahead of other businesses in the market. Having a data warehouse
offers the following advantages -

● Since a data warehouse can gather information quickly and
efficiently, it can enhance business productivity.

● A data warehouse provides a consistent view of customers and
items; hence, it helps us manage customer relationships.

● A data warehouse also helps in bringing down costs by
tracking trends and patterns over a long period in a consistent and
reliable manner.

To design an effective and efficient data warehouse, we need to
understand and analyze the business needs and construct a business
analysis framework. Different people have different views regarding the
design of a data warehouse. These views are as follows -

● The top-down view: This view allows the selection of relevant
information needed for a data warehouse.

● The data source view: This view presents the information being
captured, stored, and managed by the operational system.

● The data warehouse view: This view includes the fact tables and
dimension tables. It represents the information stored inside the
data warehouse.

● The business query view: It is the view of the data from the
viewpoint of the end-user.

Three-Tier Data Warehouse Architecture

Generally, a data warehouse adopts a three-tier architecture. Following
are the three tiers of the data warehouse architecture.

● Bottom Tier: The bottom tier of the architecture is the data
warehouse database server. It is usually a relational database system.
We use back-end tools and utilities to feed data into the bottom
tier. These back-end tools and utilities perform the extract, clean,
load, and refresh functions.

● Middle Tier: In the middle tier, we have the OLAP server, which can
be implemented in either of the following ways.

❖ By Relational OLAP (ROLAP), which is an extended
relational database management system. ROLAP maps
the operations on multidimensional data to standard
relational operations.

❖ By Multidimensional OLAP (MOLAP), which directly
implements the multidimensional data and operations.

● Top Tier: This tier is the front-end client layer. This layer holds the
query tools, reporting tools, analysis tools, and data mining
tools.

Data Warehouse Models:

From the perspective of data warehouse architecture, we have the

following data warehouse models -

● Virtual Warehouse

● Data mart

● Enterprise Warehouse

Virtual Warehouse

A set of views over operational databases is known as a virtual
warehouse. It is easy to build a virtual warehouse, but building one
requires excess capacity on operational database servers.

Data Mart

A data mart contains a subset of organization-wide data. This subset of
data is valuable to specific groups within an organization.

In other words, data marts contain data specific to a
particular group. For example, the marketing data mart may contain data
related to items, customers, and sales. Data marts are confined to
subjects.

Points to remember about data marts -

● Windows-based or Unix/Linux-based servers are used to
implement data marts. They are implemented on low-cost servers.
● The implementation cycle of a data mart is measured in short periods
of time, i.e., in weeks rather than months or years.
● The life cycle of a data mart may become complex in the long run if its
planning and design are not organization-wide.
● Data marts are small in size.
● The source of a data mart is a departmentally structured data
warehouse.
● Data marts are customized by department.
● Data marts are flexible.

Enterprise Warehouse
● An enterprise warehouse collects all the information and the
subjects spanning an entire organization.
● It provides enterprise-wide data integration.
● The data is integrated from operational systems and external
information providers.
● This information can vary from a few gigabytes to hundreds of
gigabytes, terabytes, or beyond.

Load Manager Architecture

This component performs the operations required for the extract and load
process:

● Extract the data from the source system.
● Fast-load the extracted data into a temporary data store.
● Perform simple transformations into a structure similar to the one in
the data warehouse.

Extract Data from Source

The data is extracted from the operational databases or the external
information providers. Gateways are the application programs that are
used to extract data. A gateway is supported by the underlying DBMS and
allows a client program to generate SQL to be executed at a server. Open
Database Connectivity (ODBC) and Java Database Connectivity (JDBC) are
examples of gateways.

Fast Load
● In order to minimize the total load window, the data needs to be
loaded into the warehouse in the fastest possible time.
● Transformations affect the speed of data processing.
● It is more effective to load the data into a relational database prior to
applying transformations and checks.
● Gateway technology is not suitable, since gateways tend not to
perform well when large data volumes are involved.

Simple Transformations

While loading, it may be required to perform simple transformations. After
this has been completed, we are in a position to do the complex checks.
Suppose we are loading EPOS sales transactions; we need to perform
the following checks:
● Strip out all the columns that are not required within the
warehouse.
● Convert all the values to the required data types.
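
The two checks above can be sketched in Python as follows. The EPOS row, its column names, and the target types are hypothetical and were chosen only to illustrate stripping unneeded columns and converting values.

```python
from datetime import datetime

# Hypothetical EPOS row as it might arrive from the source system.
raw = {
    "txn_id": "1001",
    "store": "S7",
    "sold_at": "2024-03-05 14:30:00",
    "amount": "249.50",
    "till_operator": "anita",        # not required in the warehouse
    "receipt_footer": "Thank you",   # not required in the warehouse
}

# Columns the warehouse keeps, each with a converter to its target type.
REQUIRED = {
    "txn_id": int,
    "store": str,
    "sold_at": lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"),
    "amount": float,
}


def simple_transform(row):
    """Keep only the required columns and convert each value's type."""
    return {col: convert(row[col]) for col, convert in REQUIRED.items()}


clean_row = simple_transform(raw)
```

Any column that is not listed in REQUIRED is dropped, and every remaining value is converted from a string to the type the warehouse expects.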

Warehouse Manager

A warehouse manager is responsible for the warehouse management
process. It consists of third-party system software, C programs, and shell
scripts.

The size and complexity of warehouse managers vary between
specific solutions.

Warehouse Manager Architecture

A warehouse manager includes the following -

● The controlling process
● Stored procedures or C with SQL
● Backup/Recovery tool
● SQL Scripts

Operations Performed by Warehouse Manager

● Analyzes the data to perform consistency
and referential integrity checks.
● Creates indexes, business views, and partition views against the base
data.
● Generates new aggregations and updates existing aggregations.
Generates normalizations.
● Transforms and merges the source data into the published data
warehouse.
● Backs up the data in the data warehouse.
● Archives the data that has reached the end of its captured life.

Note- A warehouse manager also analyzes query profiles to determine
which indexes and aggregations are appropriate.

Query Manager:
● Query manager is responsible for directing the queries to the
suitable tables.
● By directing the queries to appropriate tables, the speed of
querying and response generation can be increased.
● Query manager is responsible for scheduling the execution of the
queries posed by the user.

Query Manager Architecture:

The following screenshot shows the architecture of a query manager. It

includes the following:
● Query redirection via C tool or RDBMS
● Stored procedures
● Query management tool
● Query scheduling via C tool or RDBMS
● Query scheduling via third-party software

Detailed Information:

Detailed information is not kept online; rather, it is aggregated to the next
level of detail and then archived to tape. The detailed information part of
the data warehouse keeps the detailed information in the starflake schema.
Detailed information is loaded into the data warehouse to supplement
the aggregated data.

The following diagram shows a pictorial impression of where detailed
information is stored and how it is used.

Note- If detailed information is held offline to minimize disk storage, we
should make sure that the data has been extracted, cleaned up, and
transformed into starflake schema before it is archived.

Summary Information

Summary information is the part of the data warehouse that stores predefined
aggregations. These aggregations are generated by the warehouse
manager. Summary information must be treated as transient. It changes
on the go in order to respond to changing query profiles.

The points to note about summary information are as follows -

● Summary information speeds up the performance of common
queries.
● It increases the operational cost.
● It needs to be updated whenever new data is loaded into the data
warehouse.
● It may not need to be backed up, since it can be generated fresh
from the detailed information.
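
The relationship between detailed and summary information can be sketched in Python. This is an illustrative sketch only: the DETAIL rows and the function names are hypothetical. It shows a predefined aggregation being rebuilt from the detail whenever new data is loaded, which is why the summary can be treated as transient.

```python
from collections import defaultdict

# Hypothetical detailed information: (month, item, units_sold).
DETAIL = [
    ("2024-01", "phone", 3),
    ("2024-01", "phone", 2),
    ("2024-02", "laptop", 1),
]


def build_summary(detail):
    """Predefined aggregation: total units per (month, item)."""
    summary = defaultdict(int)
    for month, item, units in detail:
        summary[(month, item)] += units
    return dict(summary)


def load_new_data(rows):
    """Append new detail rows, then regenerate the summary from scratch."""
    DETAIL.extend(rows)
    return build_summary(DETAIL)


summary = build_summary(DETAIL)
summary = load_new_data([("2024-02", "laptop", 4)])
```

A common query such as "units of phones sold in January" is then a single dictionary lookup instead of a scan over the detail rows.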

Difference between Database System and Data Warehouse

Database System:
A database system is used in the traditional way of storing and retrieving
data. The major task of a database system is to perform query processing.
These systems are generally referred to as online transaction processing
(OLTP) systems. They are used for the day-to-day operations of an
organization.

Characteristics of Database
● Offers security and removes redundancy
● Allows multiple views of the data
● Follows ACID compliance (Atomicity,
Consistency, Isolation, and Durability)
● Allows insulation between programs and data
● Supports sharing of data and multiuser transaction processing
● Relational databases support a multi-user environment

Data Warehouse:
A data warehouse is a place where a huge amount of data is stored. It is
meant for users or knowledge workers in the role of data analysis and
decision making. These systems are supposed to organize and present
data in different formats and forms in order to serve the needs of
a specific user for a specific purpose. These systems are referred to as
online analytical processing (OLAP) systems.

Characteristics of Data Warehouse

● A data warehouse is subject oriented, as it offers information
related to a theme instead of a company's ongoing operations.
● The data needs to be stored in the data warehouse in a
common and universally acceptable manner.
● The time horizon for the data warehouse is relatively extensive
compared with operational systems.
● A data warehouse is non-volatile, which means the previous data is
not erased when new information is entered in it.

Data Warehousing Advantages:

The successful implementation of a data warehouse can bring major
benefits to an organization, including:

● Potential high returns on investment

Implementation of data warehousing by an organization requires a huge
investment, typically from Rs 10 lakh to Rs 50 lakh. However, a study by the
International Data Corporation (IDC) in 1996 reported that average
three-year returns on investment (ROI) in data warehousing reached
401%.

● Competitive advantage

The huge returns on investment for those companies that have
successfully implemented a data warehouse are evidence of the
enormous competitive advantage that accompanies this technology.
The competitive advantage is gained by allowing decision-makers
access to data that can reveal previously unavailable, unknown, and
untapped information on, for example, customers, trends, and demands.

● Increased productivity of corporate decision-makers

Data warehousing improves the productivity of corporate
decision-makers by creating an integrated database of consistent,
subject-oriented, historical data. It integrates data from multiple
incompatible systems into a form that provides one consistent view of
the organization. By transforming data into meaningful information, a
data warehouse allows business managers to perform more substantive,
accurate, and consistent analysis.

● More cost-effective decision-making

Data warehousing helps to reduce the overall cost of the product by
reducing the number of channels.

● Better enterprise intelligence

It helps to provide better enterprise intelligence.

● Enhanced customer service

It is used to enhance customer service.

Metadata: Concepts and Classifications

Metadata is simply defined as data about data. The data that is used to
represent other data is known as metadata. For example, the index of a
book serves as metadata for the contents of the book. In other words,
we can say that metadata is the summarized data that leads us to
detailed data. In terms of a data warehouse, we can define metadata as
follows.
● Metadata is the road-map to a data warehouse.
● Metadata in a data warehouse defines the warehouse objects.
● Metadata acts as a directory. This directory helps the decision
support system to locate the contents of a data warehouse.

Categories of Metadata:

● Business Metadata: It has the data ownership information,
business definitions, and changing policies.

● Technical Metadata: It includes database system names, table
and column names and sizes, data types, and allowed values.
Technical metadata also includes structural information such as
primary and foreign key attributes and indices.

● Operational Metadata: It includes currency of data and data
lineage. Currency of data means whether the data is active,
archived, or purged. Lineage of data means the history of the data's
migration and the transformations applied to it.

Role of Metadata:

Metadata has a very important role in a data warehouse. The role of
metadata in a warehouse is different from that of the warehouse data, yet it
plays an important role. The various roles of metadata are explained
below.
● Metadata acts as a directory.
● This directory helps the decision support system to locate the
contents of the data warehouse.
● Metadata helps in decision support system for mapping of data
when data is transformed from operational environment to data
warehouse environment.
● Metadata helps in summarization between current detailed data
and highly summarized data.
● Metadata also helps in summarization between lightly summarized
data and highly summarized data.
● Metadata is used for query tools.
● Metadata is used in extraction and cleansing tools.
● Metadata is used in reporting tools.
● Metadata is used in transformation tools.
● Metadata plays an important role in loading functions.

Metadata Repository:

A metadata repository is an integral part of a data warehouse system. It
contains the following metadata:

● Definition of data warehouse: It includes the description of the
structure of the data warehouse. The description is defined by
schema, views, hierarchies, derived data definitions, and data mart
locations and contents.
● Business metadata: It contains the data ownership
information, business definitions, and changing policies.
● Operational Metadata: It includes currency of data and data
lineage. Currency of data means whether the data is active,
archived, or purged. Lineage of data means the history of the data's
migration and the transformations applied to it.
● Data for mapping from operational environment to data
warehouse: It includes the source databases and their contents,
data extraction, data partitions, cleaning and transformation rules, and
data refresh and purging rules.
● Algorithms for summarization: It includes dimension algorithms,
data on granularity, aggregation, summarizing, etc.

Challenges in Metadata Management:

The importance of metadata cannot be overstated. Metadata helps in
driving the accuracy of reports, validates data transformation, and
ensures the accuracy of calculations. Metadata also enforces the
definition of business terms for business end-users. With all these uses,
metadata also has its challenges. Some of the challenges are
discussed below.

● Metadata in a big organization is scattered across the
organization. This metadata is spread across spreadsheets, databases,
and applications.
● Metadata could be present in text files or multimedia files. To use
this data for information management solutions, it has to be
correctly defined.
● There are no industry-wide accepted standards. Data
management solution vendors have a narrow focus.
● There are no easy and accepted methods of passing metadata.

Multi-Dimensional Data Model, Data Cubes, Stars, Snow Flakes, Fact Constellations

Multi-Dimensional Data Model

A multidimensional model views data in the form of a data cube. A data
cube enables data to be modeled and viewed in multiple dimensions. It
is defined by dimensions and facts.

The dimensions are the perspectives or entities concerning which an
organization keeps records. For example, a shop may create a sales
data warehouse to keep records of the store's sales for the dimensions
time, item, and location. These dimensions allow the store to keep track
of things, for example, monthly sales of items and the locations at which
the items were sold. Each dimension has a table related to it, called a
dimensional table, which describes the dimension further. For example,
a dimensional table for an item may contain the attributes item_name,
brand, and type.

A multidimensional data model is organized around a central theme, for
example, sales. This theme is represented by a fact table. Facts are
numerical measures. The fact table contains the names of the facts, or
measures, as well as keys to each of the related dimensional tables.
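
The sales example above can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the table names (fact_sales, dim_time, dim_item, dim_location), the key columns, and the sample rows are hypothetical. Only the dimensions time, item, and location and the attributes item_name, brand, and type come from the text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- One dimensional table per dimension (names are illustrative).
CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE dim_item     (item_key INTEGER PRIMARY KEY, item_name TEXT,
                           brand TEXT, type TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT);

-- The fact table holds the measures plus a key to each dimensional table.
CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    units_sold   INTEGER,
    amount       REAL
);

INSERT INTO dim_time VALUES (1, 'Jan', 2024), (2, 'Feb', 2024);
INSERT INTO dim_item VALUES (1, 'Pen', 'Acme', 'stationery'),
                            (2, 'Ink', 'Acme', 'stationery');
INSERT INTO dim_location VALUES (1, 'Delhi'), (2, 'Mumbai');
INSERT INTO fact_sales VALUES (1, 1, 1, 10, 100.0),
                              (1, 2, 1, 2, 60.0),
                              (2, 1, 2, 5, 50.0);
""")

# Monthly sales of items: join the fact table to two of its dimensions.
monthly = con.execute("""
    SELECT t.month, i.item_name, SUM(f.units_sold)
    FROM fact_sales f
    JOIN dim_time t ON t.time_key = f.time_key
    JOIN dim_item i ON i.item_key = f.item_key
    GROUP BY t.time_key, t.month, i.item_name
    ORDER BY t.time_key, i.item_name
""").fetchall()
```

Each row of the fact table stores only keys and numeric measures. The descriptive attributes live in the dimensional tables and are brought in by joins when a query needs them.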

Working on a Multidimensional Data Model

The Multidimensional Data Model works on the basis of pre-decided
steps.

The following stages should be followed by every project for building a
Multi Dimensional Data Model:
● Stage 1: Assembling data from the client: In the first stage, the
correct data is collected from the client.
Mostly, software professionals give the client clarity about
the range of data that can be obtained with the selected
technology, and collect the complete data in detail.
● Stage 2: Grouping different segments of the system: In the second
stage, the Multi Dimensional Data Model recognizes and classifies
all the data into the respective sections they belong to, which makes
the model easy to apply step by step.
● Stage 3: Noticing the different dimensions: The third stage is
the basis on which the design of the system rests. In this stage,
the main factors are recognized according to the user's point of
view. These factors are also known as "dimensions".
● Stage 4: Preparing the actual-time factors and their respective
qualities: In the fourth stage, the factors recognized in
the previous step are used to identify the related
qualities. These qualities are also known as "attributes" in the
database.
● Stage 5: Finding the facts for the factors listed previously
and their qualities: In the fifth stage, the Multi Dimensional Data
Model separates and differentiates the facts from the factors
collected so far. These facts play a significant role in
the arrangement of a Multi Dimensional Data Model.
● Stage 6: Building the schema to place the data, with respect to
the information collected in the steps above: In the sixth stage,
a schema is built on the basis of the data collected previously.

Data Cubes

In computer programming contexts, a data cube (or datacube) is a
multi-dimensional ("n-D") array of values. Typically, the term datacube is
applied in contexts where these arrays are massively larger than the
hosting computer's main memory; examples include
multi-terabyte/petabyte data warehouses and time series of image data.

The data cube is used to represent data (sometimes called facts) along
some dimensions of interest. For example, in OLAP such dimensions
could be the subsidiaries a company has, the products the company
offers, and time; in this setup, a fact would be a sales event where a
particular product has been sold in a particular subsidiary at a particular
time. In satellite image time series, the dimensions would be latitude and
longitude coordinates and time; a fact (sometimes called a measure)
would be a pixel at a given space and time as taken by the satellite
(following some processing that is not of concern here). Even though it is
called a cube (and the examples provided above happen to be
3-dimensional for brevity), a data cube generally is a multi-dimensional
concept which can be 1-dimensional, 2-dimensional, 3-dimensional, or
higher-dimensional. In any case, every dimension divides data into
groups of cells, whereas each cell in the cube represents a single
measure of interest. Sometimes cubes hold only a few values with the rest
being empty, i.e. undefined; sometimes most or all cube coordinates
hold a cell value. In the first case such data are called sparse, in the
second case they are called dense, although there is no hard delineation
between the two.
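
The idea of cells and of sparse versus dense cubes can be made concrete with a short Python sketch. The axis values and the stored cells below are hypothetical. The cube is kept as a dictionary keyed by coordinates, so empty cells simply have no entry.

```python
from itertools import product

# Hypothetical axes for the subsidiary / product / time example.
axes = {
    "subsidiary": ["North", "South"],
    "product": ["pen", "ink", "paper"],
    "year": [2023, 2024],
}

# Sparse storage: only cells that actually hold a measure are kept.
cube = {
    ("North", "pen", 2023): 120,
    ("North", "ink", 2024): 30,
    ("South", "paper", 2024): 75,
}

# Every combination of axis values is one possible cell: 2 * 3 * 2.
total_cells = len(list(product(*axes.values())))

# Fraction of cells that are filled; low values mean a sparse cube.
density = len(cube) / total_cells


def cell(subsidiary, prod, year):
    """Return the measure at the coordinates, or None for an empty cell."""
    return cube.get((subsidiary, prod, year))
```

Here only 3 of the 12 possible cells are filled, so the cube would be called sparse. A dense cube would be stored more naturally as a full n-dimensional array.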

Applications:

Multi-dimensional arrays can meaningfully represent spatio-temporal
sensor, image, and simulation data, but also statistics data where the
semantics of dimensions is not necessarily of spatial or temporal nature.
Generally, any kind of axis can be combined with any other into a
datacube.

Stars:

The star schema is the fundamental and simplest schema among the data mart
schemas. This schema is widely used to develop or build a data
warehouse and dimensional data marts. It includes one or more fact
tables referencing any number of dimensional tables. The star schema is an
important special case of the snowflake schema. It is also efficient for
handling basic queries.

It is called a star schema because its physical model resembles a star shape,
with a fact table at its center and the dimension tables at its periphery
representing the star's points.

Advantages of Star Schema:

Simpler Queries

The join logic of a star schema is much simpler than the join logic
needed to fetch data from a highly normalized transactional schema.

Simplified Business Reporting Logic

In comparison to a transactional schema that is highly normalized, the
star schema simplifies common business reporting logic, such as
as-of reporting and period-over-period reporting.

Feeding Cubes

Star schemas are widely used by OLAP systems to design OLAP cubes
efficiently. In fact, major OLAP systems provide a ROLAP mode of
operation which can use a star schema as a source without designing a
cube structure.

Disadvantages of Star Schema:

● Data integrity is not well enforced, since the schema is in a highly
de-normalized state.
● It is not as flexible in terms of analytical needs as a normalized data
model.
● Star schemas do not readily support many-to-many relationships
between business entities.

Snow Flakes

A snowflake schema in a data warehouse is a logical arrangement of tables
in a multidimensional database such that the ER diagram resembles a
snowflake shape. A snowflake schema is an extension of a star
schema, and it adds additional dimensions. The dimension tables are
normalized, which splits data into additional tables.

Characteristics of Snowflake:

● The main benefit of the snowflake schema is that it uses less disk
space.
● It is easier to implement when a dimension is added to the schema.
● Due to multiple tables, query performance is reduced.
● The primary challenge that you will face while using the snowflake
schema is that you need to perform more maintenance
because of the larger number of lookup tables.
Fact Constellations
Fact constellation is a measure of online analytical processing, which is
a collection of multiple fact tables sharing dimension tables, viewed as a
collection of stars. It can be seen as an extension of the star schema.

A fact constellation schema has multiple fact tables. It is also known as
a galaxy schema. It is a widely used schema and is more complex than the
star schema and the snowflake schema. A fact constellation schema can be
created by splitting an original star schema into several star schemas.
It has many fact tables and some common dimension tables.
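
A minimal sqlite3 sketch of the idea follows, assuming two hypothetical fact tables (sales_fact and shipping_fact) that share one dimension table (dim_product). All names and rows are illustrative.

```python
import sqlite3

# Hypothetical fact constellation: two fact tables share dim_product.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales_fact (
    product_id INTEGER REFERENCES dim_product(product_id),
    units_sold INTEGER
);
CREATE TABLE shipping_fact (
    product_id    INTEGER REFERENCES dim_product(product_id),
    units_shipped INTEGER
);
INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Lamp');
INSERT INTO sales_fact VALUES (1, 5), (1, 3), (2, 2);
INSERT INTO shipping_fact VALUES (1, 6), (2, 2);
""")


def sold_vs_shipped(connection):
    """Aggregate each fact table separately, then line the totals up
    through the shared dimension so rows are not double counted."""
    return connection.execute("""
        SELECT p.name,
               (SELECT SUM(s.units_sold) FROM sales_fact AS s
                 WHERE s.product_id = p.product_id),
               (SELECT SUM(h.units_shipped) FROM shipping_fact AS h
                 WHERE h.product_id = p.product_id)
        FROM dim_product AS p
        ORDER BY p.product_id
    """).fetchall()


constellation_result = sold_vs_shipped(conn)
```

The shared dimension is what lets measures from different business processes be compared side by side.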

Advantage:

Provides a flexible schema.

Disadvantage:

It is much more complex and hence, hard to implement and maintain.

Concept hierarchy, 3 Tier Architecture, ETL, Data Marting

Concept Hierarchy

A concept hierarchy represents a series of mappings from a set of
low-level concepts to higher-level, more general concepts. It organizes
information or concepts in a hierarchical structure or a specific
partial order. It is used to express knowledge in concise, high-level
terms and makes it possible to mine knowledge at several levels of
abstraction.

A concept hierarchy consists of a set of nodes organized in a tree,
where the nodes define values of an attribute known as concepts. A
special node, "ANY", is reserved for the root of the tree. A level number
is assigned to each node in a concept hierarchy. The level of the root
node is one. The level of a non-root node is one more than the level of
its parent.

Because values are defined by nodes, the levels of nodes can also be
used to describe the levels of values. A concept hierarchy enables raw
information to be managed at a higher and more generalized level of
abstraction.
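
The level-numbering rule can be sketched in a few lines of Python. The location values below are an assumed example; only the root name "ANY" and the numbering rule come from the text.

```python
# Child -> parent mapping for a hypothetical location hierarchy.
# "ANY" is the root of the tree.
PARENT = {
    "India": "ANY",
    "Canada": "ANY",
    "Uttar Pradesh": "India",
    "Ontario": "Canada",
    "Lucknow": "Uttar Pradesh",
    "Toronto": "Ontario",
}


def level(concept):
    """The root has level 1; any other node is one more than its parent."""
    if concept == "ANY":
        return 1
    return level(PARENT[concept]) + 1


def generalize(concept, target_level):
    """Climb towards the root until the concept sits at target_level."""
    while level(concept) > target_level:
        concept = PARENT[concept]
    return concept
```

For example, generalize("Toronto", 2) climbs from the city through the province to the country level and returns "Canada".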

There are several types of concept hierarchies, which are as follows:

Set-Grouping Hierarchy: A set-grouping hierarchy organizes values for
a given attribute or dimension into groups or ranges of constants. It is
also known as an instance hierarchy because the partial order of the
hierarchy is defined on the set of instances or values of an attribute.
These hierarchies carry more functional meaning and are therefore often
preferred over other hierarchies.
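
As a small sketch, a set-grouping hierarchy for an age attribute might look like this. The group labels and boundaries are assumptions chosen for illustration.

```python
# Hypothetical set-grouping hierarchy: raw ages are grouped into named
# ranges. range(a, b) covers a up to, but not including, b.
AGE_GROUPS = {
    "young": range(0, 30),
    "middle_aged": range(30, 60),
    "senior": range(60, 130),
}


def age_group(age):
    """Map a low-level value (an age) to its higher-level group."""
    for label, ages in AGE_GROUPS.items():
        if age in ages:
            return label
    raise ValueError(f"age out of range: {age}")
```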

Schema Hierarchy: A schema hierarchy represents the total or partial
order between attributes in the database. It can define existing semantic
relationships between attributes. In a database, more than one schema
hierarchy can be generated by using multiple sequences and groupings of
attributes.
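
A schema hierarchy can be sketched as an ordered list of attributes. The ordering street < city < state < country, the records, and the amount measure are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical total order between attributes, from lowest to highest.
LOCATION_HIERARCHY = ["street", "city", "state", "country"]


def total_by(records, attribute):
    """Sum the 'amount' measure at the chosen level of the hierarchy."""
    start = LOCATION_HIERARCHY.index(attribute)
    # The key is built from the chosen attribute and everything above it.
    key_attributes = list(reversed(LOCATION_HIERARCHY[start:]))
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[name] for name in key_attributes)
        totals[key] += record["amount"]
    return dict(totals)


records = [
    {"street": "Street A", "city": "Lucknow", "state": "Uttar Pradesh",
     "country": "India", "amount": 100.0},
    {"street": "Street B", "city": "Lucknow", "state": "Uttar Pradesh",
     "country": "India", "amount": 50.0},
    {"street": "Street C", "city": "Kanpur", "state": "Uttar Pradesh",
     "country": "India", "amount": 70.0},
]
```

Calling total_by(records, "city") gives one total per city, while total_by(records, "state") merges both cities into a single state-level total.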

Operation-Derived Hierarchy: An operation-derived hierarchy is defined
by a set of operations on the data. These operations are specified by
users, experts, or the data mining system. These hierarchies are usually
defined for numerical attributes. Such operations can be as simple as a
range value comparison or as complex as a data clustering and data
distribution analysis algorithm.
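
One simple operation of this kind is equal-width binning, sketched below. It stands in for the range-comparison case only; the function name and the choice of equal-width bins are assumptions, not something the text prescribes.

```python
def equal_width_bins(values, n_bins):
    """Assign each numeric value to one of n_bins equal-width ranges.

    Returns a (low, high) range for every input value, in input order.
    """
    lo, hi = min(values), max(values)
    if hi == lo or n_bins < 1:
        raise ValueError("need at least one bin and two distinct values")
    width = (hi - lo) / n_bins
    bins = []
    for value in values:
        # The maximum value would fall just past the last bin, so clamp it.
        index = min(int((value - lo) / width), n_bins - 1)
        bins.append((lo + index * width, lo + (index + 1) * width))
    return bins
```

The ranges are computed from the data itself rather than listed by hand, which is what distinguishes this from a set-grouping hierarchy.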

Rule-based Hierarchy: In a rule-based hierarchy, either a whole concept
hierarchy or a portion of it is defined by a set of rules and is
evaluated dynamically based on the current data and rule definitions. A
lattice-like structure is used to represent this type of hierarchy
graphically, in which each child-parent path is associated with a
generalization rule.
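
A rule-based hierarchy can be sketched as a list of (label, rule) pairs that are evaluated against the current data. The profit-margin labels and the thresholds 50 and 250 are hypothetical values chosen for the example.

```python
# Each generalization rule maps an item to a higher-level concept based
# on its current price and cost. Thresholds are illustrative only.
RULES = [
    ("low_profit_margin", lambda price, cost: price - cost < 50),
    ("medium_profit_margin", lambda price, cost: 50 <= price - cost <= 250),
    ("high_profit_margin", lambda price, cost: price - cost > 250),
]


def classify(price, cost):
    """Return the label of the first rule that matches the item."""
    for label, rule in RULES:
        if rule(price, cost):
            return label
    raise ValueError("no rule matched")
```

Because the label is computed when the rule is applied, changing a price or a threshold changes the item's position in the hierarchy without any stored mapping being edited.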

3 Tier Architecture

Data warehouses usually have a three-level (tier) architecture that
includes:
● Bottom Tier (Data Warehouse Server)
● Middle Tier (OLAP Server)
● Top Tier (Front end Tools).

The bottom tier consists of the data warehouse server, which is almost
always an RDBMS. It may include several specialized data marts and a
metadata repository.

Data from operational databases and external sources (such as user
profile data provided by external consultants) is extracted using
application program interfaces called gateways. A gateway is provided
by the underlying DBMS and allows client programs to generate SQL code
to be executed at a server.

The middle tier consists of an OLAP server for fast querying of the
data warehouse.

The OLAP server is implemented using either:

(1) A Relational OLAP (ROLAP) model, i.e., an extended relational DBMS
that maps operations on multidimensional data to standard relational
operations.

(2) A Multidimensional OLAP (MOLAP) model, i.e., a special-purpose
server that directly implements multidimensional data and operations.
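
To make the ROLAP idea concrete, the sketch below translates a multidimensional roll-up into an ordinary SQL GROUP BY using sqlite3. The sales table, its columns, and the roll_up helper are assumptions for illustration; real ROLAP servers are far more elaborate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (city TEXT, country TEXT, quarter TEXT, amount REAL);
INSERT INTO sales VALUES
    ('Lucknow', 'India',  'Q1', 100.0),
    ('Delhi',   'India',  'Q1',  50.0),
    ('Toronto', 'Canada', 'Q1',  70.0),
    ('Toronto', 'Canada', 'Q2',  30.0);
""")

# Only these column names may be interpolated into the SQL text.
ALLOWED_DIMENSIONS = {"city", "country", "quarter"}


def roll_up(connection, dimensions):
    """Map a roll-up over the given dimensions to a relational GROUP BY."""
    for name in dimensions:
        if name not in ALLOWED_DIMENSIONS:
            raise ValueError(f"unknown dimension: {name}")
    columns = ", ".join(dimensions)
    sql = (f"SELECT {columns}, SUM(amount) FROM sales "
           f"GROUP BY {columns} ORDER BY {columns}")
    return connection.execute(sql).fetchall()
```

Rolling up to ["country"] returns one total per country, and adding "quarter" to the list drills back down one level.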

The top tier contains front-end tools for displaying results provided by
OLAP, as well as additional tools for data mining of the OLAP-generated
data.

ETL:

ETL is a process in data warehousing and stands for Extract, Transform
and Load. In this process, an ETL tool extracts data from various source
systems, transforms it in the staging area, and finally loads it into
the data warehouse system.

Extraction:

The first step of the ETL process is extraction. In this step, data is
extracted from various source systems, which can be in various formats
such as relational databases, NoSQL stores, XML, and flat files, into
the staging area. It is important to store the extracted data in the
staging area first and not load it directly into the data warehouse,
because the extracted data is in various formats and may be corrupted.
Loading it directly into the data warehouse may damage the warehouse,
and rollback will be much more difficult. Therefore, this is one of the
most important steps of the ETL process.

Transformation:

The second step of the ETL process is transformation. In this step, a set
of rules or functions is applied to the extracted data to convert it into
a single standard format. It may involve the following processes/tasks:
● Filtering: Loading only certain attributes into the data warehouse.
● Cleaning: Filling up the NULL values with some default values,
mapping U.S.A, United States, and America into USA, etc.
● Joining: Joining multiple attributes into one.
● Splitting: Splitting a single attribute into multiple attributes.
● Sorting: Sorting tuples on the basis of some attribute (generally
the key attribute).
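
The five tasks above can be sketched in one small Python function. The field names (id, full_name, country, street, city), the alias table, and the "UNKNOWN" default are assumptions for this example.

```python
# Hypothetical alias table used by the cleaning step.
COUNTRY_ALIASES = {
    "u.s.a": "USA",
    "united states": "USA",
    "america": "USA",
    "usa": "USA",
}


def transform(rows):
    """Apply cleaning, splitting, joining, filtering and sorting."""
    cleaned = []
    for row in rows:
        # Cleaning: replace missing values and normalize country aliases.
        country = (row.get("country") or "UNKNOWN").strip()
        country = COUNTRY_ALIASES.get(country.lower(), country)
        # Splitting: one attribute (full_name) becomes two.
        first_name, _, last_name = row["full_name"].partition(" ")
        # Joining: two attributes (street, city) become one.
        address = f'{row["street"]}, {row["city"]}'
        # Filtering: only the attributes listed here reach the warehouse.
        cleaned.append({
            "id": row["id"],
            "first_name": first_name,
            "last_name": last_name,
            "country": country,
            "address": address,
        })
    # Sorting: order tuples by the key attribute.
    return sorted(cleaned, key=lambda record: record["id"])
```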

Loading:

The third and final step of the ETL process is loading. In this step, the
transformed data is finally loaded into the data warehouse. In some
systems the data warehouse is updated very frequently, while in others
data is loaded at longer but regular intervals. The rate and period of
loading depend solely on the requirements and vary from system to
system.
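
A minimal sketch of the loading step with sqlite3 follows. The customer_dim table and the helper names are hypothetical. Each batch is loaded inside one transaction, so a bad row causes the whole batch to be rolled back instead of leaving the warehouse partly updated.

```python
import sqlite3


def create_warehouse():
    """Create an in-memory stand-in for the warehouse with one table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_dim ("
        "id INTEGER PRIMARY KEY, name TEXT NOT NULL, country TEXT NOT NULL)"
    )
    return conn


def load(conn, staged_rows):
    """Load a batch of transformed rows; return True if it was committed."""
    try:
        # 'with conn' commits on success and rolls back on an exception.
        with conn:
            conn.executemany(
                "INSERT INTO customer_dim (id, name, country) "
                "VALUES (:id, :name, :country)",
                staged_rows,
            )
    except sqlite3.IntegrityError:
        return False
    return True


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM customer_dim").fetchone()[0]
```

How often load is called, for example every few minutes or once a night, is exactly the requirement-driven choice described above.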

Data Marting

A data mart is focused on a single functional area of an organization and
contains a subset of the data stored in a data warehouse. A data mart is
a condensed version of a data warehouse and is designed for use by a
specific department, unit or set of users in an organization, e.g.,
Marketing, Sales, HR or Finance. It is often controlled by a single
department in an organization.

A data mart usually draws data from only a few sources compared to a
data warehouse. Data marts are small in size and are more flexible
compared to a data warehouse.

There are three main types of data mart:

Dependent: A dependent data mart is created by drawing data from an
existing central data warehouse.

Independent: An independent data mart is created without the use of a
central data warehouse, drawing data directly from operational sources,
external sources, or both.

Hybrid: A hybrid data mart can take data from data warehouses or
operational systems.
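
A data mart that holds a department's subset of the warehouse data can be sketched with sqlite3 as follows. The warehouse_facts table, its columns, and the build_data_mart helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_facts (department TEXT, metric TEXT, value REAL);
INSERT INTO warehouse_facts VALUES
    ('Sales',   'revenue',   500.0),
    ('Sales',   'orders',     40.0),
    ('HR',      'headcount',  25.0),
    ('Finance', 'expenses',  300.0);
""")


def build_data_mart(connection, department):
    """Copy one department's rows from the warehouse into its own table."""
    mart = f"mart_{department.lower()}"
    # The table name is interpolated into SQL, so accept only identifiers.
    if not mart.isidentifier():
        raise ValueError(f"invalid department name: {department}")
    connection.execute(f"DROP TABLE IF EXISTS {mart}")
    connection.execute(f"CREATE TABLE {mart} (metric TEXT, value REAL)")
    connection.execute(
        f"INSERT INTO {mart} "
        "SELECT metric, value FROM warehouse_facts WHERE department = ?",
        (department,),
    )
    return mart
```

The resulting table is small and holds only what one department needs, which is the flexibility the text attributes to data marts.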

Use of Data warehousing in Current Industry Scenario

Finance:

The application of data warehousing in the financial industry is the same
as in the banking sector. The right solution helps the finance industry
analyze customer expenses, enabling it to outline better strategies to
maximize profits at both ends.

Banking:

With the right data warehousing solution, bankers can manage all their
available resources more effectively. They can better analyze their
consumer data, government regulations, and market trends to facilitate
better decision-making.

Education:

The education sector requires data warehousing to have a comprehensive
view of student and faculty data. It gives educational institutions
access to real-time data feeds to make valued and informed decisions.

Manufacturing & Distribution:

With an effective data warehousing solution, organizations in the
manufacturing & distribution sector can organize all their data under one
roof, predict market changes, analyze the latest trends, view
development areas, and make result-driven decisions.

Healthcare:

Another critical use of data warehouses is in the healthcare sector. All
the clinical, financial, and employee data is stored in the warehouse,
and analysis is run to derive valuable insights for allocating resources
in the best way possible.

Insurance:

In the insurance sector, data warehousing is required to maintain
existing customers' records and analyze them to identify client trends
and attract more business.

Services:

In the services sector, data warehousing is used for maintaining
customer details, financial records, and resources to analyze patterns
and boost decision-making for positive outcomes.

Retailing:

Retailers are the mediators between wholesalers and end customers,
which is why they need to maintain records of both parties. Data
warehousing helps them store this data in an organized manner.
