OLAP
Objectives
OLTP
OLTP Applications, benefits
OLTP benchmarks
Data partitioning in OLTP
Comparison between OLTP and OLAP
Multi-Dimensional Data Model: Data Cube
OLAP types and operations
Data modeling: Star and Snowflake schema
Denormalization
OLTP
Online transaction processing, or OLTP,
is a class of information systems that
facilitate and manage transaction-oriented
applications, typically for data entry and
retrieval transaction processing.
OLTP has also been used to refer to
processing in which the system responds
immediately to user requests.
OLTP
Online transaction processing (OLTP) involves
gathering input information, processing the
information and updating existing information to
reflect the gathered and processed information.
Most organizations use a database management
system to support OLTP
OLTP is carried out in a client-server system.
Online transaction processing is concerned with
concurrency and atomicity.
OLTP applications
Online transaction processing applications are
high-throughput, insert- or update-intensive
database applications.
An automated teller machine (ATM) for a bank is
an example of a commercial transaction processing
application.
These applications are used concurrently by
hundreds of users. The key goals of OLTP
applications are availability, speed, concurrency
and recoverability.
Online banking is completely based on online
transaction processing systems.
RDBMS used for OLTP
Database Systems have been used traditionally
for OLTP
◦ clerical data processing tasks
◦ detailed, up to date data
◦ structured repetitive tasks
◦ read/update a few records
◦ isolation, recovery and integrity are critical
The data warehouse and the OLTP data base are
both relational databases. However, the
objectives of both these databases are different.
Online transaction processing systems (Advantages)
TPC–C benchmark
The term transaction is often applied to a wide variety of
business and computer functions
A transaction could refer to a set of operations including
disk read/writes, operating system calls, or some form of
data transfer from one subsystem to another
TPC-C is a mixture of read-only and update-intensive
transactions that simulate the activities found in complex
OLTP application environments.
A typical transaction, as defined by the TPC, would
include updating a database system for such things
as inventory control (goods), airline reservations
(services), or banking (money).
TPC–C benchmark
In these environments, a number of customers
or service representatives input and manage
their transactions via a terminal or desktop
computer connected to a database.
Typically, the TPC produces benchmarks that
measure transaction processing (TP) and
database (DB) performance in terms of how
many transactions a given system and database
can perform per unit of time, e.g., transactions
per second or transactions per minute.
TPC-C Benchmark Example
The workload consists of five OLTP transaction types:
New Order – enter a new order from a customer. (45%)
Payment – update the customer balance to reflect a payment. (43%)
Delivery – deliver orders. (4%) The Delivery business transaction
consists of processing a batch of 10 new (not yet delivered) orders.
Order Status – retrieve the status of a customer's most recent order. (4%)
Stock-Level – monitor warehouse inventory. (4%) The Stock-Level business
transaction determines the number of recently sold items that have a
stock level below a specified threshold.
Data partitioning in OLTP
Scalability is the property of a system to accommodate
changes in transaction volume without affecting
performance.
Partitioning is a common technique used for scaling
databases, particularly for scaling updates, by
distributing the partitions across a cluster of nodes
and routing the writes to their respective partitions.
Data partitioning is also the process of logically
and/or physically partitioning data into segments that
are more easily maintained or accessed.
Different partitioning strategies
Vertical partitioning
Horizontal partitioning
◦ Range partition
◦ Hash partition
◦ List partition
Vertical Partitioning
Resumes
SSN     Name  Address   Resume  Picture
234234  Mary  Houston   Clob1…  Blob1…
345345  Sue   Seattle   Clob2…  Blob2…
345343  Joan  Seattle   Clob3…  Blob3…
234234  Ann   Portland  Clob4…  Blob4…

is split vertically into:

T1: SSN, Name, Address    T2: SSN, Resume    T3: SSN, Picture
234234  Mary  Houston     234234  Clob1…     234234  Blob1…
345345  Sue   Seattle     345345  Clob2…     345345  Blob2…
...
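As a sketch in SQL, such a split can be done with CREATE TABLE ... AS SELECT statements (the table names T1–T3 follow the figure; everything else is illustrative):

-- split the wide Resumes table into three narrower tables
CREATE TABLE T1 AS SELECT SSN, Name, Address FROM Resumes;
CREATE TABLE T2 AS SELECT SSN, Resume  FROM Resumes;
CREATE TABLE T3 AS SELECT SSN, Picture FROM Resumes;

-- the original rows can be reassembled by joining on the key
SELECT T1.SSN, Name, Address, Resume, Picture
FROM T1
JOIN T2 ON T1.SSN = T2.SSN
JOIN T3 ON T1.SSN = T3.SSN;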
Horizontal Partitioning
Customers
SSN     Name   City      Country
234234  Mary   Houston   USA
345345  Sue    Seattle   USA
345343  Joan   Seattle   USA
234234  Ann    Portland  USA
--      Frank  Calgary   Canada
--      Jean   Montreal  Canada

is split horizontally into:

CustomersInHouston
SSN     Name   City      Country
234234  Mary   Houston   USA

CustomersInSeattle
SSN     Name   City      Country
345345  Sue    Seattle   USA
345343  Joan   Seattle   USA

CustomersInCanada
SSN     Name   City      Country
--      Frank  Calgary   Canada
--      Jean   Montreal  Canada
Types of Horizontal Partitioning
Range partitioning
Range partitioning maps data to partitions based on ranges of values of the partitioning key that you establish for each partition. It is the most common type of partitioning and is often used with dates. For a table with a date column as the partitioning key, the January-2005 partition would contain rows with partitioning key values from 01-Jan-2005 to 31-Jan-2005.
List partitioning
List partitioning enables you to explicitly
control how rows map to partitions by specifying
a list of discrete values for the partitioning key in
the description for each partition.
E.g. a warehouse table containing sales summary
data by product, state, and month/year could be
partitioned into geographic regions.
The advantage of list partitioning is that you
can group and organize unordered and
unrelated sets of data in a natural way.
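A minimal Oracle-style sketch, assuming a state column as the partitioning key (all names are illustrative):

CREATE TABLE sales_summary (
  product VARCHAR2(30),
  state   VARCHAR2(20),
  amount  NUMBER(10,2)
)
PARTITION BY LIST (state) (
  -- each partition names the discrete key values it holds
  PARTITION region_east VALUES ('New York', 'Virginia'),
  PARTITION region_west VALUES ('California', 'Oregon')
);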
Hash partitioning
Hash partitioning maps data to partitions based on a
hashing algorithm that Oracle applies to the
partitioning key that you identify. The hashing
algorithm evenly distributes rows among partitions,
giving partitions approximately the same size.
Hash partitioning is the ideal method for distributing
data evenly across devices. Hash partitioning is also
an easy-to-use alternative to range partitioning,
especially when the data to be partitioned is not
historical or has no obvious partitioning key.
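A minimal Oracle-style sketch (illustrative names; Oracle names the partitions itself):

CREATE TABLE accounts (
  account_id NUMBER,
  owner_name VARCHAR2(50)
)
-- the hash of the key spreads rows evenly across 4 partitions
PARTITION BY HASH (account_id)
PARTITIONS 4;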
Online Analytical Processing(OLAP)
OLAP is a category of software tools that provides analysis
of data stored in a database.
OLAP is a category of applications and technologies for
collecting, managing, processing, and presenting
multidimensional data for analysis and management
purposes.
OLAP tools allow the user to query, browse, and summarize
information in a very efficient, interactive, and dynamic way.
[Figure: a data warehouse cube with Product, Region, and Time dimensions]
Online analytical processing (OLAP)
Multidimensional data analysis
◦ 3-D graphics, Pivot Tables, Crosstabs, etc.
◦ Compatible with Spreadsheets & Statistical packages
◦ Advanced Data Presentation Functions
Advanced Database Support
◦ Access to many kinds of DBMS’s, flat files, and
internal and external data sources
◦ Support for Very Large Databases
◦ Advanced data navigation
Easy-to-use end-user interfaces
Support Client/Server architecture
Online Analytical Processing (OLAP)
A widely adopted definition for OLAP used today in five key words is:
Fast Analysis of Shared Multidimensional Information (FASMI).
Fast refers to the speed that an OLAP system is able to deliver most
responses to the end user.
Analysis refers to the ability of an OLAP system to manage any business
logic and statistical analysis relevant for the application and user. In
addition, the system must allow users to define new ad hoc calculations as
part of the analysis and report without having to program them.
Shared refers to the ability of an OLAP system to implement all security
requirements necessary for confidentiality, and concurrent update locking
at an appropriate level when multiple write access is required.
Multidimensional refers to the requirement that an OLAP system provide a
multidimensional view of the data, including support for hierarchies and
multiple hierarchies.
Information refers to all of the data and derived data needed, wherever the
data resides and however much of the data is relevant for the application.
Online Analytical Processing (OLAP)
Implemented in a multi-user client/server
mode
Offers consistently rapid response to
queries, regardless of database size and
complexity
OLAP helps user to synthesize enterprise
information and analyze historical data
Operational v/s Information System
2. An OLAP query reads a HUGE amount of data and generates the required result.
The query is also very complex. Thus special primitives have to be provided to
support this kind of data access.
3. OLAP systems access historical data, not current volatile data, while OLTP
systems access current, up-to-date data and do not need historical data.
Multi-Dimensional Data Model: Data Cube
Multidimensional data model views data in the form
of a data cube
A data cube allows data to be modeled and viewed in
multiple dimensions
Dimensions are entities with respect to which an
organization wants to keep records, such as time,
item, branch, and location
◦ A dimension table gives further descriptions about a
dimension, e.g., time (day, week, month, year)
◦ A fact table contains measures and keys to each of the
related dimension tables, e.g., dollars sold
Multi-Dimensional Data Model: Data Cube
OLAP Operations
OLAP provides a user-friendly environment for
interactive data analysis.
A number of OLAP data cube operations exist to
materialize different views of data, allowing interactive
querying and analysis of the data.
The most popular end user operations on dimensional
data are:
Roll-up
Drill-down
Slice and dice
Pivot (rotate)
Drill Up (Roll up)
Roll-up performs aggregation on a data cube in any of the
following ways:
By climbing up a concept hierarchy for a dimension
By dimension reduction
[Figure: roll-up on a cube with Product and Time dimensions]
Drill Up (Roll up)
Roll-up is performed by climbing up a concept
hierarchy for the dimension location.
Initially the concept hierarchy was "street < city <
province < country".
On rolling up, the data is aggregated by ascending
the location hierarchy from the level of city to the
level of country; the data is grouped into countries
rather than cities.
When roll-up is performed by dimension reduction, one
or more dimensions are removed from the data cube.
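In relational terms, this roll-up corresponds to grouping at a coarser level of the hierarchy; a sketch against an assumed sales table:

-- before roll-up: aggregated by city
SELECT city, SUM(amount) AS total FROM sales GROUP BY city;

-- after roll-up: ascend the location hierarchy to country
SELECT country, SUM(amount) AS total FROM sales GROUP BY country;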
Drill Up (Roll up)
Drill Down (roll down)
Drill-down is the reverse operation of roll-up. It is performed by either
of the following ways:
By stepping down a concept hierarchy for a dimension
By introducing a new dimension.
[Figure: drill-down on a cube with Product and Time dimensions]
Drill Down (roll down)
Drill-down is performed by stepping down a
concept hierarchy for the dimension time.
Initially the concept hierarchy was "day < month <
quarter < year".
On drilling down, the time dimension is descended
from the level of quarter to the level of month.
When drill-down is performed, one or more
dimensions are added to the data cube.
It navigates from less detailed data to more
detailed data.
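The same idea as a SQL sketch (table and column names assumed):

-- before drill-down: aggregated by quarter
SELECT quarter, SUM(amount) AS total FROM sales GROUP BY quarter;

-- after drill-down: descend the time hierarchy to month
SELECT quarter, month, SUM(amount) AS total FROM sales GROUP BY quarter, month;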
Drill Down (roll down)
The result of a drill-down operation
performed on the central cube is
obtained by stepping down a
concept hierarchy for time:
descending the time hierarchy from
the level of week to the more
detailed level of day. New
dimensions can also be added to
the cube, because drill-down adds
more detail to the given data.
Slice
The slice operation selects a single dimension of the given cube and
focuses on a portion of the cube, forming a new sub-cube.
[Figure: slicing the cube on Product = Toaster leaves a Region × Time plane]
Slice
Slice performs a selection on one
dimension of the given cube,
resulting in a subcube. For
example, in the cube example
above, making the selection
temperature = cool yields the
subcube restricted to the cool
temperature values.
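In SQL terms, a slice is a single-dimension predicate; a sketch against an assumed sales table:

-- slice: fix one dimension (temperature = 'cool'), keep the others
SELECT item, time_period, SUM(amount) AS total
FROM sales
WHERE temperature = 'cool'
GROUP BY item, time_period;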
Dice
The dice operation creates a sub-cube by
focusing on two or more dimensions: it selects
two or more dimensions from a given cube and
provides a new sub-cube.
For example, a dice operation based on the
following selection criteria involves three
dimensions:
(location = "Toronto" or "Vancouver")
(time = "Q1" or "Q2")
(item = "Mobile" or "Modem")
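The same criteria written as a SQL predicate (a sketch; table and column names assumed):

-- dice: restrict three dimensions at once
SELECT location, time_period, item, SUM(amount) AS total
FROM sales
WHERE (location = 'Toronto' OR location = 'Vancouver')
  AND (time_period = 'Q1' OR time_period = 'Q2')
  AND (item = 'Mobile' OR item = 'Modem')
GROUP BY location, time_period, item;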
Dice
The dice operation defines
a subcube by performing
a selection on two or more
dimensions. For example,
applying the selection
(time = day 3 OR time =
day 4) AND (temperature
= cool OR temperature =
hot) to the original cube
yields a subcube that is
still two-dimensional.
Pivot (rotate)
Pivot reorients the cube for visualization, e.g., presenting a 3-D cube as a
series of 2-D planes.
Pivoting, or rotation, changes the perspective in presenting the data to the user.
[Figure: pivoting exchanges the orientation of the Product, Region, and Time axes]
Pivot (rotate)
Pivot, otherwise known as rotate, changes
the dimensional orientation of the cube,
i.e., rotates the data axes to view the data
from different perspectives.
Pivot groups data with different
dimensions.
Pivot (rotate)
OLAP Operations
[Figure: a data cube (Product, Region, Time) feeds a reporting tool, which produces a report for presentation]
Data Warehouse Schema
Data Warehouse environment usually transforms
the relational data model into some special
architectures.
Each Schema has a Fact table that stores all the
facts about the subject/measure.
Each fact is associated with multiple dimension
keys that are linked to Dimension Tables.
The most commonly used Data Warehouse
Schemas are:
Data Warehouse Schema
Star Schema
◦ Single Fact table with n –Dimension tables
linked to it.
Snowflake Schema
◦ Single Fact table with n-Dimension tables
organized as a hierarchy.
Fact Constellation Schema
◦ Multiple fact tables sharing dimension tables.
Star Schema
A fact table in the middle connected to a set of
dimension tables
A single, large and central fact table and one
table for each dimension.
Every fact points to one tuple in each of the
dimensions and has additional attributes.
Usually the fact tables in a star schema are in
third normal form (3NF), whereas dimension
tables are denormalized.
The star schema is the simplest architecture; it is
the most commonly used nowadays and is
recommended by Oracle.
Star Schema
Fact Tables
A fact table typically has two types of columns:
foreign keys to dimension tables, and measures,
which contain numeric facts.
Dimension Tables
A dimension is a structure usually composed of one
or more hierarchies that categorizes data. The
primary keys of each of the dimension tables are
part of the composite primary key of the fact table.
Dimension tables are generally smaller in size than
the fact table.
Star Schema Example
Store Dimension Fact Table Time Dimension
Store Key Store Key Period Key
Store Name Product Key Year
City Period Key
Quarter
Units
State Month
Price
Region
Product Key
Product Desc
Product Dimension
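A sketch of this schema in Oracle-flavored DDL (the column types are assumptions), showing how the dimension keys form the fact table's composite primary key:

CREATE TABLE store_dim (
  store_key  NUMBER PRIMARY KEY,
  store_name VARCHAR2(50),
  city       VARCHAR2(30),
  state      VARCHAR2(30),
  region     VARCHAR2(30)
);

CREATE TABLE time_dim (
  period_key NUMBER PRIMARY KEY,
  year       NUMBER,
  quarter    NUMBER,
  month      NUMBER
);

CREATE TABLE product_dim (
  product_key  NUMBER PRIMARY KEY,
  product_desc VARCHAR2(100)
);

-- the fact table references every dimension and combines their keys
CREATE TABLE sales_fact (
  store_key   NUMBER REFERENCES store_dim (store_key),
  product_key NUMBER REFERENCES product_dim (product_key),
  period_key  NUMBER REFERENCES time_dim (period_key),
  units       NUMBER,
  price       NUMBER(10,2),
  PRIMARY KEY (store_key, product_key, period_key)
);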
Fact Constellation Schema
[Figure: fact constellation schema; the shared Store dimension has Store Key, Store Name, City, State, Region]
Fact Constellation Example
Fact Constellation schema
Concept hierarchy
A concept hierarchy defines a sequence of
mappings from a set of low-level concepts to
higher-level, more general concepts.
A Concept Hierarchy example
[Figure: concept hierarchies rooted at "all"]
Case Study
XYZ Foods & Beverages is a new company
which produces dairy, bread, and meat products,
with its production unit located at Baroda.
Their products are sold in the North, North West,
and Western regions of India.
They have sales units at Mumbai, Pune,
Ahmedabad, Delhi, and Baroda.
The President of the company wants sales
information.
Sales Information
[Table: units of each product (e.g., Cheese, Swiss Rolls) sold over time]
Sales Information
Report: The number of items sold in each City for each
product with time
[Table: units of each product (e.g., Wheat Bread, Cheese, Swiss Rolls) sold in each city (e.g., Pune) over time]
Sales Information
Report: The number of items sold and income in each region for
each product with time.
Fact Table
City     Product      Month     Units  Rupees
Mumbai   Cheese       January       3    7.95
Mumbai   Swiss Rolls  January       4    7.32
Pune     Cheese       January       3    7.95
Pune     Swiss Rolls  January       4    7.32
Mumbai   Cheese       February     16   42.40
Sales Data Warehouse Model
Product Category Dimension Table
Product_Category_Id  Product_Category
1                    Milk
2                    Bread
3                    Cookies
Sales Data Warehouse Model
Region Dimension Table
[Figure: Sales Fact table linked to the Time, Product, Product Category, and Region dimensions]
Sales Data Warehouse Model: Snowflake Schema
Sales Data Warehouse Model
SELECT doctor, SUM(charge) FROM fee WHERE year = 2004 GROUP BY doctor;
Assignment 2
Design a data warehouse for a regional
weather bureau. The weather bureau has
about 1,000 probes, which are scattered
throughout various land and ocean locations
in the region to collect basic weather data,
including air pressure, temperature, and
precipitation at each hour. All data are sent
to the central station, which has collected
such data for over 10 years.
Assignment 2 solution
Since the weather bureau has about 1,000 probes scattered throughout various
land and ocean locations, we need to construct a spatial data warehouse so that a
user can view weather patterns on a map by month, by region, and by different
combinations of temperature and precipitation, and can dynamically drill down
or roll up along any dimension to explore desired patterns.
Assignment 3
Suppose that a data warehouse for Big University consists of the
following four dimensions: student, course, semester, and
instructor, and two measures count and avg grade. When at the
lowest conceptual level (e.g., for a given student, course, semester,
and instructor combination), the avg grade measure stores the
actual course grade of the student. At higher conceptual levels, avg
grade stores the average grade for the given student.
Draw a snowflake schema diagram for the data warehouse.
What specific OLAP operations should one perform in order to list
the average grade of CS courses for each Big University student?
To obtain the same list, write an SQL query assuming the data are
stored in a relational database with the schema big_university
(student, course, department, semester, instructor, grade).
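For the SQL part, a sketch of one possible query, assuming that CS courses are identified by the department column (an assumption about the data):

-- average grade per student over CS courses only
SELECT student, AVG(grade) AS avg_grade
FROM big_university
WHERE department = 'CS'
GROUP BY student;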
OLAP Server
In order to offer consistent, rapid response to queries
(regardless of database size and complexity), OLAP
needs to be implemented in a multi-user client/server
mode.
An OLAP server is a high-capacity, multi-user data
manipulation engine specifically designed to support
and operate on multidimensional data structures.
The server design and data structures are optimized for
rapid ad-hoc information retrieval in any orientation.
Types of OLAP Servers:
◦ MOLAP server
◦ ROLAP server
◦ HOLAP server
Multidimensional OLAP (MOLAP)
In MOLAP, data is stored in a multidimensional
cube and not in the relational database
It uses specialized data structures to organize,
navigate and analyze data
It uses array technology and efficient storage
techniques that minimize the disk space
requirements.
MOLAP differs significantly in that (in some
software) it requires the pre-computation and
storage of information in the cube — the operation
known as processing.
Multidimensional OLAP (MOLAP)
Advantages:
Excellent performance: MOLAP cubes are built for fast data
retrieval and are optimal for slicing and dicing operations.
It uses array technology and efficient storage techniques that
minimize the disk space requirements.
Can perform complex calculations: All calculations have been
pre-generated when the cube is created. Hence, complex
calculations are not only doable, but they return quickly.
MOLAP example:
Analysis and budgeting in a financial department
Sales analysis
Multidimensional OLAP (MOLAP)
Disadvantages:
Only a limited amount of data can be efficiently stored and
analyzed: because all calculations are performed when the cube
is built, it is not possible to include a large amount of data in the
cube itself.
Underlying data structures are limited in their ability to support
multiple subject areas and provide access to detailed data.
Storage, Navigation and analysis of data are limited because
the data is designed according to previously determined
requirements. Data may need to be physically reorganized to
optimally support new requirements.
Requires additional investment: cube technologies are often
proprietary and may not already exist in the organization.
Therefore, to adopt MOLAP technology, chances are additional
investments in human and capital resources are needed.
Relational OLAP (ROLAP)
ROLAP is a form of online analytical processing
that performs dynamic multidimensional analysis
of data stored in a relational database rather than
in a multidimensional database.
It is the fastest-growing type of OLAP tool.
It does not require the pre-computation and storage
of information.
ROLAP servers stand between the relational back-end
server and the client front-end tools.
Relational OLAP (ROLAP)
Advantages:
Can handle large amounts of data: ROLAP itself places no
limitation on data amount.
Can leverage functionalities inherent in the relational database:
often, the relational database already comes with a host of
functionalities. ROLAP technologies, since they sit on top of the
relational database, can therefore leverage these functionalities.
ROLAP technology tends to have greater scalability than
MOLAP technology
ROLAP Examples:
◦ Telecommunication startup: call data records (CDRs)
◦ ECommerce Site
◦ Credit Card Company
Relational OLAP (ROLAP)
Disadvantages:
Performance can be slow: Because each ROLAP
report is essentially a SQL query (or multiple SQL
queries) in the relational database, the query time can
be long if the underlying data size is large.
Limited by SQL functionalities: It is difficult to
perform complex calculations using SQL
Requires development of middleware to facilitate the
building of multidimensional applications, that is,
software that converts the two-dimensional relational
model into a multidimensional structure.
Hybrid OLAP (HOLAP)
Combines ROLAP and MOLAP technology.
Allows storing large volumes of detailed data in an RDBMS
and aggregated data in an MDBMS.
Users access the data via MOLAP tools.
Best of both worlds: the greater data capacity of ROLAP
with the superior processing capability of MOLAP.
◦ Benefits from the greater scalability of ROLAP
◦ Benefits from the faster computation of MOLAP
It stores data in both a relational database (RDB) and a
multidimensional database (MDD) and uses whichever
is best suited to the type of processing desired.
Hybrid OLAP (HOLAP)
Denormalization
A normalized design will often store
different but related pieces of information in
separate logical tables (called relations).
If these relations are stored physically as
separate disk files, completing a database
query that draws information from several
relations (a join operation) can be slow.
If many relations are joined, it may be
prohibitively slow
Denormalization
The solution is to denormalize tables.
Data from one table is included in another table in
order to eliminate the second table from queries,
which reduces the number of JOINs in a query and
thus improves performance.
It's important to point out that you don't need
denormalization if there are no performance
issues in the application.
Before going with it, consider other options, like
query optimization and proper indexing.
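A minimal sketch of the idea, using hypothetical order_line and product tables:

-- normalized: a join is needed to show the product name
SELECT o.order_id, p.product_name
FROM order_line o
JOIN product p ON o.product_id = p.product_id;

-- denormalized: product_name is copied into order_line, so no join
SELECT order_id, product_name FROM order_line;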
Denormalization Example 1
Example 2 normalized model
Example 2 normalized model
The user_account table stores data about users who login into our
application
The client table contains some basic data about our clients.
The product table lists products offered to our clients.
The task table contains all the tasks we have created. Each task is a set of
related actions towards clients. Each task has its related calls, meetings,
and lists of offered and sold products.
The call and meeting tables store data about all calls and meetings and
relates them with tasks and users.
The dictionaries task_outcome, meeting_outcome and call_outcome
contain all possible options for the final state of a task, meeting or call.
The product_offered stores a list of all products that were offered to
clients on certain tasks while product_sold contains a list of all the
products that client actually bought.
The supply_order table stores data about all orders we’ve placed and the
products_on_order table lists products and their quantity for specific
orders.
The writeoff table is a list of products that were written off.
Denormalized model
Denormalized model: product
The only change in the product table is the addition of the
units_in_stock attribute. In a normalized model we could
compute this value as units ordered – units sold – units
offered – units written off. We would repeat the calculation
each time a client asks for that product, which would be
extremely time consuming. Instead, we compute the
value up front; when a customer asks us, we have it
ready. Of course, this simplifies the select query a lot. On
the other hand, the units_in_stock attribute must be
adjusted after every insert, update, or delete in the
products_on_order, writeoff, product_offered and
product_sold tables.
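A sketch of that adjustment using the formula above (column names such as quantity and product_id are assumptions about the schema):

-- recompute the stock figure for one product after a change
UPDATE product p
SET units_in_stock =
    (SELECT COALESCE(SUM(quantity), 0) FROM products_on_order o WHERE o.product_id = p.id)
  - (SELECT COALESCE(SUM(quantity), 0) FROM product_sold s WHERE s.product_id = p.id)
  - (SELECT COALESCE(SUM(quantity), 0) FROM product_offered f WHERE f.product_id = p.id)
  - (SELECT COALESCE(SUM(quantity), 0) FROM writeoff w WHERE w.product_id = p.id)
WHERE p.id = :changed_product_id;  -- placeholder for the affected product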
Denormalized model : task
In the modified task table, we find two new
attributes: client_name and user_first_last_name.
Both of them store values as of when the task was
created, because both of these values can change
over time. We'll also keep a foreign key that relates
them to the original client and user ID.
There are more values that we would like to store,
like client address, VAT ID, etc.
Denormalized model
The denormalized product_offered table has two new
attributes, price_per_unit and price. The price_per_unit
attribute is stored because we need to store the actual price
when the product was offered. The normalized model would
only show its current state, so when the product price changes
our ‘history’ prices would also change. Our change doesn’t just
make the database run faster: it also makes it work better. The
price attribute is the computed value units_sold *
price_per_unit. I added it here to avoid making that calculation
each time we want to take a look at a list of offered products.
It’s a small cost, but it improves performance.
The changes made on the product_sold table are very similar.
The table structure is the same, but it stores a list of sold items.
Denormalized model
Denormalized model
The statistics_per_year table is completely new to our model. We
should look at it as a denormalized table because all its data can be
computed from the other tables. The idea behind this table is to
store the number of tasks, successful tasks, meetings and calls
related to any given client. It also stores the total amount charged per
year. After inserting, updating, or deleting anything in the
task, meeting, call and product_sold tables, we should recalculate
this table’s data for that client and corresponding year. We can
expect that we’ll mostly have changes only for the current year.
Reports for previous years shouldn’t need to change.
Values in this table are computed up front, so we’ll spend less time
and resources at the moment we need the calculation result.
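A sketch of one such recalculation (the counter column and key names are assumptions about the schema):

-- refresh the task counter for one client and year
UPDATE statistics_per_year s
SET tasks_count =
      (SELECT COUNT(*) FROM task t
       WHERE t.client_id = s.client_id
         AND EXTRACT(YEAR FROM t.task_date) = s.year)
WHERE s.client_id = :client_id  -- placeholders for the affected row
  AND s.year = :year;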
When to use denormalization
Maintaining history
Speeding up reporting
Disadvantages of Denormalization
Disk space: duplicated data requires extra disk space.
Data anomalies: we must update every piece of
duplicate data. That also applies to computed values and
reports. We can achieve this by using triggers,
transactions and/or procedures for all operations that must
be completed together.
Documentation: we must properly document every
denormalization rule that we have applied.
Slowing other operations: we can expect to slow
down data insert, modification, and deletion operations.
More coding: denormalization requires additional coding, but at
the same time it simplifies some SELECT queries a lot.