Unit 2: OLAP
Data in a data warehouse comes from multiple source systems, such as −
Sales
Marketing
HR
SCM, etc.
The data may pass through an operational data store or other transformations before it is
loaded into the DW system for information processing.
A Data Warehouse is used for reporting and analysis of information and stores both
historical and current data. The data in a DW system is used for analytical reporting,
which is later used by business analysts, sales managers, or knowledge workers for
decision-making.
Data flows into a Data Warehouse from multiple heterogeneous
data sources. Common data sources for a data warehouse
include −
Operational databases
SAP and non-SAP Applications
Flat Files (xls, csv, txt files)
Data in a data warehouse is accessed by BI (Business Intelligence) users for analytical
reporting, data mining, and analysis. It supports decision-making by business
users, sales managers, and analysts in defining future strategy.
OLAP
Online Analytical Processing (OLAP) servers are based on the multidimensional data
model. They allow managers and analysts to gain insight into information through fast,
consistent, and interactive access to it. This chapter covers the types of OLAP, the
operations on OLAP, and the differences between OLAP and statistical databases and OLTP.
Relational OLAP
ROLAP servers are placed between the relational back-end server and client front-end
tools. To store and manage warehouse data, ROLAP uses a relational or extended-
relational DBMS.
ROLAP includes the following −
Implementation of aggregation navigation logic
Optimization for each DBMS back end
Additional tools and services
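As a rough sketch (not from the original text), the following Python example uses the standard sqlite3 module to show the ROLAP idea: a multidimensional request is answered by an ordinary relational GROUP BY over a fact table. The table and column names are invented for illustration.

import sqlite3

# Hypothetical star-schema fact table held in a relational DBMS,
# as a ROLAP server would use.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (branch TEXT, item TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("A", "Mobile", "Q1", 100.0), ("A", "Modem", "Q1", 60.0),
     ("B", "Mobile", "Q2", 150.0), ("B", "Modem", "Q1", 80.0)],
)

# A ROLAP engine rewrites the cube request as ordinary SQL aggregation.
for row in conn.execute(
    "SELECT branch, quarter, SUM(amount) FROM sales GROUP BY branch, quarter"
):
    print(row)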
Multidimensional OLAP
MOLAP uses array-based multidimensional storage engines for multidimensional views
of data. With multidimensional data stores, storage utilization may be low if the data
set is sparse. Therefore, many MOLAP servers use two levels of data storage
representation to handle dense and sparse data sets.
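A minimal sketch of the array-based storage idea, with an invented two-by-two-by-two cube: dense data lives in a NumPy array indexed by dimension positions, while a coordinate dictionary stands in for a sparse-level representation.

import numpy as np

branches = ["A", "B"]          # dimension members
items    = ["Mobile", "Modem"]
quarters = ["Q1", "Q2"]

# Dense representation: one array cell per dimension combination.
cube = np.zeros((len(branches), len(items), len(quarters)))
cube[0, 0, 0] = 100.0          # sales(A, Mobile, Q1)
cube[1, 1, 0] = 80.0           # sales(B, Modem, Q1)

# Sparse representation: store only non-empty cells as coordinates.
sparse_cube = {(0, 0, 0): 100.0, (1, 1, 0): 80.0}

# Aggregation over a dimension is just an array reduction.
print(cube.sum(axis=2))        # total sales per (branch, item)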
Hybrid OLAP
Hybrid OLAP is a combination of both ROLAP and MOLAP. It offers the higher scalability of
ROLAP and the faster computation of MOLAP. HOLAP servers allow storing large
volumes of detailed information; the aggregations are stored separately in a MOLAP
store.
OLAP Operations
Since OLAP servers are based on a multidimensional view of data, we will discuss OLAP
operations on multidimensional data.
Here is the list of OLAP operations −
Roll-up
Drill-down
Slice and dice
Pivot (rotate)
Roll-up
Roll-up performs aggregation on a data cube in either of the following ways −
By climbing up a concept hierarchy for a dimension
By dimension reduction
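As an illustrative sketch (the data and the month-to-quarter hierarchy are invented for this example), climbing a concept hierarchy can be expressed as a group-by in pandas:

import pandas as pd

sales = pd.DataFrame({
    "city":   ["Toronto", "Toronto", "Vancouver", "Vancouver"],
    "month":  ["Jan", "Feb", "Jan", "Apr"],
    "amount": [100, 150, 200, 120],
})

# Concept hierarchy for the time dimension: month -> quarter.
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}
sales["quarter"] = sales["month"].map(month_to_quarter)

# Roll-up: aggregate from (city, month) up to (city, quarter).
rolled_up = sales.groupby(["city", "quarter"], as_index=False)["amount"].sum()
print(rolled_up)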
Drill-down
Drill-down is the reverse operation of roll-up. It is performed by either stepping down a
concept hierarchy for a dimension or by introducing a new dimension.
Slice
The slice operation selects one particular dimension from a given cube and forms a new
sub-cube. Here, slice is performed for the dimension "time" using the criterion time = "Q1".
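A minimal pandas sketch of slice, using an invented cube flattened to one row per cell: fixing time = "Q1" yields the sub-cube.

import pandas as pd

# Tiny illustrative cube, one row per cell (invented data).
cube = pd.DataFrame({
    "location": ["Toronto", "Toronto", "Vancouver", "Vancouver"],
    "time":     ["Q1", "Q2", "Q1", "Q2"],
    "item":     ["Mobile", "Modem", "Mobile", "Modem"],
    "amount":   [605, 680, 825, 952],
})

# Slice: select a single dimension value, time = "Q1".
sub_cube = cube[cube["time"] == "Q1"]
print(sub_cube)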
Dice
Dice selects two or more dimensions from a given cube and provides a new sub-cube.
For example, a dice operation on the cube based on the following selection criteria
involves three dimensions −
(location = "Toronto" or "Vancouver")
(time = "Q1" or "Q2")
(item = "Mobile" or "Modem")
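Continuing in the same style (invented data), dice restricts several dimensions at once with isin filters:

import pandas as pd

cube = pd.DataFrame({
    "location": ["Toronto", "Vancouver", "New York", "Vancouver"],
    "time":     ["Q1", "Q2", "Q1", "Q3"],
    "item":     ["Mobile", "Modem", "Mobile", "Phone"],
    "amount":   [605, 680, 1087, 38],
})

# Dice: restrict two or more dimensions at once.
diced = cube[
    cube["location"].isin(["Toronto", "Vancouver"])
    & cube["time"].isin(["Q1", "Q2"])
    & cube["item"].isin(["Mobile", "Modem"])
]
print(diced)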
The following are general optimization techniques for the efficient computation of data
cubes −
Sorting, hashing, and grouping − Sorting, hashing, and grouping operations should be
applied to the dimension attributes to reorder and cluster related tuples. In cube
computation, aggregation is performed on the tuples that share the same set of
dimension values. It is therefore important to exploit sorting, hashing, and grouping
facilities to access and group such data together to facilitate evaluation of such aggregates.
For example, to compute total sales by branch, day, and item, it can be more efficient to sort
tuples or cells by branch, then by day, and then group them by item name.
Efficient implementations of such operations on huge data sets have been widely
studied in the database research community.
Such implementations can be extended to data cube computation. This technique can also
be extended to perform shared-sorts (i.e., sharing sorting costs across multiple
cuboids when sort-based methods are used), or to perform shared-partitions (i.e.,
sharing the partitioning cost across multiple cuboids when hash-based algorithms are
used).
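A minimal sketch of sort-based grouping with invented fact tuples: a single sort on the dimension attributes makes tuples with equal dimension values adjacent, so each aggregate is then computed in one sequential pass.

from itertools import groupby

# Invented fact tuples: (branch, day, item, amount).
facts = [
    ("B1", "Mon", "Mobile", 100), ("B1", "Mon", "Modem", 40),
    ("B2", "Tue", "Mobile", 90),  ("B1", "Tue", "Mobile", 70),
]

# Sort once on the dimension attributes so tuples sharing the same
# dimension values become adjacent; aggregation is a single pass.
facts.sort(key=lambda t: (t[0], t[1], t[2]))
for key, group in groupby(facts, key=lambda t: (t[0], t[1], t[2])):
    print(key, sum(t[3] for t in group))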
Simultaneous aggregation and caching of intermediate results − In cube
computation, it is efficient to compute higher-level aggregates from previously computed
lower-level aggregates, rather than from the base fact table. Furthermore, simultaneous
aggregation from cached intermediate computation results can reduce
expensive disk input/output (I/O) operations.
For example, to compute sales by branch, we can use the intermediate results
derived from the computation of a lower-level cuboid such as sales by branch and
day. This technique can be further extended to perform amortized scans (i.e., computing as
many cuboids as possible simultaneously to amortize disk reads).
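A small sketch (invented cuboid contents): the sales-by-branch cuboid is derived from a cached sales-by-(branch, day) cuboid instead of rescanning the base fact table.

import pandas as pd

# Previously computed lower-level cuboid: sales by (branch, day).
branch_day = pd.DataFrame({
    "branch": ["B1", "B1", "B2", "B2"],
    "day":    ["Mon", "Tue", "Mon", "Tue"],
    "amount": [140, 70, 55, 90],
})

# Higher-level cuboid (sales by branch) computed from the cached
# intermediate result rather than from the base fact table.
by_branch = branch_day.groupby("branch", as_index=False)["amount"].sum()
print(by_branch)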
Aggregation from the smallest child when there exist multiple child cuboids −
When there exist multiple child cuboids, it is usually more efficient to compute the
desired parent (i.e., more generalized) cuboid from the smallest previously computed
child cuboid.
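A brief sketch with invented cuboid sizes: when both the (item, day) and (item, branch) child cuboids are available, aggregating the parent sales-by-item cuboid from the smaller child scans far fewer tuples.

import pandas as pd

# Two previously computed child cuboids of the parent "sales by item"
# (invented sizes: the day dimension is much finer than branch).
by_item_day    = pd.DataFrame({"item": ["Mobile"] * 365,
                               "day": range(365),
                               "amount": [1.0] * 365})
by_item_branch = pd.DataFrame({"item": ["Mobile", "Mobile"],
                               "branch": ["B1", "B2"],
                               "amount": [200.0, 165.0]})

# Aggregate the parent from the smaller child: fewer tuples to scan.
smallest = min([by_item_day, by_item_branch], key=len)
by_item = smallest.groupby("item", as_index=False)["amount"].sum()
print(by_item)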
The Apriori pruning method can be explored to compute iceberg cubes
efficiently − The Apriori property, in the context of data cubes, states the following: if a
given cell does not satisfy minimum support, then no descendant of the cell (i.e., no
more specialized cell) will satisfy minimum support either. This property can be used to largely
reduce the computation of iceberg cubes.
The specification of an iceberg cube includes an iceberg condition, which is a constraint on
the cells to be materialized. A common iceberg condition is that the cells must satisfy a
minimum support threshold, such as a minimum count or sum. In this case, the Apriori
property can be used to prune away the exploration of a cell's descendants.
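A much-simplified sketch of Apriori pruning in iceberg-cube computation (data and threshold are invented, and for brevity it materializes cells along one fixed dimension order only, whereas full algorithms such as BUC examine every group-by): whenever a cell fails the minimum count, the whole branch of its descendants is skipped.

from collections import Counter

# Invented base tuples over three dimensions (A, B, C).
tuples = [
    ("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a1", "b2", "c1"),
    ("a2", "b1", "c1"), ("a1", "b1", "c1"),
]
DIMS = 3
MIN_SUP = 2  # iceberg condition: cell count >= 2

def iceberg(cells, depth=0, prefix=()):
    # Extend cells one dimension at a time. If a cell fails minimum
    # support, the Apriori property guarantees that every descendant
    # (more specialized cell) fails too, so that branch is pruned.
    if depth == DIMS:
        return
    counts = Counter(t[depth] for t in cells)
    for value, count in counts.items():
        if count < MIN_SUP:
            continue  # Apriori pruning: descendants cannot qualify
        cell = prefix + (value,)
        print(cell, count)
        iceberg([t for t in cells if t[depth] == value], depth + 1, cell)

iceberg(tuples)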