
Unit 5 DWA


DATA WAREHOUSING IN TAMIL NADU GOVERNMENT
Introduction
● Initially, the government was not keen on investing large amounts in IT. But with the advent of data analytics, the government sector also started investing in it.

● Data analytics, however, brings humongous amounts of data, and the trouble of storing and accessing such volumes has led to the need to invest in a good data warehouse.

● There has been a drive for more efficiency and greater transparency at all levels of government. As a result, every department wants to analyze information using data analytics, store it in data warehouses, and later share the insights with the public.

● Data warehousing can help institutions deal with this wide variety of challenges.
IMPORTANCE OF DATA WAREHOUSING IN GOVERNMENT SECTOR

● Data warehousing helps structure raw data into meaningful insights.
● Different data warehousing tools help answer ad-hoc questions.
● By properly studying the data, one can create structured reports and dashboards that provide actionable insights for following up on goals and actions.
● Data warehousing helps align the realization of strategic goals within a governmental institution with its financial budget.
Needs and benefits
● Information is one of the most valuable assets of any government. Governments deal with enormous amounts of data.
● To use information to its fullest potential, planners and decision makers need instant access to relevant data in a properly summarized form.
● A DW can deliver strategic intelligence to decision makers and provide insight into the overall situation from historical data.
● This greatly facilitates decision makers in taking micro-level decisions in a timely manner.
● Government decision makers can be empowered with a flexible tool that enables them to make informed policy decisions for citizen facilitation and to assess their impact on the intended section of the population.
● They can obtain easily decipherable and comprehensive information without needing sophisticated tools.
Data Warehouse used

● General Information Service Terminal of National Informatics Centre (GISTNIC) Data Warehouse
● An initiative taken by the National Informatics Centre (NIC) to provide a comprehensive information database from the government.
● The GISTNIC data warehouse is a web-enabled SAS software solution.
● The GISTNIC website hosts an online data warehouse that includes data marts on village amenities, rainfall, agriculture census data, essential commodity prices, etc.
MAJOR DATA ANALYSIS RESOURCES

● Database: A centralized repository stores the bulk data. The database keeps consolidated information to avoid long waiting times when a query is made.
● ETL tool: ETL stands for Extract, Transform and Load. With the help of ETL development tools, we can define the transformations, make any adjustments, and then load the final information after processing the data, making it available for the visualization tools (see the sketch after this list).
● Visualization tool: Once the data has been processed, the user can create graphs and charts, do additional calculations, and even run complex algorithms for insightful results. These tools also provide options to share the information with others.
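
As a rough sketch of how these three resources interact, the following Python example extracts records from a CSV export, transforms them, and loads them into a SQLite database standing in for the central repository; the file and column names are hypothetical, and a visualization tool would then read the summarized results.

```python
# Minimal ETL sketch (hypothetical file and column names): extract raw
# records from a departmental CSV export, transform them, and load them
# into a SQLite database that stands in for the central repository.
import csv
import sqlite3

def extract(path):
    """Read raw rows from a departmental CSV export."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Normalize field names and types before loading."""
    return [
        (row["district"].strip().title(), int(row["amount"]))
        for row in rows
        if row.get("amount")          # drop rows with no amount recorded
    ]

def load(records, conn):
    """Load the cleaned records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (district TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect("warehouse.db")
load(transform(extract("payments_export.csv")), conn)

# A visualization tool would then read summaries like this one:
for district, total in conn.execute(
    "SELECT district, SUM(amount) FROM payments GROUP BY district"
):
    print(district, total)
```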
Data Warehousing Applications for State and Local Government

● Educational reporting system

● Asset control and management

● Budgetary control and management

● Public expenditure disclosure dashboards

● Environmental monitoring & disaster management


Key Results

● Cost reduction in data warehouse functionality

● Ability to deliver data warehouse centric services

● Data consolidation

● Administrative cost reduction

● Directive on the re-use of public sector information

● Data Protection
Conclusion
Although there are hundreds of reasons in government for not building a proper architecture, and every excuse has a seed of truth, all of them pale in comparison to the long-term need for efficient, accurate and cheap information in government.

However, the basic truth and bottom-line responsibility of the government remains that it is the steward of the public trust, and it must organize and maintain the information it accumulates in an efficient, accessible and meaningful architecture for the future.
DATA WAREHOUSING IN MINISTRY OF COMMERCE
Ministry of Commerce
The vision of the department is to mark India's role in world trade and assume a role of leadership in international trade organizations commensurate with India's growing importance.

It formulates and implements foreign trade policies and handles responsibilities relating to commercial relations, state trading, export measures and the development of export-oriented industries.
Objectives
● A web-enabled data warehouse
● Ensuring standards of quality
● Globalization of India's foreign trade
● Attracting foreign investment
● Simplifying export procedures
● Scaling down tariff barriers
● Streamlining collected data
GISTNIC Data Warehouse
● An initiative taken by NIC to provide a comprehensive database from the Government on agriculture, economy, science and technology, etc.
● It is a web-enabled SAS software solution with data marts on commodity prices, import and export revenue, trade policies, proposals for state trading, etc.
● It provides online information to key decision makers in the Government sector, enabling them to make better strategic decisions for administration.
Infrastructure
● Layout: The basic infrastructure required is based on the communication infrastructure, hardware/software/tools and manpower. The policies on unilateral and bilateral trading are also dealt with separately.
● Components: This includes the tasks necessary to provide the technical basis for the warehouse, i.e. connectivity between the legacy environment and the new warehouse environment at the network as well as the database level.
● Manpower requirements: The senior officials in the Ministry sponsored the whole warehouse implementation and played an active role as EXIM policy and business architects for the data warehouse, and also as subject area specialists.
Export Processing Zones
The Ministry of Commerce has been regularly reviewing the data warehouse in its board meetings. The reviews cover all analysis variables, reporting forms, zone performance and the progress of exports in each zone of the country:

1. Kandla Free Trade Zone, Gandhidham
2. Santacruz Electronics Export Processing Zone, Bombay
3. Falta Export Processing Zone, West Bengal
4. Madras Export Processing Zone, Chennai
5. Cochin Export Processing Zone
6. Noida Export Processing Zone
7. Visakhapatnam Export Processing Zone
Key Areas for Analysis
● Directions for unit-wise, sector-wise and country-wise imports and exports
● Sector-wise, country-wise and zone-wise import and export trends
● Occupancy details and growth of industrial units
● Deployment and investments in infrastructure
● Claims of reimbursement of central sales tax of zones and DTA sales
● Deemed export benefits
● Comparative country-wise imports and exports
● Employment generation
Design of Analysis
The data model is prepared after the entire data availability and the data requirements are analyzed. The analysis variables for building the multidimensional cube are:

1. Employment generation, with skill classification and zone/unit/industry break-up
2. Investments in the zone
3. Performance of units and close monitoring during production
4. Deployment of infrastructure, etc.


OLAP Architecture and DBMS Implementation Capabilities
● The architecture consists of 5 layers: the Secretary of the department, the front end, the server, data marts, and warehouses for each zone.
● All 7 zones have DBMS/RDBMS data for internal management of zone activities and have been forwarding their MIS reports to the MOC. The second layer is located at the MOC, New Delhi, with large metadata.
Related Tables for Analysis Variables
1. EPZ or EOU
2. Zone
3. Type of approval
4. Type of industry
5. State
6. District
7. Year of approval
8. Month of approval
9. Day of approval
10. Shipment mode
11. Year of production commencement
12. Month of production commencement
13. Day of production commencement
14. Current status
15. Date of current status
16. Net foreign exchange percentage
17. Stipulated NFE
18. Number of approvals
19. Number of units
20. Bill of entry
Conclusion
The data warehouse for the Ministry of Commerce scales to large volumes of data along with seamless presentation of historical, projected and derived data. It helps the planners in what-if analysis and planning without depending on the zones to supply the data. More dimensions can be included with the data collected from other offices. It evolves a data warehouse model for better analysis of the promotion of imports and exports in the country. This will provide an excellent executive information system to the secretaries of the Ministry.
DATA WAREHOUSING FOR THE
GOVERNMENT OF ANDHRA PRADESH
DATA WAREHOUSE FOR FINANCE DEPARTMENT

Responsibilities of the finance department:

The finance department of the Government of Andhra Pradesh has the following responsibilities:
1. Preparing a department-wise budget up to the sub-detail head and submitting it to the legislature for approval.
2. Monitoring government expenditure and revenue department-wise.
3. Looking after development activities under various plan schemes.
4. Monitoring other administrative matters related to all heads of the department.
[Maps: Old Andhra Pradesh and New Andhra Pradesh]
Treasuries in Andhra Pradesh:
Money standing in the government account is kept either in treasuries or in banks.

Money deposited in banks is considered a general fund held in the books of the banks on behalf of the state.

Sub-Treasuries
One or more sub-treasuries may be established under a district treasury if the requirements of public business make it necessary.

The accounts of receipts and payments at a sub-treasury must be included monthly in the accounts of the district treasury.

Treasuries handle all government receipts and payments.

Every transaction in the government is made through the related departments.
Data Warehouse for Treasuries

The data warehouse technology provided to the department of treasuries by the National Informatics Centre eliminates major problems such as slowness and inefficiency by storing current and historical data from disparate information systems.
The data warehouse provides efficient analysis and monitoring of the financial data of treasuries.
It also evaluates the internal and external business factors related to the operational, economic and financial conditions of treasuries' budget utilization.
Dimensions covered under finance data warehouse

The following are the different dimensions taken for the drill-down approach against payments and receipts:

● Department

Various departments in finance like accounting, finance management

● District Treasury Office

In charge of handling district’s money

● Sub-Treasury Office

Branches of treasuries
Dimensions covered under finance data warehouse

● Drawing and Disbursing Officer

The head of office (Secretary), drawing and disbursing money from the fund

● Time (year/month/week/day)

Particular date and time for transactions

● Bank-wise

Different banks and their regulations


Dimensions covered under finance data warehouse

● Based on different forms (bills)

Amount paid, amount received, amount range. A drill-down sketch across these dimensions follows below.
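
To make the drill-down idea concrete, here is a small hypothetical sketch in Python using pandas; the table and its column names are illustrative inventions, not taken from the case study.

```python
# Hypothetical drill-down sketch over the dimensions listed above,
# using pandas. All names and figures are illustrative.
import pandas as pd

payments = pd.DataFrame({
    "department": ["Accounting", "Accounting", "Finance Mgmt", "Finance Mgmt"],
    "district_treasury": ["Guntur", "Guntur", "Krishna", "Krishna"],
    "sub_treasury": ["Tenali", "Mangalagiri", "Machilipatnam", "Gudivada"],
    "year": [2003, 2003, 2003, 2004],
    "amount_paid": [120000, 85000, 99000, 134000],
})

# Roll up to the department level ...
by_department = payments.groupby("department")["amount_paid"].sum()

# ... then drill down: department -> district treasury -> sub-treasury.
drill_down = payments.groupby(
    ["department", "district_treasury", "sub_treasury"]
)["amount_paid"].sum()

print(by_department)
print(drill_down)
```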


What is COGNOS?

● IBM® Cognos® Business Intelligence is an integrated business intelligence suite that provides a wide range of functionality to help you understand your organization's data.

● Everyone in your organization can use IBM Cognos BI to view or create business reports, analyze data, and monitor events and metrics so that they can make effective business decisions.
COGNOS Graphical User Interface for Treasuries and Data Warehouses

The features of COGNOS PowerPlay are as follows:

1. Impromptu
● It is an interactive database reporting tool.
● Used for generating various kinds of reports, such as simple crosstabs.
● Once a report has been published in HTML, one can view it using a Web browser.
● A web browser is required, but not Impromptu, to view HTML reports.
● One can view an HTML report on the Internet, on a network, or after it has been sent via e-mail.
2. Transformer

● In this model, objects may contain definitions of queries, dimensions, measures and dimension views, as well as objects for one or more cubes.
● Transformer stores models as files with specific extensions.
● Once the model is ready, creating one or more cubes or cube groups based on the model content is possible.

3. PowerPlay

● COGNOS PowerPlay is used to populate reports with a drill-down facility.
● Some of the popular reports are as follows:

Dynamic rank position

Financial report

Business trend

Comparative performance

Foreign currency report


4. Scheduler
● Scheduler coordinates the execution of automated processes, called tasks, on a set date and time, or at recurring intervals.
● Scheduler supports tasks that run once and tasks that run repeatedly.
● Through Scheduler, Impromptu users can submit Impromptu report requests to be executed either locally or by an Impromptu Request Server.

5. Authenticator
● Authenticator is a user-class management system.
● It provides COGNOS client applications with the ability to create and show data based on user-authenticated access.
● It also serves as a repository for log-in information, thus providing client applications with automatic access to data sources and servers.
Data Warehousing in Hewlett Packard
Hewlett-Packard, established in 1939, is now one of the biggest MNCs in the world.

Introduction
HP has established itself in many technological fields, including computer hardware, computer software, IT services and IT consultancy.

Having so much on its plate, it has to have a very powerful and cost-efficient data warehouse to store, analyze and report data from all of its businesses.

In addition to its Home Business, Office Business and other sales businesses, the Customer Products Group includes HP's Customer Products Business Organization (CPBO), responsible for worldwide retail sales and distribution of all of HP's customer-targeted products.

Greg Stanley, former manager of HP CPBO's Business Analysis Group, said, "To be successful in our business, we have to come up with solutions for the reseller."

For the second quarter of 2022, the market share of HPE was 18.8%, with a net revenue of $16.5 billion.
Data Warehousing in Hewlett Packard

Brief working of a Data Warehouse

Data warehouses serve as a central repository for storing and analyzing information to make better-informed decisions. An organization's data warehouse receives data from a variety of sources, typically on a regular basis, including transactional systems, relational databases, and other sources.

The main characteristics of a data warehouse are as follows:
• Subject-Oriented
• Integrated
• Non-Volatile
• Time-Variant
Data Warehousing in Hewlett Packard

Access to Needed Information Using Data Warehouse Technology

• HP has done a superb job of capturing and storing the information its resellers need, both from primary research and from third parties.
• However, data that exists in separate repositories cannot communicate with one another.
• The Business Analysis Group decided it needed a system that would provide market metric data to help field sales force managers and account teams make brand and channel management decisions.
Data Warehousing in Hewlett Packard

Timeline

1. Requirement of a system by the HP team: The Group turned to Knosys Inc., which had developed an OLAP package called ProClarity, built from the ground up for Microsoft SQL Server 7.0.

2. Knosys recommended HP adopt the SQL Server 7.0 and ProClarity solution: Knosys was confident it would help move data fast, with the advantages of a low cost of maintenance and ownership: low in cost, requiring low maintenance, and easy to use.

3. Friction arose between HP and Knosys: HP called Symmetry Inc. for their expertise in OLAP consulting to evaluate the implementation proposed by Knosys.

4. Go-ahead from Symmetry Inc.: Symmetry validated the project proposed by Knosys. Knosys helped build the data flow algorithms with the MS Visual Basic development system and developed SQL Server 7.0 Data Transformation Services. From Access, the data went to MS SQL Server; Visual Basic applications and MS SQL provided common ground.
Data Warehousing in Hewlett Packard

Timeline (continued)

HOLAP capabilities: Clay Young, VP (Marketing), said HP was impressed by SQL Server 7.0's HOLAP capabilities. HP's enormous sell-through data volumes would take a long time to build into analytical models with pure multidimensional OLAP, and pure OLAP did not meet the query performance needs of HP decision makers.

Diving into the details of implementation: HP used SQL Server 7.0's virtual cubes and cube partitioning capabilities. Cubes are databases with multiple dimensions; HP has at least 8 OLAP cubes for different decision makers. Virtual cubes allow decision makers to cross-analyse data, and also derive new business views from existing cubes, making it easier for the Business Analysis Group to manage business views.

Advantages of integrating ProClarity: ProClarity provides HP decision makers with the key to analysing masses of data and gives them data visualisation. ProClarity's powerful analytical features take full advantage of robust SQL Server 7.0 capabilities and help knowledge workers understand complex data.

HP success: HP had expertise in proprietary and monolithic solutions, but the Group wanted an open, highly flexible analytical application that could be deployed in a variety of ways, such as a PC client or a Web-based client.
Data Warehousing in Hewlett Packard

Technical Challenges

• The system requires considerable work to organize and integrate all subsystems into one single unit.
• There is a need for database knowledge and technology among implementers, designers and users of the system.
• Ongoing corporate mergers are also a hindrance to immediate implementation of such database management systems.
• Implementing and running multiple data warehouses comes with high financial costs and can be extremely expensive for the company.
Data Warehousing in Hewlett Packard

Conclusion

When the SQL Server and ProClarity system is fully operational, the manager expects it to provide all the significant benefits. Account representatives will log on to a web page and pull up last week's sales and inventory levels. Analysts will study these trends, take the in-store audit data, produce a report that shows inventory problems in particular stores, and suggest that the problem may be that the product is displayed in the computer aisle. The new systems now in use are more accurate, and their detailed, timely data will make the business more efficient.
DATA WAREHOUSING IN LEVI STRAUSS
About Levi Strauss & Co.
Levi Strauss & Co. is an American clothing company known worldwide
for its Levi's brand of denim jeans. It was founded in May 1853 when
German immigrant Levi Strauss moved from Buttenheim, Bavaria, to
San Francisco, California, to open a west coast branch of his brothers'
New York dry goods business. Although the corporation is registered in
Delaware, the company's corporate headquarters is located in Levi's
Plaza in San Francisco.
INTRODUCTION
Here is how they explained their engineering decisions to their
customer.

They employed a standard star join schema for the following reasons:

• Many relational database management systems, including Oracle 8.1, were heavily optimized to execute queries against these schemata.

• This kind of schema had been proven to scale to the world's largest data warehouses.
Data Mining Importance :

● It helps companies gather reliable information.


● It's an efficient, cost-effective solution
compared to other data applications.
● It helps businesses make profitable production and
operational adjustments.
● Data mining uses both new and legacy
systems.
● It helps businesses make informed decisions.
Data Mining for decision making
● Analyzing the sales figures of previous years
● Analyzing the variations in sales figures
● Taking a look at the motivation levels of its employees
● Comparing the effect of factors like pricing, brand promotion, salary hikes, etc. on production and sales
AI-Powered Design
AI-powered design now factors into the company's clothing.

The Levi's team created a style transfer algorithm: a powerful neural network trained by feeding thousands and thousands of art images into it.
A design of Levi’s trucker jacket with Vincent van Gogh’s Starry Night
imprinted on it. Image: Levi Strauss & Co
Data Warehousing Process
Star Schema
A database organizational structure optimized for use in a data warehouse, which uses a single large fact table to store transactional or measured data, and one or more smaller dimension tables. The process (sketched in code below) is:
● Define the star schema
● Populate dimension tables
● Load the fact tables
● Arrange periodic updating of the fact table
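
A minimal sketch of these four steps, using SQLite from Python; the table and column names are illustrative and only loosely echo the Levi's dimension tables listed below.

```python
# Minimal star-schema sketch in SQLite; tables and values are
# illustrative, not Levi Strauss's actual schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Step 1: define the star schema (one fact table, small dimension tables).
CREATE TABLE time_dimension    (time_id INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE product_dimension (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE order_fact (
    time_id    INTEGER REFERENCES time_dimension(time_id),
    product_id INTEGER REFERENCES product_dimension(product_id),
    quantity   INTEGER,
    revenue    REAL
);
-- Step 2: populate the dimension tables.
INSERT INTO time_dimension    VALUES (1, '2000-03-01'), (2, '2000-03-02');
INSERT INTO product_dimension VALUES (1, '501 jeans'), (2, 'trucker jacket');
-- Step 3: load the fact table (step 4 would re-run this load periodically).
INSERT INTO order_fact VALUES (1, 1, 2, 120.0), (2, 2, 1, 89.5);
""")

# A typical star-join query: revenue per product per day.
for row in conn.execute("""
    SELECT t.order_date, p.name, SUM(f.revenue)
    FROM order_fact f
    JOIN time_dimension t    USING (time_id)
    JOIN product_dimension p USING (product_id)
    GROUP BY t.order_date, p.name
"""):
    print(row)
```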
The Dimension Tables
● Time_dimension
● Product_dimension
● Promotion_dimension
● Consumer_dimension
● User_experience_dimension
● Ship_to_dimension

The Fact Table
The granularity of the fact table is one order, since billion-row fact tables were manageable.

● Defining the fact table
● Populating the fact table
Query Generation: The Commercial-Source Route

● They used Seagate Crystal Reports and Crystal Info to analyse their data.
● There were security and social issues associated with allowing a SQL*Net connection from a Windows machine running Crystal Reports out through the Levi's firewall to the Oracle data warehouse on the Web.
● The ArsDigita Community System was extended with a data warehouse query module that runs as a Web-only tool.
Query Generation: The Open-Source ACS Route

Goals of the 'dw' module in the ArsDigita Community System (see the sketch after this list):

● Naive users can build simple queries by themselves.
● Professional programmers can step in to help out the naive users.
● A user with no skill can re-execute a saved query.
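
The following is a hypothetical Python sketch of the idea behind such a module: a naive user's selections are turned into SQL, the query is saved under a name, and an unskilled user can re-execute it later. The function and table names are inventions for illustration, not the actual 'dw' module interface.

```python
# Hypothetical sketch of the 'dw' module idea: build, save and re-run
# simple aggregate queries. Names are illustrative only.
saved_queries = {}

def build_query(dimensions, measure, fact_table="order_fact"):
    """Generate a star-join-style aggregate query from user selections."""
    dims = ", ".join(dimensions)
    return (
        f"SELECT {dims}, SUM({measure}) "
        f"FROM {fact_table} GROUP BY {dims}"
    )

def save_query(name, sql):
    """A professional programmer (or the tool) saves the query by name."""
    saved_queries[name] = sql

def rerun(name):
    """A user with no SQL skill re-executes a saved query by name."""
    return saved_queries[name]

sql = build_query(["region", "product"], "revenue")
save_query("revenue_by_region_product", sql)
print(rerun("revenue_by_region_product"))
```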
Queries
DATA WAREHOUSE IN THE WORLD BANK
Overview
Project Planning

● Identifying business opportunity or problem


● Perform feasibility study
● Gather user requirements
● Develop data and application models
● Select deployment hardware and software
● Code data models and applications
● Write documentation
● Deploy testing environment
● Deploy production system
● Maintain production system
Requirements Definition
We are using OLAP and SQL Server for the Use Case we have selected.

What is OLAP?
OLAP (OnLine Analytical Processing) is software for performing multidimensional analysis at high speeds
on large volumes of data from a data warehouse, data mart, or some other unified, centralized data store.
Most business data have multiple dimensions—multiple categories into which the data are broken down
for presentation, tracking, or analysis. For example, sales figures might have several dimensions related to
location (region, country, state/province, store), time (year, month, week, day), product (clothing,
men/women/children, brand, type), and more.
But in a data warehouse, data sets are stored in tables, each of which can organize data into just two of
these dimensions at a time. OLAP extracts data from multiple relational data sets and reorganizes it into a
multidimensional format that enables very fast processing and very insightful analysis.
What is OLAP Cube?
The core of most OLAP systems, the OLAP cube is an array-based multidimensional database that makes it
possible to process and analyze multiple data dimensions much more quickly and efficiently than a traditional
relational database.

A relational database table is structured like a spreadsheet, storing individual records in a two-dimensional, row-by-column format. Each data "fact" in the database sits at the intersection of two dimensions, a row and a column, such as region and total sales.

SQL and relational database reporting tools can certainly query, report on, and analyze multidimensional data
stored in tables, but performance slows down as the data volumes increase. And it requires a lot of work to
reorganize the results to focus on different dimensions.

This is where the OLAP cube comes in. The OLAP cube extends the single table with additional layers, each
adding additional dimensions—usually the next level in the “concept hierarchy” of the dimension. For example,
the top layer of the cube might organize sales by region; additional layers could be country, state/province, city
and even specific store.
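
A small illustration of this reorganization, using pandas: flat two-dimensional rows are pivoted so that each cell sits at the intersection of several dimensions, and the index can then be deepened along the concept hierarchy (region, then country). The data is made up.

```python
# Sketch of the cube idea with pandas: reorganize flat rows so each
# cell sits at the intersection of several dimensions. Data is made up.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["EMEA", "EMEA", "Americas", "Americas"],
    "country": ["Germany", "France", "USA", "Canada"],
    "year":    [2021, 2021, 2021, 2022],
    "sales":   [100, 80, 250, 60],
})

# Top layer of the cube: sales by region and year.
by_region = sales.pivot_table(
    index="region", columns="year", values="sales", aggfunc="sum"
)

# Next layer in the concept hierarchy: drill down from region to country.
by_country = sales.pivot_table(
    index=["region", "country"], columns="year", values="sales", aggfunc="sum"
)

print(by_region)
print(by_country)
```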
What is SQL Server?
SQL Server is a relational database management system, or RDBMS, developed and marketed by Microsoft.

Similar to other RDBMS software, SQL Server is built on top of SQL, a standard programming language for interacting with relational databases. SQL Server is tied to Transact-SQL, or T-SQL, Microsoft's implementation of SQL that adds a set of proprietary programming constructs.

SQL Server worked exclusively in the Windows environment for more than 20 years. In 2016, Microsoft announced support for Linux, and SQL Server 2017, which became generally available in October 2017, ran on both Windows and Linux.
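
For illustration, here is a minimal sketch of querying SQL Server from Python over ODBC; it assumes the "ODBC Driver 17 for SQL Server" driver is installed, and the server, database and credentials are placeholders.

```python
# Minimal sketch: query SQL Server from Python via pyodbc.
# Server, database and credentials below are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=warehouse;UID=reader;PWD=secret"
)
cursor = conn.cursor()

# T-SQL: TOP is one of the proprietary constructs Transact-SQL adds
# on top of standard SQL.
cursor.execute("SELECT TOP 5 name FROM sys.tables ORDER BY name")
for (name,) in cursor.fetchall():
    print(name)
conn.close()
```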
SQL Server Architecture
CONSTRUCTION OF DATA WAREHOUSE
Building a Data Warehouse
Some steps needed for building any data warehouse are as follows (a sketch of the transformation step appears after this list):

● Extract the (transactional) data from different data sources:
For building a data warehouse, data is extracted from various data sources and stored in a central storage area. For extraction of the data, Microsoft has come up with an excellent tool; when you purchase Microsoft SQL Server, this tool is available free of cost.
● Transform the transactional data:
There are various DBMSs in which companies store their data, such as MS Access, MS SQL Server, Oracle and Sybase. These companies also save data in spreadsheets, flat files, mail systems, etc. Relating the data from all these sources is done while building a data warehouse.
● Load the (transformed) data into the dimensional database:
After building a dimensional model, the data is loaded into the dimensional database. This process may combine several columns together or split one field into several columns. There are two stages at which transformation of the data can be performed: while loading the data into the dimensional model, or while extracting the data from its origins.
● Purchase a front-end reporting tool:
Top-notch analytical tools are available in the market from several major vendors. Microsoft has also released a cost-effective tool of its own, Data Analyzer.
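
As a small illustration of the transformation step described above, the following pandas sketch splits one field into several columns and combines several columns into one; the data and column names are made up.

```python
# Transformation sketch: split one field into several columns, and
# combine several columns into one. Data and names are illustrative.
import pandas as pd

raw = pd.DataFrame({
    "customer_name": ["Jane Doe", "Raj Kumar"],
    "city": ["Chennai", "Delhi"],
    "state": ["TN", "DL"],
})

# Split one field into several columns ...
raw[["first_name", "last_name"]] = raw["customer_name"].str.split(
    " ", n=1, expand=True
)

# ... and combine several columns into one field.
raw["location"] = raw["city"] + ", " + raw["state"]

print(raw[["first_name", "last_name", "location"]])
```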
DESIGN

[Diagram: the combined design spans three layers: data acquisition, data storage, and information delivery.]
DEPLOYMENT
Deploying a data warehouse is different from deploying a transactional database. A data warehouse is usually deployed incrementally, in different tiers. Each layer is tested after it is deployed so that any problems are identified and solved as they arise. A data warehouse is not rolled out or made available to all the users within an organization at once. The pace at which the deployment takes place, and the order in which various groups get access to the data warehouse, depend on the requirements identified in the discovery stage.

The deployment stage begins with putting infrastructure (servers, hardware, storage, etc.) in place. Then
software is installed and tested to ensure that it is ready for production. Next comes the design part where
transactional sources are connected, facts and dimensions are identified, entities are denormalized, and
dimensional models are created. Once relational and OLAP databases for data warehouse are set up, ETL
processes are brought online by specifying data load settings. The application and BI integration layers are
added last.
MAINTENANCE

Data warehousing is an increasingly important business intelligence tool, allowing organizations to:

1. Ensure consistency. Data warehouses are programmed to apply a uniform format to all collected data, which makes it easier for corporate decision-makers to analyze and share data insights with their colleagues around the globe.

2. Make better business decisions. Data warehousing improves the speed and efficiency of accessing different data sets and makes it easier for corporate decision-makers to derive insights that will guide the business and marketing strategies that set them apart from their competitors.

3. Improve their bottom line. Data warehouse platforms allow business leaders to quickly access their organization's historical activities and evaluate initiatives that have been successful, or unsuccessful, in the past.
HARBOR - A highly available Data Warehouse
Introduction

A data warehouse is a central repository of information that can be analyzed to make


more informed decisions. Data flows into a data warehouse from transactional systems,
relational databases, and other sources.
Data warehouses depend critically on various computational and networking elements.
If any of these elements fails, the data warehousing and OLAP services offered will fail as well.
In other words, a highly available data warehouse is needed to ensure high user satisfaction with the services.
Known Methods

Any highly available database system or data warehousing system will use data replication to ensure that data access continues with no or very few interruptions, the latter arising if some computational system or component fails.
Approaches for high availability include: identical sites, identical replicas, identical ways of storing the replicas, and identical mechanisms for distribution.
What is HARBOR?

HARBOR (Highly Available Replication-Based Online Recovery) is more flexible and does not insist on
all these requirements of identical copies, as long as they represent logically the same data. With
this flexibility, it is possible to store data redundantly in different sort orders in various data
compression formats.

The different updatable materialized views are also possible so that a wider variety of queries can be
answered by the query optimizer and query processor.

A system called C-Store achieves an order of magnitude higher performance than is usual by storing
the data in different sort orders and different compression formats.

Data redundancy can, therefore, provide higher performance and also higher availability in HARBOR.
Fault/Failure Tolerance

A fault/failure-tolerant system is defined to provide K-safety if up to K sites fail and the system still continues to provide services for any query or transaction. The minimum number of sites required for K-safety is K+1, where the K+1 workers store the same replicated data.

In HARBOR it is assumed that the database designer has replicated the data and structured the database in such a way as to provide K-safety. The high availability in HARBOR guarantees that K simultaneous failures can be tolerated while still bringing the failed sites back online.

The approach assumes reliable network transfers via TCP/IP. It also assumes fail-stop failures and does not deal with corrupted data, incompletely written disk pages, or network partitions.
Historical Query Processing
In HARBOR, historical queries are processed in a distinctive manner.

A historical query is a read-only query that returns a result as if the query had been executed on the database at time T. This 'time travel' feature enables the inspection of past states of the database.

Such historical queries are supported in HARBOR using a representation of data in which time stamps are associated with each tuple: an 'insertion time' and a 'deletion time' are added onto the tuple.

In doing this, the information necessary to answer historical queries is preserved. To answer a historical query on a tuple at time T, it has to be confirmed that the tuple was inserted at or before time T and deleted, or 'versioned', only after T.
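
A toy Python sketch of this visibility rule, under the representation just described; the values and timestamps are made up, and infinity marks a tuple that has not yet been deleted.

```python
# Toy sketch of timestamped tuples: each tuple carries an insertion
# time and a deletion time; a historical query at time T keeps the
# tuples visible at T. Values are illustrative.
INF = float("inf")

# (value, insertion_time, deletion_time)
tuples = [
    ("order-1", 10, INF),   # inserted at t=10, still live
    ("order-2", 20, 35),    # inserted at t=20, deleted at t=35
    ("order-3", 40, INF),   # inserted after T below, so invisible
]

def historical_query(tuples, T):
    """Return values as the database looked at time T: inserted at or
    before T and deleted (or versioned) only after T."""
    return [v for (v, ins, del) in tuples if ins <= T < del]

print(historical_query(tuples, T=30))   # ['order-1', 'order-2']
```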
Recovery System
The algorithm for recovery consists of three phases and uses the time stamps associated with tuples to answer time-based range queries for the tuples inserted or deleted during a specified time range.

First phase: The crashed site uses the time stamps available for historical queries to run local update transactions and restore itself to the time of its last checkpoint.

Second phase: The site executes historical queries on other live sites that contain replicated copies of its data, in order to catch up with the changes made between the last checkpoint and some time close to the present.

Third phase: The site executes standard non-historical queries with read locks to catch up with any committed changes between the start of the second phase and the current time.
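
The following is a deliberately simplified, illustrative outline of the three phases in Python; the Site class and its single query method are toy stand-ins, not HARBOR's actual interfaces, and phase 1 is approximated here by reading the same timestamped tuples rather than replaying a local log.

```python
# Toy outline of the three-phase recovery; classes and timestamps are
# placeholders, not HARBOR's real interfaces.
class Site:
    def __init__(self, tuples):
        # tuples: list of (value, insertion_time)
        self.tuples = list(tuples)

    def query_range(self, start, end):
        """Tuples inserted in (start, end]. Phase 2 runs this as a
        historical query; phase 3 runs it with read locks."""
        return [(v, t) for (v, t) in self.tuples if start < t <= end]

def recover(checkpoint, live_site, now):
    recovered = []
    # Phase 1: restore local state up to the last checkpoint
    # (approximated here by reading the same timestamped tuples).
    recovered.extend(live_site.query_range(0, checkpoint))
    # Phase 2: historical queries against a live replica catch up from
    # the checkpoint to a point close to the present, without locks.
    catch_up = now - 1
    recovered.extend(live_site.query_range(checkpoint, catch_up))
    # Phase 3: standard queries with read locks pick up the final
    # committed changes between the catch-up point and now.
    recovered.extend(live_site.query_range(catch_up, now))
    return recovered

live = Site([("a", 1), ("b", 5), ("c", 8), ("d", 10)])
print(recover(checkpoint=5, live_site=live, now=10))
```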
Performance

Performance evaluation has shown that the recovery approach described in the foregoing section works well for data warehouses and similar environments, where the update workloads consist primarily of insertions, with relatively few updates to historical data.
Conclusion

● An advanced and effective methodology, HARBOR, has been presented for ensuring high availability of a data warehouse. Similar techniques can be used for ensuring high availability for any data warehouse.
Customer Data Warehouse of the World’s First and
Largest Online Bank in the United Kingdom
Introduction

● Egg PLC, an internet bank, catered to more than 3.2 million members, extending services like banking, insurance, investments and mortgages in the UK
● World's largest purely online bank
● Established in 1998, it pioneered online banking not only in the UK but also throughout the world
● According to a case study from 2006 [1], it didn't have a branch network with physical buildings, and almost all user interaction was online and hence automated
● For several years, it relied on a number of customer information sources, which resulted in inconsistent information
● In order to solve this problem, the bank decided to establish a Customer Data Warehouse (CDW) that would serve as a consolidated and centralized information source
Background and motivation

● Handling more than 2.5 million transactions per day, Egg PLC required a highly scalable and reliable IT and internet infrastructure
● 85% of total transactions came through www.egg.com
● Until 2001, only online transactions were provided, and customer information was obtained by outsourcing the data warehousing activity to Experian
● The delay between outsourcing and the provision of data was a serious issue
● Hence, the bank built and maintained its own CDW using Oracle and SAS, in addition to its existing Sun software and hardware infrastructure
The Customer Data Warehouse

● The first version of Egg's CDW was built on a Sun Fire 6800 server, and later on a Sun Fire 15K
● Egg's data warehouse residing on the Sun Fire 15K had 16 CPUs and 10 GB for the core system
● The storage platform was an EMC SAN (Storage Area Network)
● Oracle Database 10g, designed for enterprise grid computing, supported Real Application Clusters (RAC), providing better performance through load sharing
● 10g had a failover facility within a cluster of 2 or more servers
● Hence, Oracle's features such as parallel query, materialized views and partitioning provided Egg with the speed of information that users were seeking
User Interface

● "If someone visits egg.com, applies for credit with us, is accepted for credit and issued a card, by the time the card is issued, we would see that person in our data warehouse," said Egg's Head of Data, Jay Parmar. [1]
● The bank's internal users used the technology as per their requirements
● They used the Statistical Analysis System (SAS) to extract, join and mine data
● Other SAS tools and modules used were Comms Builder, BASE, Connect, Share, SPDS and Start
● Egg's CDW was about 2 TB in size, with about 10 GB added each month
● Customers not only performed regular transactions (OLTP) on the banking applications but also used data mining services along with OLAP and data warehousing services
● "If we didn't have this CDW, we couldn't do the 120 campaigns per month across 6 channels that we currently conduct," Jay Parmar continued.
Sources of Data

● Data was sourced from both internal and external customer data channels: credit cards, loans, insurance services, etc.
● The CDW enabled analyzing the possible ways in which the customer was engaged with the bank
● Hence, it became possible to determine the propensity and potential of an existing customer to buy additional products, and to make the right choices of such products for the customer
● This is called Market Basket Analysis (see the sketch below)
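
A toy Python sketch of the co-occurrence counting behind this idea: count how often pairs of products are held together, to estimate which additional product to offer a customer. The product sets are made up.

```python
# Toy market-basket sketch: count product pairs held together by
# customers. Data is illustrative.
from collections import Counter
from itertools import combinations

customers = [
    {"credit card", "savings", "insurance"},
    {"credit card", "loan"},
    {"credit card", "savings"},
    {"savings", "insurance"},
]

pair_counts = Counter()
for products in customers:
    for pair in combinations(sorted(products), 2):
        pair_counts[pair] += 1

# The most frequent pairs suggest which product to offer next.
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```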
Benefits of CDW

● Parmar says, “Previously, users were satisfied with a six-week data latency, and were doing
marketing campaigns using data that was six weeks behind. We’re now doing marketing
campaigns using data that is one week behind, but have the capability to conduct daily
campaigning with data only 24 hours old”
● Daily credit decisions with due daily risk assessment and risk management
● Campaigns developed on the basis of CDW resulted in greater sales
● Egg consequently headed towards an operational data store - a real-time data warehouse
Security and Version Management

● Egg also implemented many security controls to ensure that access to the CDW conformed to internal policies and external regulatory requirements
● The data warehousing team comprised DBAs, metadata analysts and Oracle developers responsible for data security
● The CDW had to comply with external regulatory regimes such as the UK's Data Protection Act of 1998, the Financial Services Authority and the Banking Act
Refresh Policy and Data Marts

● Data in the CDW was refreshed in real time
● For example, if a customer applied for a credit card, by the time she got her card, her details had already been entered in the CDW
● Data was cleansed, matched and updated in the core data warehouse
● Data marts were published from the CDW for the bank's individual departments: a back-end credit decisioning data mart, materialized views for financial reporting and customer counting, and a marketing data mart refreshed three times a week
● Not all data inflows were in real time; some data refreshes came in batch mode from internal sources
Customer Benefits

● Egg’s customers ultimately benefit most from the CDW. “If we didn’t have a data
warehouse, we would be severely restricted in the service we provide to
customers,” says Parmar [1]
● The CDW played a crucial and strong role in providing information related to
finances necessary for the customers
● All of the services of Egg critically depended on CDW and thus, the customers
critically depended on it
Reliability

● Hot plugins were provided to cover any failure in hardware components
● Multiple fail-safes were made available with the SAN
● A disaster recovery system was provided
● With the original Sun Fire 6800 server, 90% uptime was maintained
● The Sun systems turned out to be reliable and never had a failure beyond the engineers' scope
Conclusion

● In this case study, we have seen how a large online bank, Egg of the UK, deployed data warehousing for its core functions using reliable IT infrastructure
● Such deployment resulted in enhanced success in banking service delivery to the customers
● It also helped the internal staff make better planning and marketing decisions
References

● https://tdwi.org/articles/2006/05/09/egg-bank-improves-customer-data.aspx
● https://www.ybs.co.uk/help/online/egg
● https://www.google.co.in/books/edition/DATA_WAREHOUSING/rv-Xb6EgO6AC?hl=en&gbpv=0
● https://en.wikipedia.org/wiki/Egg_Banking
Also,

In March 2011, the credit card accounts were bought by Barclaycard, and in July 2011, the
remaining savings and mortgage businesses were sold to Yorkshire Building Society, which
subsequently transferred all remaining customer accounts over from Egg.

Following the sale of its assets, Egg Banking plc, which remained under the ownership of Citigroup, was renamed Canada Square Operations Limited and continues to handle matters relating to certain Egg products from before the sale of assets, as well as any assets that were not transferred to the new owners.
A German Supermarket EDEKA's Data
Warehouse
The Edeka Group is the largest German supermarket corporation as of 2017, holding a market share of 20.3%. Founded in 1907, it consists today of several co-operatives of independent supermarkets, all operating under the umbrella organisation Edeka Zentrale AG & Co KG, headquartered in Hamburg. There are approximately 4,100 stores with the Edeka nameplate, ranging from small corner stores to hypermarkets. On 16 November 2007, Edeka reached an agreement with Tengelmann to purchase a 70% majority stake in Tengelmann's Plus discounter store division, which was then merged into Edeka's Netto brand, with some 4,200 stores by 2018. Under all brands, the company operated a total of 13,646 stores at the end of 2017.
OBJECTIVES OF THE DATA WAREHOUSE

● Enables historical insight: a data warehouse can add context to historical data by listing all the key performance trends that surround retrospective research. This kind of efficiency cannot be matched by a legacy database.
● Provides a major competitive advantage: this is the bottom-line benefit of a data warehouse: it allows a business to strategize and execute more effectively against other vendors in its sector.
● Scalability: the top keyword in the cloud era is "scalable", and a data warehouse is a critical component in driving this scale. A top-flight data warehouse is itself scalable, and also enables greater scalability in the business overall.
● With growing competition and fierce growth, EDEKA felt that a data warehouse would give it an advantage in analysing information on sales turnover and inventory levels.
● EDEKA engaged Melsungen, a company based in Germany, to implement the data warehouse applications based on IBM's DB2.
● Ability to adapt more quickly to changing business conditions, with 50% faster responses to data warehouse queries.
● Ability to leverage the existing skill base.
● Improved end-user productivity with a more reliable application; reduced administration costs.
● With the help of the data warehouse, EDEKA was able to adapt quickly to market growth and to forecast information.
SOFTWARE ENVIRONMENT
EDEKA used IBM's DB2 Universal Database DBMS for iSeries and DB2 Connect software on an IBM hardware environment.

DB2 UDB for iSeries: features overview

The distinguishing characteristic of the iSeries server database manager is that it is part of the operating system. In practice, this means that the large majority of your iSeries server data is stored in the relational database. Although the iSeries server implements other file systems in its design, the relational database on the iSeries server is the database most commonly used by customers. Your relational data is stored in the database, along with typical non-relational information, such as the source of your application programs.
IBM Db2® Connect
It provides application enablement and a communication infrastructure that lets you connect web, Microsoft Windows, UNIX, Linux and mobile applications to IBM z/OS®, AS/400, iSeries and System i data. The software offers data integration, a secure application environment and centralized management for data servers and clients. It is available in six editions to meet different application, development and scalability needs. In each case, license charges are not affected by the number of users, the Db2 Connect server size, or the size of the IBM System z® or i database servers.
HARDWARE ENVIRONMENT

EDEKA used the following hardware environment for its data warehouse

1. IBM Server iSeries 830 (processor)

2. IBM TotalStorage Enterprise Storage Server (disk storage)

3. IBM TotalStorage Enterprise Tape System 3590 (tape storage)


BUSINESS GROWTH AND THE DATA WAREHOUSE

● The Edeka Group was founded in 1907.
● There are approximately 4,100 stores with the Edeka nameplate, ranging from small corner stores to hypermarkets.
● Over the past two decades, EDEKA has rapidly expanded to include 60 wholly owned supermarkets and 1,000 retailers in the Hessen and Thuringian regions of Germany. Today, the company has 6,100 employees and 2002 annual revenues exceeding €1.46 billion (US$1.66 billion).
● The data warehouse became so valuable that it could not keep pace with EDEKA's business growth.
● With the growth of the business, the data warehouse's problems started to increase rapidly: as additional users began requesting increasing amounts of analysis, the system slowed down.
● With an eye toward future growth, EDEKA made a strategic decision to move its data warehouse from its existing hardware platform to the fast, scalable IBM iSeries 830 system.
● "DB2 running on an iSeries system provides us with a highly scalable and cost-effective platform for our data warehouse."
● By migrating its data warehouse to DB2 for iSeries, EDEKA gains a business intelligence infrastructure that delivers a wide variety of valuable marketing information faster and more efficiently.
● With the benefits of migrating to DB2, EDEKA's analysts are able to measure the volume of returned goods, determine the reasons for changing shopping patterns, and then suggest corrective measures.
● Employee productivity has increased many times over.
● The migration reduces operating costs and optimizes IT and business resources.
● Direct queries to the new warehouse are now 50 percent faster, thanks to the speedier transaction-processing capabilities of the iSeries.
PERFORMANCE

● With the help of iSeries servers, transaction processing was made 50% faster.
● From their Microsoft® Windows®-based PCs, users leverage IBM DB2 Connect and business intelligence software from IBM Business Partner Business Objects to access sales information and to track, understand and manage the wealth of information stored in the data warehouse.
● The information in the data warehouse updates within an hour of sales activity, ensuring that the data is always fresh.
● Through more timely business analysis, EDEKA can anticipate market patterns much more accurately.
● It allows business managers to make decisions more quickly, thanks to the rapid response to their queries.
● On the whole, EDEKA feels immensely satisfied with the present data warehouse, as it "empowers Edeka to make much more logical business decisions which in turn result in improving profits and business objectives significantly".
FORECASTING AND EXPANSION

● With all this data, EDEKA performed analysis to anticipate trends and purchase patterns, in order to expand and keep up with market standards.
● This allowed it, for example, to forecast demand growth and suggest corrective measures in the event of a sudden surge in returned goods.
● Therefore, the data warehouse helped the company respond quickly to changing business conditions and competitive pressures.
● In this way, EDEKA was able to distinguish itself from other companies in the market.
● The warehouse is a repository for sales data that is uploaded from transaction systems, also based on the iSeries system, located in 15 key cash-and-carry supermarkets. The data warehouse also tracks wholesale goods that flow from EDEKA's 3 wholesale warehouses to more than 800 independent stores. Each of the 15 EDEKA supermarkets and the 3 wholesale warehouses has its own iSeries system.
● EDEKA plans to add data storage capacity. In future, the data warehouse will play a key role in EDEKA's ongoing efforts to better understand consumer shopping patterns.
● This will help its executives keep the right products in the stores at the right time, based on accurate and up-to-date sales data from the data warehouse.
Conclusion
In this case study, we have examined how a German supermarket chain, EDEKA, leverages data warehousing technology with reliable hardware to enhance its business objectives and profits, while faring better against its competition.

