Critical Success Factors For DWH
Table of Contents
Introduction
Critical Success Factors (CSF)
CSF: Sponsorship and Involvement
CSF: Business Requirements
CSF: Enterprise Architecture
CSF: Data Warehouse Architecture and Design
CSF: Data Warehouse Technology
CSF: Information Quality
CSF: Development Environment
Summary
- ALAN PERKINS
Introduction
A data warehouse is more than an archive for corporate data and more than a new way of
accessing corporate information. A data warehouse is a subject-oriented repository designed for
enterprise-wide information access. It provides tools to satisfy the information needs of
enterprise managers at all organizational levels, not just for complex data queries but as a
general facility for getting quick, accurate, and often insightful information. A data warehouse is
designed so that its users can recognize the information they want and access that information
using simple tools.
One of the principal reasons for developing a data warehouse is to integrate operational data
from various sources into a single, consistent structure that supports analysis and decision-making within the enterprise. Operational (legacy) systems create, update, and delete production
data that "feed" the data warehouse.
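Integration usually means mapping each source's field names, codes, and units onto one shared structure. The following is a minimal Python sketch of that idea, assuming two hypothetical source systems ("billing" and "CRM"). The record type `CustomerFact`, the field names, and the region code tables are all illustrative, not taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerFact:
    """Hypothetical unified warehouse record (field names are illustrative)."""
    customer_id: str
    region: str
    revenue: float


# Hypothetical code tables: each source system encodes regions differently.
BILLING_REGIONS = {"N": "North", "S": "South"}
CRM_REGIONS = {"NTH": "North", "STH": "South"}


def from_billing(row: dict) -> CustomerFact:
    # Hypothetical billing feed: padded customer numbers, revenue as a decimal string.
    return CustomerFact(row["cust_no"].strip(), BILLING_REGIONS[row["rgn"]], float(row["amt"]))


def from_crm(row: dict) -> CustomerFact:
    # Hypothetical CRM feed: revenue stored in cents.
    return CustomerFact(row["CustomerID"].strip(), CRM_REGIONS[row["Territory"]], row["RevenueCents"] / 100)


def integrate(billing_rows, crm_rows):
    """Transform both feeds into one consistent structure."""
    return [from_billing(r) for r in billing_rows] + [from_crm(r) for r in crm_rows]


facts = integrate(
    [{"cust_no": " C1 ", "rgn": "N", "amt": "120.50"}],
    [{"CustomerID": "C2", "Territory": "STH", "RevenueCents": 9900}],
)
for fact in facts:
    print(fact)
```

Both rows end up with the same region vocabulary and the same revenue unit, so they can be analyzed together.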
A data warehouse is analogous to a physical warehouse. Operational systems create data
parts that are loaded into the warehouse. Some of those parts are summarized into information
components that are stored in the warehouse. Data warehouse users make requests and are
delivered information products that are created from the stored components and parts.
A data warehouse is typically a blending of technologies, including relational and
multidimensional databases, client/server architecture, extraction/transformation programs,
graphical user interfaces, and more.
Data warehousing is one of the hottest industry trends for good reason. A well-defined and
properly implemented data warehouse can be a valuable competitive tool.
A data warehouse has its own peculiarities and characteristics that make developing a
data warehouse unlike developing just another application. Not every enterprise is able to
develop an effective data warehouse; in fact, there are many more failures than
successes.
[Figure: critical success factors for data warehouse engineering, including Business Requirements, Enterprise Architecture, Information Quality, and Development Environment]
Management
Enterprise management must fully sponsor data warehouse development and usage.
Sponsorship includes ensuring sufficient resources are available. Sponsorship also
Potential Users
All potential users of the data warehouse, even executives, from every organizational
unit and level, must be actively involved in data warehouse design, development, and
management. Data warehouse users will have the most influence on acceptance of the
warehouse, so it is imperative that their needs are addressed. They are also the
"owners" and "stewards" of operational data and thus are the best source for subject
matter expertise.
Strategic Plan
A strategic plan outlines an enterprise's mission and purpose, goals, strategies and
performance measures (business requirements). Properly used, a strategic plan is the
tool with which effective managers guide their organizations and ensure corporate
success.
An enterprise's strategic plan not only provides a guide for effective management; it also
provides the guiding force for internal change and the guidelines for responding to
external change. Through the strategic planning process, the enterprise defines and
documents its purpose, goals, and objectives, along with strategies for achieving them.
Included in the process is an assessment of external opportunities and threats as well as
an assessment of internal strengths and weaknesses.
The most useful strategic plans are multi-dimensional, incorporating the enterprise's
overall plan with the subordinate plans of every enterprise element, and including
performance measures for every critical outcome.
Performance Measures
Establishing the right performance measures is the key to successful enterprise
management. An enterprise must be able to tell whether progress is being made on its
critical goals and whether stakeholder expectations are being met.
The most effective and useful performance measures are cross-functional and are linked to
the appropriate strategies, objectives, and performance criteria. Management's targets and
thresholds for the measures, often based upon external benchmarks, form the structure for
an enterprise performance measurement system.
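A minimal Python sketch of how a target and a threshold might structure one performance measure, assuming "higher is better" and assuming the threshold is the minimum acceptable level below the target. The class name `PerformanceMeasure`, the three status labels, and the example measure with its values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PerformanceMeasure:
    """Hypothetical measure with a management target and a minimum threshold."""
    name: str
    target: float     # level management wants to reach
    threshold: float  # minimum acceptable level (assumes higher values are better)

    def status(self, actual: float) -> str:
        if actual >= self.target:
            return "on target"
        if actual >= self.threshold:
            return "below target"
        return "below threshold"


# Hypothetical measure and values, for illustration only.
on_time = PerformanceMeasure("On-time delivery rate", target=0.95, threshold=0.90)
for actual in (0.97, 0.92, 0.85):
    print(on_time.name, actual, on_time.status(actual))
```

A measure where lower is better (cost per order, for example) would need the comparisons reversed.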
Performance measurement documentation should cover not only the content of reports
and queries but also the path of the data from source to ultimate information
recipient. The combination of all the reports of all the performance measures becomes the
basis for a data warehouse and a Strategic Information System that is truly tailored to the
enterprise's requirements.
Executives and managers use the information produced from the data warehouse to
reinforce initiatives, reward behavior and change strategies. Employees use it to adjust
operations and respond to strategic needs. Linking timely, accurate measures to specific
goals and objectives begins to make enterprise management more of a science and less of
an art.
of use, subsets of the enterprise data architecture model should be established. These
subsets, or views, can represent functions, organizations, regions, systems, and any
other significant grouping of information.
components, and metadata should all be based upon internal information requirements -- not
specific technologies.
[Figure: Data Warehouse Components. Labels: Operational Systems of Record; Integration/Transformation Programs; Current Detail; Summarized Data (two levels); Archives; Data Warehouse Architecture (Metadata); M/D]
The heart of a data warehouse is its current detail. It is the place where the
bulk of data resides. Current detail comes directly from operational systems and
may be stored as raw data or as an aggregation of raw data. Current detail,
organized by subject area, represents the entire enterprise, rather than a given
application. Current detail is the lowest level of data granularity in the data
warehouse. Every data entity in current detail is a snapshot, at a moment in
time, representing the instant when the data are accurate. Current detail is
typically maintained for two to five years, but some enterprises may require detail
data for significantly longer periods. When initially implemented, a data
warehouse may include current detail more than two years old, but the often
questionable quality of older data must be considered and measures taken to
ensure its validity. Current detail refreshment occurs as frequently as necessary
to support enterprise requirements.
Lightly summarized data are the hallmark of a data warehouse. Not all enterprise
elements (department, region, function, etc.) have the same information
requirements, so effective data warehouse design provides for customized,
lightly summarized data for every enterprise element (see Data Mart, below). An
enterprise element may have access to both detailed and summarized data, but
typically much less than the total stored in current detail.
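A minimal Python sketch of producing lightly summarized data from current detail for one enterprise element, assuming detail rows of the hypothetical form `(region, sale_date, amount)` and assuming monthly totals per region are the granularity that element needs.

```python
from collections import defaultdict
from datetime import date

# Hypothetical current-detail rows: (region, sale_date, amount).
current_detail = [
    ("East", date(2024, 1, 5), 100.0),
    ("East", date(2024, 1, 20), 50.0),
    ("East", date(2024, 2, 3), 75.0),
    ("West", date(2024, 1, 9), 200.0),
]


def lightly_summarize(rows, region=None):
    """Sum amounts by (region, year, month); optionally restrict to one enterprise element."""
    summary = defaultdict(float)
    for rgn, day, amount in rows:
        if region is None or rgn == region:
            summary[(rgn, day.year, day.month)] += amount
    return dict(summary)


east_view = lightly_summarize(current_detail, region="East")
print(east_view)  # {('East', 2024, 1): 150.0, ('East', 2024, 2): 75.0}
```

The East view holds two summary rows instead of three detail rows, which reflects the point above: each element sees far less than the total stored in current detail.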
Highly summarized data are primarily for enterprise executives. Highly
summarized data can come from either the lightly summarized data used by
enterprise elements or from current detail. Data volume at this level is much smaller
than at other levels and represents an eclectic collection supporting a wide variety
of needs and interests. In addition to access to highly summarized data,
executives also should have the capability of accessing increasing levels of detail
through a "drill down" process.
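The drill-down process can be sketched as repeatedly re-totaling the data one level lower in a hierarchy. This minimal Python example assumes a hypothetical two-level hierarchy (region, then product) over detail rows of the form `(region, product, amount)`. The function `drill_down` and the `LEVELS` tuple are illustrative names.

```python
# Hypothetical detail rows: (region, product, amount).
detail = [
    ("East", "Widgets", 120.0),
    ("East", "Gadgets", 80.0),
    ("West", "Widgets", 300.0),
]
LEVELS = ("region", "product")  # assumed drill-down path, coarsest level first


def drill_down(rows, path=()):
    """Total the amounts for the level just below `path`.

    path=() gives totals by region; path=("East",) gives East's totals by product.
    """
    depth = len(path)
    if depth >= len(LEVELS):
        raise ValueError("already at the lowest level of detail")
    totals = {}
    for row in rows:
        if row[:depth] == tuple(path):
            key = row[depth]
            totals[key] = totals.get(key, 0.0) + row[-1]
    return totals


enterprise_total = sum(row[-1] for row in detail)  # highly summarized view
by_region = drill_down(detail)                     # first drill-down step
east_by_product = drill_down(detail, ("East",))    # second drill-down step
print(enterprise_total, by_region, east_by_product)
```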
Data warehouse archives contain old data (normally over two years old) of
significant, continuing interest and value to the enterprise. There is usually a
massive amount of data stored in the data warehouse archives that has a low
incidence of access. Archive data are most often used for forecasting and trend
analysis. Although archive data may be stored with the same level of granularity
as current detail, it is more likely that archive data are aggregated as they are
archived. Archives include not only old data (in raw or summarized form); they
also include the metadata that describes the old data's characteristics.
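A minimal Python sketch of archiving with aggregation, assuming detail rows of the hypothetical form `(day, amount)`, a two-year cutoff, and monthly totals as the archived granularity. The metadata keys are illustrative. The point is that the archive carries a description of how the old data were transformed.

```python
from collections import defaultdict
from datetime import date


def archive_old_detail(rows, as_of, years=2):
    """Split (day, amount) rows into kept detail and an aggregated archive with metadata."""
    try:
        cutoff = as_of.replace(year=as_of.year - years)
    except ValueError:  # as_of is Feb 29 and the target year is not a leap year
        cutoff = as_of.replace(year=as_of.year - years, day=28)
    kept, monthly = [], defaultdict(float)
    for day, amount in rows:
        if day < cutoff:
            monthly[(day.year, day.month)] += amount  # aggregate as the data are archived
        else:
            kept.append((day, amount))
    archive = {
        "data": dict(monthly),
        "metadata": {  # hypothetical metadata describing the archived data
            "granularity": "monthly total",
            "source": "current detail",
            "archived_before": cutoff.isoformat(),
        },
    }
    return kept, archive


rows = [(date(2021, 3, 1), 10.0), (date(2021, 3, 15), 5.0), (date(2024, 6, 1), 7.0)]
kept, archive = archive_old_detail(rows, as_of=date(2024, 7, 1))
print(kept)
print(archive)
```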
A system of record is the source of the best or "rightest" data that feed the data
warehouse. The "rightest" data are those which are most timely, complete,
accurate, and have the best structural conformance to the data warehouse.
Often the "rightest" data are closest to the source of entry into the production
environment. In other cases, a system of record may be one containing already
summarized data. Often, the rightest data are created from diverse sources through
a reconciliation process.
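One possible reconciliation rule is to take each field from the most recently updated source that actually supplies a value. This favors timeliness first and completeness second. The Python sketch below uses that rule as an assumption, not as the only valid policy, and the source records are hypothetical.

```python
from datetime import datetime


def reconcile(candidates):
    """Build one "rightest" record from several source versions of the same entity.

    candidates: list of (updated_at, record_dict). For each field, keep the value from
    the most recently updated source that supplies a non-empty value.
    """
    merged = {}
    for _, record in sorted(candidates, key=lambda c: c[0], reverse=True):
        for field, value in record.items():
            if field not in merged and value not in (None, ""):
                merged[field] = value
    return merged


# Hypothetical versions of one customer from two source systems.
billing = (datetime(2024, 5, 1), {"name": "Acme Corp", "phone": None, "city": "Austin"})
crm = (datetime(2024, 6, 1), {"name": "ACME Corporation", "phone": "555-0100", "city": ""})
customer = reconcile([billing, crm])
print(customer)
```

Other policies, such as ranking sources by accuracy or structural conformance, fit the same loop by changing the sort key.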
From what database and to what servers is the data moving? Is the data
moved one-way or bi-directionally?
How many replicas will be needed and are they all identical?
Network: The network issues that revolve around distribution and replication
include:
Each application at each site must know where to find the data (preferred
site).
If the data is not available at the preferred site, how does the application
detect the problem?
Should the application have the ability to switch to an alternate site from
which to retrieve the data?
What events or conditions will trigger a dynamic data transfer? How much
data is transferred during a triggered data move?
How long will it take to perform the data transfer under various conditions?
Can the required time be minimized through better scheduling?
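The preferred-site and alternate-site questions can be made concrete with a small failover loop. This is a minimal Python sketch, assuming the caller supplies a `fetch(site)` function that raises a hypothetical `SiteUnavailable` exception when a site cannot serve the request. The site names are invented.

```python
class SiteUnavailable(Exception):
    """Hypothetical error a fetch function raises when a site cannot serve a request."""


def fetch_with_failover(sites, fetch):
    """Try the preferred site first, then each alternate in order; return (site, data)."""
    errors = {}
    for site in sites:
        try:
            return site, fetch(site)
        except SiteUnavailable as exc:
            errors[site] = str(exc)  # record how the problem was detected at each site
    raise SiteUnavailable(f"no site could serve the request: {errors}")


def demo_fetch(site):
    # Hypothetical fetch in which the preferred site is down.
    if site == "hq-replica":
        raise SiteUnavailable("connection timed out")
    return {"site": site, "rows": 42}


site, data = fetch_with_failover(["hq-replica", "regional-replica"], demo_fetch)
print(site, data)
```

Whether an application should fail over automatically like this, or report the problem instead, is one of the design questions listed above.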
User Interface(s)
Data warehouse users get useful information from the data warehouse and data marts
through user interfaces. These user interfaces have the most impact on how
effective and useful the data warehouse is perceived to be. Therefore, users must be
actively involved in selecting their own interface to the data warehouse. Two primary
criteria for selecting an effective user interface are ease of use and performance. For
ease of use, most enterprises turn to graphical user interfaces. For performance,
developers must ensure that the hardware/software platform fully supports and is
optimized for every chosen user interface. The most important selection criteria for user
interfaces are the information needs and the level of computer literacy of potential users
who will retrieve the information they need from the data warehouse. The following data
warehouse user categories are based on levels of literacy and information needs:
Information Systems Challenged - data warehouse users who are totally
uninvolved with information systems. In management roles, they rely on their
secretaries or assistants to retrieve information for them. These users need an
extremely easy-to-use, highly graphical interface or standard queries and
reports with a limited number of parameters.
Variance Oriented - users who are focused on the variances in numbers over
time. These users mainly want a set of standard reports that they can generate
or receive periodically so that they can perform their analyses.
Number Crunchers - users who are spreadsheet aficionados. They will take
whatever data are available and refine it, re-categorize it and derive their own
numbers for analyzing and managing the enterprise. Their needs can best be
met by providing a spreadsheet extract output format for any reports or ad hoc
queries provided.
Technically Oriented - users who are either already familiar with computers or
have sufficient motivation to learn and use everything they can get their hands
on. These people want to have complete control over the way they retrieve and
format information. They are often business or systems analysts who have
moved into an enterprise function. They want to have all of the tools the data
warehouse development staff uses.
Most enterprises have all of these categories of individuals, so it is advisable to
provide an interface suited to each type of data warehouse user.
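For the Number Crunchers category, a spreadsheet extract can be as simple as CSV output. This minimal Python sketch uses the standard library's `csv.DictWriter`. The report rows and column names are hypothetical.

```python
import csv
import io


def to_spreadsheet_extract(rows, columns):
    """Render report rows (dicts) as CSV text that common spreadsheet programs can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical report output.
report = [
    {"region": "East", "quarter": "Q1", "revenue": 1200.5},
    {"region": "West", "quarter": "Q1", "revenue": 980.0},
]
extract = to_spreadsheet_extract(report, ["region", "quarter", "revenue"])
print(extract)
```

`extrasaction="ignore"` drops any fields not listed in `columns`, so one report query can feed several differently shaped extracts.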
The final user interface criterion is that it supports the access metadata designed for the
data warehouse. If a user interface is easy to use, allows all potential users to get the
information they need in the format they need, and does it in an acceptable amount of
time, it is the right interface.
[Figure: warehouse engine types (Relational, SuperRelational, MultiDimension (logical), MultiDimension (physical), ObjectRelational) compared by support for normalized structures, multi-dimensional structures, drill-down, rotation, and data-dependent operations]
Hardware Platform(s)
The selection of one or more hardware platforms involves answering the following
questions: How much data will be in the data warehouse and how much can the platform
economically accommodate? How scalable is the platform? Is it optimized for data
warehouse performance? Will the platform support the software selected for the data
warehouse? How many users will simultaneously access the data warehouse? Will their
queries be simple or complex? These are the most important criteria for selecting
hardware to support a data warehouse. In answering these questions, it is important to
consider all hardware platform characteristics: not just CPU speed and disk capacity, but
memory capacity and input/output system capabilities as well. I/O capacity is often
the most critical factor in overall data warehouse performance. While adding servers
can usually increase memory and CPU capacity, increasing I/O capacity is not as
simple. Nevertheless, it is vital that the hardware platform(s) supporting a data
warehouse have sufficient I/O capacity. This often requires multiple, independent I/O
channels or buses.
Data warehouse capacity planning is not an exact science. Underestimating is the rule
rather than the exception. Some experts advise doubling initial estimates of hardware
requirements because the number of data warehouse users and the complexity of their queries
increase exponentially over the first few months after initial data warehouse implementation. Even
with sufficient initial capacity, it is critical to choose scalable systems to support
inevitable but hard-to-quantify future growth.
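The doubling rule of thumb and the effect of steady growth can be checked with simple arithmetic. This minimal Python sketch assumes compound monthly growth at a fixed rate. The 250 GB estimate and the 10% monthly growth rate are hypothetical inputs, not figures from the text.

```python
def planned_capacity_gb(estimated_gb, safety_factor=2.0):
    """Apply the 'double the initial estimate' rule of thumb."""
    return estimated_gb * safety_factor


def months_of_headroom(capacity_gb, current_gb, monthly_growth):
    """Whole months before data growing at `monthly_growth` (0.10 = 10%) outgrows capacity."""
    if monthly_growth <= 0:
        raise ValueError("monthly_growth must be positive")
    months, size = 0, current_gb
    while size * (1 + monthly_growth) <= capacity_gb:
        size *= 1 + monthly_growth
        months += 1
    return months


capacity = planned_capacity_gb(250)                 # 500.0 GB
headroom = months_of_headroom(capacity, 250, 0.10)  # 7 months at an assumed 10% per month
print(capacity, headroom)
```

Even a doubled estimate lasts well under a year at that assumed rate, which is why scalability matters as much as initial capacity.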
[Figure: server platform classes by database size, from LAN and SMP servers through clusters and MPP systems at terabyte scale]
Despite the many vagaries of data warehousing and the relative youth of the field, early
adopters and vendors agree on a few general rules when estimating server capacity.
Small databases, simple queries -- LAN (local area network) servers with a
single I/O bus are appropriate for data marts where the database is under 5GB.
Medium to large databases, more complex queries -- Response time is faster
on SMP (symmetric multiprocessing) systems than on uniprocessors, and
they tend to be more cost-effective than shared-nothing systems. Large
amounts of memory reduce outside seek time during queries, speeding
performance when querying large databases. Conventional wisdom suggests
that SMP machines begin to exceed capacity between 500GB and the low
terabyte range. Good performance for a medium-sized database also requires at
least two I/O channels. As the size of the database and the complexity of queries
grow, more I/O channels are needed to maintain performance.
Very large databases, very complex queries -- very large data warehouses
(up to 5 terabytes) require clusters of SMP servers or MPP (massively parallel
processor) servers. The platforms with the best performance for very large
databases, excluding MPPs, tend to be the ones with a large number of I/O
channels.
Huge databases, extremely complex queries -- data warehouses that exceed
10 terabytes may need the processing power and I/O channels provided by
mainframe systems.
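The sizing rules of thumb above can be encoded as a lookup. This minimal Python sketch uses decimal units (1 TB = 1000 GB) and takes 500 GB, the low end of the range the text gives, as the SMP limit. The text gives no rule for 5 to 10 terabytes, so the function reports that band as uncovered instead of guessing. The function name `suggest_platform` is hypothetical.

```python
GB = 1
TB = 1000 * GB  # decimal units, an assumption for this sketch


def suggest_platform(db_size_gb):
    """Map database size to the rule-of-thumb platform classes (a rough guide only)."""
    if db_size_gb < 5 * GB:
        return "LAN server, single I/O bus"
    if db_size_gb < 500 * GB:  # SMP limit taken at the low end of 500GB to low terabytes
        return "SMP server with two or more I/O channels"
    if db_size_gb <= 5 * TB:
        return "clustered SMP or MPP"
    if db_size_gb > 10 * TB:
        return "mainframe-class system"
    return "clustered SMP, MPP, or mainframe (between the rules of thumb)"


for size in (2, 200, 2000, 8000, 20000):
    print(size, "GB ->", suggest_platform(size))
```

Query complexity matters as much as size in these rules, so a real sizing exercise would take both as inputs.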
Requirements | Users | Support | Architecture | Server | DBMS
Departmental; data analysis | Small; single location | Minimal local; average central | Consolidated; turnkey package | Single processor or SMP | MDDB
Departmental; analysis plus informational | Large; analysts at single location; dispersed informational users | Minimal local; average central | Tiered; detail central; summary local | Cluster SMP central; SP or SMP local | RDBMS central; MDDB local
Enterprise; analysis plus informational | Large; geographically dispersed | Strong central | Centralized | Clustered SMP | Object-relational; Web support
Departmental; exploratory | | Strong central | Centralized | MPP | RDBMS with parallel support
System Software
Concurrent with hardware selection is the selection of system software to support the
data warehouse. The operating systems must support the selected user interfaces, data
warehouse structure and warehouse engine.
Security
Data warehouse security includes both user access security and physical data security.
A data warehouse is a read-only source of enterprise information; therefore developers
need not be concerned with controlling create, update and delete capabilities through
access security. But developers do need to address the trade-off between protecting a
valuable corporate asset against unauthorized access and making the data accessible to
anyone within the enterprise who can put it to good use. The best solution is to allow
everyone in the enterprise to have access to the enterprise measure definitions and
derivations, but only allow access to the underlying detailed data on an approved, need-to-know basis. Developers also need to provide sufficient data security, through backup,
off-site storage, replication, fault-tolerant and/or redundant hardware, etc., to protect the
data from loss due to power failures, equipment malfunction, sabotage, and so on.
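The suggested access policy (measure definitions open to everyone, detail data on an approved need-to-know basis) can be sketched as a single check. In this minimal Python example, the catalog, the approval set, the user names, and the `(kind, name)` resource format are all hypothetical.

```python
# Hypothetical catalog: measure definitions and derivations are open to everyone.
MEASURE_DEFINITIONS = {"gross_margin": "(revenue - cost_of_goods) / revenue"}
# Hypothetical approvals for detail data: (user, detail table) pairs.
approved_detail_access = {("alice", "sales_detail")}


def can_read(user, resource):
    """Apply the policy to a (kind, name) resource."""
    kind, name = resource
    if kind == "measure_definition":
        return name in MEASURE_DEFINITIONS  # no approval needed for definitions
    if kind == "detail":
        return (user, name) in approved_detail_access  # need-to-know approval required
    return False  # deny anything the policy does not explicitly cover


print(can_read("bob", ("measure_definition", "gross_margin")))
print(can_read("bob", ("detail", "sales_detail")))
print(can_read("alice", ("detail", "sales_detail")))
```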
quality. It must be accurate, relevant, complete, and concise. It must be timely and current. It
must be presented in a way that is clear and understandable. A data warehouse that contains
trusted, strategic information becomes a valuable enterprise resource for decision makers at all
organizational levels. If its users discover that it contains bad data, the data warehouse will be
ignored and will fail. Worse, if it contains bad data that its users never detect, and they make
decisions based upon that data, it is possible that the enterprise will fail.
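Some of these quality characteristics can be checked mechanically before data are loaded. This minimal Python sketch checks completeness (required fields present), one validity rule (no negative order totals), and timeliness (data extracted within one day). The field names, the rule, and the one-day limit are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta

REQUIRED = ("customer_id", "order_total", "extracted_at")  # hypothetical required fields


def quality_issues(record, now, max_age=timedelta(days=1)):
    """Return a list of information-quality problems found in one record."""
    issues = [f"missing {f}" for f in REQUIRED if record.get(f) in (None, "")]  # completeness
    total = record.get("order_total")
    if isinstance(total, (int, float)) and total < 0:
        issues.append("order_total is negative")  # validity
    extracted = record.get("extracted_at")
    if isinstance(extracted, datetime) and now - extracted > max_age:
        issues.append("record is stale")  # timeliness
    return issues


now = datetime(2024, 6, 2, 12, 0)
good = {"customer_id": "C1", "order_total": 25.0, "extracted_at": datetime(2024, 6, 2, 8, 0)}
bad = {"customer_id": "", "order_total": -5, "extracted_at": datetime(2024, 5, 20)}
print(quality_issues(good, now))
print(quality_issues(bad, now))
```

Checks like these catch only mechanical errors. Relevance, clarity, and whether the data are actually accurate still require review by the data's owners and stewards.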
Development Methodology
Enterprises that consistently produce quality information systems rigorously use a full life
cycle development methodology. Such a methodology is characterized by a sequence
of interrelated steps beginning with determining business requirements and resulting in
system design, development, and implementation. The Software Engineering Institute,
which established the industry-standard software development Capability Maturity Model
(CMM), declares that a methodology is an absolute necessity in order to be an effective
software developer.
Having a strategically-driven, customer-focused, information-centric, model-based,
disciplined, rigorous, and repeatable methodology is absolutely essential for successful
data warehouse engineering.
Development Tools
A data warehouse is too complex and too massive to be developed using manual
methods. Development tools such as modeling tools, repositories and fourth/fifth
generation programming languages are useful for data warehouse engineering. In
addition, there are several Executive Information System (EIS) and Decision Support
System (DSS) tools that can help with data warehouse access. There are also many
special purpose data warehouse tools including middleware and data
integration/transformation tools.
Some combination of these tools is necessary to quickly and effectively develop and
maintain a data warehouse. The specific tool set an enterprise uses will depend
upon its data warehousing needs. No matter what tools are used, it is important that the
tools work together and that they can be used within the enterprise's chosen technology
environment.
Summary
Data warehouse engineering is not like normal application development. Its scope is broader, its
visibility is greater, its user community is larger, and it is more prone to failure.
Before beginning a data warehouse project, an enterprise should evaluate whether it has
adequately addressed the critical success factors for data warehouse engineering.
Business Requirements
    Strategic Plan
    Performance Measures
Enterprise Architecture
    Enterprise Information
    Information Systems
    Enterprise Technology
Information Quality
    Operational Data Quality
    Extract, Transform & Load
Development Environment
    Project Teams
    Methodology
    Development Tools
    Skills & Knowledge
Addressing the Critical Success Factors for data warehouse engineering will help you
deliver effective strategic information that exactly meets the needs of your enterprise (public or private, large or small) to the right people, in the right place, at the right time, in
the right format.
Alan Perkins has been a Systems Analyst on the White House staff, Director of the US Army
Data Processing School in Germany, Vice President of R&D for a virtual corporation, Vice
President of Consulting for a software engineering tools company, and General Manager of a
high-tech consulting firm. He has provided information and enterprise management consulting to
numerous companies, associations and government agencies.
Mr. Perkins specializes in Enterprise Architecture Engineering. He helps clients quickly engineer
enterprise architectures that are actionable and adaptable. His approach results in architectures
that enable and facilitate enterprise initiatives such as Corporate Portals, Enterprise Data
Warehouses, Enterprise Application Integration, Software Component Engineering, etc.