CBEC4103 Data Warehousing
1.0 Introduction
In computing, a data warehouse, also known as an enterprise data warehouse, is a system
used for reporting and data analysis. Data warehouses are central repositories of integrated
data from one or more disparate sources (Wikipedia, 2014). They store current and historical
data and are used for creating analytical reports for knowledge workers throughout the
enterprise. Examples of reports could range from annual and quarterly comparisons and
trends to detailed daily sales analyses.
The data stored in the warehouse is uploaded from operational systems, such as
marketing and sales. The data may pass through an operational data store for additional
processing before it is used in the data warehouse for reporting.
research community. The following sections discuss the importance of data warehousing for
decision making and data utilization.
2.1
In challenging times, good decision-making becomes critical. The best decisions are made
when all the relevant data available is taken into consideration. The best possible source for
that data is a well-designed data warehouse. Joseph Guerra (2013) postulated that the concept
of data warehousing is deceptively simple. Data is extracted periodically from the
applications that support business processes and copied onto special dedicated computers.
There it can be validated, reformatted, reorganized, summarized, restructured, and
supplemented with data from other sources.
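The processing that Guerra describes can be illustrated with a minimal sketch. It assumes a handful of hypothetical sales rows and a hypothetical lookup of region managers; none of these names come from the cited sources, and a real warehouse would use dedicated loading tools rather than plain Python.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical rows extracted from an operational sales application.
extracted = [
    {"order_id": "1", "date": "03/01/2014", "region": "north", "amount": "120.50"},
    {"order_id": "2", "date": "03/01/2014", "region": "south", "amount": "80.00"},
    {"order_id": "3", "date": "04/01/2014", "region": "north", "amount": "n/a"},
    {"order_id": "4", "date": "04/01/2014", "region": "north", "amount": "45.25"},
]

# Hypothetical reference data from another source, used to supplement the rows.
region_managers = {"north": "A. Tan", "south": "B. Lim"}


def validate(row):
    """Keep only rows whose amount parses as a number."""
    try:
        float(row["amount"])
        return True
    except ValueError:
        return False


def reformat(row):
    """Convert text fields into typed values and an ISO date."""
    return {
        "order_id": int(row["order_id"]),
        "date": datetime.strptime(row["date"], "%d/%m/%Y").date().isoformat(),
        "region": row["region"],
        "amount": float(row["amount"]),
    }


def supplement(row):
    """Attach data that the source application does not hold."""
    return {**row, "manager": region_managers.get(row["region"], "unknown")}


def summarize(rows):
    """Total the amounts per (date, region)."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["date"], row["region"])] += row["amount"]
    return dict(totals)


clean = [supplement(reformat(r)) for r in extracted if validate(r)]
daily_totals = summarize(clean)
```

Here the row with the unparseable amount is dropped at validation, and the remaining rows are typed, supplemented and summarized before they would be loaded.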
Another important aspect of data warehousing, according to John Price
(as cited in Dorobek, 1999), is that it provides information on budget and finance, travel,
procurement, property management and human resources for the organization. He also
claimed that before the data warehouse was implemented, it would often take weeks to get
answers, and staff would take weeks to create a report that would often be outdated by the
time it was presented. In fact, before data warehousing existed, executives relied on
old information to make decisions (Dorobek, 1999).
2.2
One important aspect of data utilization is that it helps in organizing the data mined
from the database. In any business, collecting all the information will not benefit the
management if they do not know what to do with it. This is true of book reports, travel
itineraries, business plans, and most other things in life. The same applies when
deciding on a retail management software system. It is important for a business that all its
technological tools are the right tools for that business, enabling the collection of
data and the optimal utilization of that information.
Another advantage of integrating all channels and departments into an efficient
system is that all data can be stored in one place for access
by any party who has the authority to do so (Dorobek, 1999). This data is
valuable because it is specific to the business, so the mode of
collection, the categories and similar details need to be planned carefully.
Another important aspect of data utilization is that it allows questions
to be answered and brainstorming sessions to take place. During these meetings, project goals
and measurable milestones are established, including defining the data that is essential
and visualizing how the data will be used to meet the business
goals (Joseph Guerra, 2013).
For these and many other reasons, the construction and maintenance of a data
warehouse is a very significant and challenging project that needs to be carefully managed.
The following discusses how data warehousing can be used to solve problems in
decision-making, as stated by Pathak, Singh and Oberoi (2013):
3.1 Tactical Query
A tactical query is a database operation that attempts to determine the best course of action
right now. A tactical query provides information to rank-and-file staff in the field who
need to respond quickly to a set of unfolding events. Tactical queries tend to produce a very
small result set. It is not uncommon for the result set to be less than a dozen rows. Usually the
result set is designed to fit into a single window on a display screen.
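As an illustration only, a tactical query might look like the sketch below. The `stock` table, its columns and the `stores_with_stock` helper are hypothetical and are not taken from the cited sources; an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (store_id INTEGER, product TEXT, on_hand INTEGER)")
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?)",
    [(1, "widget", 0), (1, "gadget", 14), (2, "widget", 9), (3, "widget", 3)],
)
# An index on the lookup column keeps this kind of narrow query fast.
conn.execute("CREATE INDEX idx_stock_product ON stock (product)")


def stores_with_stock(product, limit=10):
    """Tactical lookup: which stores can supply this product right now?"""
    cur = conn.execute(
        "SELECT store_id, on_hand FROM stock "
        "WHERE product = ? AND on_hand > 0 "
        "ORDER BY on_hand DESC LIMIT ?",
        (product, limit),
    )
    return cur.fetchall()


result = stores_with_stock("widget")
```

The query touches a few rows through an index and caps the result with `LIMIT`, so the answer stays small enough to fit on a single screen.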
3.2 Strategic Query
A strategic query is a database operation that attempts to determine what has happened, why
it happened, and/or what will happen next. It typically accesses vast amounts of detailed data
from the warehouse and ranges in complexity from simple table scans to multi-way joins and
subqueries. Applications that generate strategic queries include report generation, OLAP,
decision support, ad hoc analysis, and data mining.
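A strategic query can be sketched in the same hedged way. The `fact_sales` and `dim_product` tables below are hypothetical, and the data is invented for illustration. The query joins detail rows to a dimension table, uses a subquery to restrict the comparison to quarters that exist in 2014, and aggregates by category and year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (product_id INTEGER, year INTEGER, quarter INTEGER, amount REAL);
INSERT INTO dim_product VALUES (1, 'hardware'), (2, 'software');
INSERT INTO fact_sales VALUES
    (1, 2013, 1, 100.0), (1, 2013, 2, 150.0),
    (2, 2013, 1, 300.0), (2, 2013, 2, 250.0),
    (1, 2014, 1, 200.0), (2, 2014, 1, 100.0);
""")

# Like-for-like yearly comparison per category: scan the detail rows,
# join to the product dimension, and only count quarters present in 2014.
comparison = conn.execute("""
    SELECT p.category, s.year, SUM(s.amount) AS total
    FROM fact_sales AS s
    JOIN dim_product AS p ON p.product_id = s.product_id
    WHERE s.quarter IN (SELECT quarter FROM fact_sales WHERE year = 2014)
    GROUP BY p.category, s.year
    ORDER BY p.category, s.year
""").fetchall()
```

Unlike the tactical case, this query reads every detail row in scope and summarizes it to show what happened over time.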
4.0 Analysis of How a Data Warehouse Can Be Developed and Implemented in the Organisation
According to Kimball's Data Warehouse Toolkit (Kimball & Ross, 2002), six steps need to be
considered when developing and implementing a data warehouse. The steps are analysed
below:
1.
a.
b.
c.
d.
2.
activities:
a. Interviewing a number of potential users to find out what they do, the information they
need and how they analyse it in order to make decisions. It is often helpful to analyse
some of the reports they currently use.
b. Interviewing information systems specialists to find out what data are available in
potential source systems, and how they are organised.
c. Analysing the requirements to establish those that are feasible given available data.
d. Running facilitated workshops that bring representative users and IT staff together to
build consensus about what is needed, what is feasible and where to start.
3. Design: The goal of the design process is to define the warehouse components that
will need to be built. The architecture, data and application designs are all inter-related, and
are normally produced in parallel. There are three types of design involved. They are:
a. Architecture Design:
i. how the components will work together;
ii. where they are located (geographically and on what platform);
iii. who uses them;
4.
parallel. That said, the most efficient sequence to begin construction is probably as follows:
a. Tool selection and installation: Selecting tools is best carried out as part of a pilot
exercise, using a sample of real data. This allows the development team to assess how
well competing tools handle problems specific to their organisation, and to test system
performance before committing to purchase.
b. Data staging system: This comprises the physical warehouse database, data feeds and any
associated data marts and aggregates. The following steps are typical:
i. Create target tables in the central warehouse database;
ii. Request initial and regular extracts from source systems;
iii. Write procedures to transform extract data ready for loading (optionally creating
interim tables in a data staging area);
iv. Write procedures to load initial data into the warehouse (using a bulk loader);
v. Create and populate any data marts;
vi. Write procedures to load regular updates into the warehouse;
vii. Develop special procedures for a once-off bulk load of historic data;
viii. Write validation/exception handling procedures;
ix. Write archiving & backup procedures;
x. Create a provisional set of aggregates;
c. Application development;
i. This step can begin once a sample or initial extract has been loaded, but it is usually
best to leave the bulk of application development until the underlying data mart (or
part of the central warehouse) and associated meta-data (especially object names)
are stable.
ii. It is a good idea to involve users in the development of reports and analytic
applications, preferably through prototyping, but at least by asking them to carry out
acceptance testing. Most modern business intelligence tools do not require
programming, so it is possible for non-IT staff to build some of their own reports as
well.
5.
ii.
iii. Setting up a support organisation to deal with questions about the tools, the
applications and the data. However thoroughly the data were checked and
documented prior to publication, users are likely to spot anomalies requiring
investigation and to need assistance interpreting the results they obtain from the
warehouse and reconciling these with existing reports;
iv. Providing more advanced tool training later, when users are ready, and assisting
potential power users to develop their first few reports.
6.
ii.
iii.
iv.
v. Maintaining both feeds & meta-data as source systems change over time;
vi. Tuning the warehouse for maximum performance (this includes managing indexes
and aggregates according to actual usage);
vii.
viii.
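Several of the data staging steps listed under step 4(b) above can be sketched in a few lines. This is a minimal illustration, not Kimball and Ross's implementation: it assumes an in-memory SQLite database as the warehouse, a hypothetical pipe-delimited extract format, and hypothetical table names (`sales`, `rejects`, `agg_sales_by_region`).

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")

# Step i: create target tables in the central warehouse database.
warehouse.executescript("""
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT, region TEXT, amount REAL);
CREATE TABLE rejects (raw TEXT, reason TEXT);
CREATE TABLE agg_sales_by_region (region TEXT PRIMARY KEY, total REAL);
""")


def transform(raw):
    """Step iii: turn one extract line 'id|date|region|amount' into a typed row."""
    sale_id, sale_date, region, amount = raw.split("|")
    return int(sale_id), sale_date, region.strip().lower(), float(amount)


def load(extract_lines):
    """Steps iv, vi and viii: load rows, diverting bad ones to a rejects table."""
    loaded = 0
    for raw in extract_lines:
        try:
            row = transform(raw)
        except ValueError as exc:
            warehouse.execute("INSERT INTO rejects VALUES (?, ?)", (raw, str(exc)))
            continue
        # INSERT OR REPLACE lets a regular update correct an earlier row.
        warehouse.execute("INSERT OR REPLACE INTO sales VALUES (?, ?, ?, ?)", row)
        loaded += 1
    warehouse.commit()
    return loaded


def rebuild_aggregates():
    """Step x: recreate a provisional aggregate from the detail table."""
    warehouse.execute("DELETE FROM agg_sales_by_region")
    warehouse.execute(
        "INSERT INTO agg_sales_by_region "
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    )
    warehouse.commit()


# Hypothetical extracts: an initial load followed by a regular update.
initial_extract = ["1|2014-01-03|North|120.50", "2|2014-01-03|South|80.00"]
regular_update = ["3|2014-01-04|North|45.25", "4|2014-01-04|South|oops"]

initial_count = load(initial_extract)
update_count = load(regular_update)
rebuild_aggregates()
totals = dict(warehouse.execute("SELECT region, total FROM agg_sales_by_region"))
```

The same `load` procedure serves both the initial load and the regular update. A production system would normally use a bulk loader for the initial load, as the step list notes.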
Then there is the big data conundrum: typical enterprise data warehouse
operations deal with extremely large amounts of data. Kimball and Ross (2002) further noted
that missing one step in the process, or executing a step at the wrong time, can result in a
significant amount of wasted processing time or, in the worst-case scenario, bad data. The
traditional approach to this problem is based on a very general two-step process:
First, determine the appropriate set of information sources to answer the query, and
generate subqueries for each of the sources;
Second, gather results from the information sources, combine them, and return the
final answer to the user. This approach is referred to as a lazy or on-demand approach to data
integration, and often uses a virtual view technique to simplify query specification.
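The two-step process can be sketched as follows. This is a simplified illustration under stated assumptions: the three sources are hypothetical in-memory lists, a query is a plain dictionary of fields and equality filters, and a "subquery" is simply a function that filters one source.

```python
# Hypothetical information sources, each holding part of the data.
SOURCES = {
    "sales_db": [
        {"customer": "acme", "amount": 120.0},
        {"customer": "zenith", "amount": 75.0},
    ],
    "web_shop": [
        {"customer": "acme", "amount": 30.0},
    ],
    "hr_system": [
        {"employee": "tan", "dept": "finance"},
    ],
}


def select_sources(query):
    """Step 1: keep only sources whose rows carry every field the query needs."""
    needed = set(query["fields"]) | set(query["where"])
    return [name for name, rows in SOURCES.items() if rows and needed <= set(rows[0])]


def make_subquery(name, query):
    """Step 1, continued: build a subquery (here, a closure) for one source."""
    def run():
        return [
            {f: row[f] for f in query["fields"]}
            for row in SOURCES[name]
            if all(row[k] == v for k, v in query["where"].items())
        ]
    return run


def answer(query):
    """Step 2: run each subquery at query time and combine the results."""
    combined = []
    for name in select_sources(query):
        combined.extend(make_subquery(name, query)())
    return combined


rows = answer({"fields": ["customer", "amount"], "where": {"customer": "acme"}})
total = sum(r["amount"] for r in rows)
```

Nothing is copied or stored in advance. Every call to `answer` goes back to the sources, which is what makes the approach lazy, in contrast to a warehouse that loads and integrates the data ahead of time.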
7.0 Summary
Data warehousing provides leverage for management in an organization. Effective decision
making is the major function of management in every organization; data warehouses
facilitate meaningful research, which in turn supports effective management processes. With a
data warehouse in place, each department in an organization can share data, the costs of
operations are reduced, and users or management can perform extensive
analysis across all departments in the organization (Nwakanma Ifeanyi et al., 2014).
8.0 References
Alan Perkins (2003). Data Warehouse Critical Factors. Unpublished reports. Retrieved from
http://www.visible.com/company/whitepapers/dwcsf.pdf
C. J. Dorobek (1999). Experts: A good data warehouse improves decision-making. Retrieved
from https://gcn.com/articles/1999/05/experts-a-good-data-warehouse-improvecision
making.aspx.
Jian Yang, K Karlapalem & Q Li (1997). Tackling the Challenges of Materialized View
Design in Data Warehousing Environment. Retrieved from
https://pdfs.semanticscholar.org/3d74/852d1f7ebb4156f1ce27d61c890349e37047.pdf
Kimball, R and Ross, M (2002). The Data Warehouse Toolkit: The Complete Guide to
Dimensional Modelling, 2nd edn, John Wiley and Sons.
Joseph Guerra (2013). Why You Need a Data Warehouse. 700 West Johnson Avenue
Cheshire, CT 06410 800.775.4261. Retrieved from www.rapiddecision.net.
Malhotra (2006). Decision Environment Improvement using Data Warehouse for Efficient
Organizational Decisions Making. School of Information Technology, GGSIP
University Kashmiri Gate, New Delhi
Monica Pathak, Sukhdev Singh and Sukhwinder Singh Oberoi (2013). Impact of Data
Warehousing and Data Mining in Decision Making. International Journal of
Computer Science and Information Technologies, Vol. 4 (6) , 2013, 995-999
Nwakanma Ifeanyi et al. (2014). The Role of Data Warehousing Concept for Improved
Organizations Performance and Decision Making. Vol 3, Issue 10. International
Journal of Computer Science and Mobile Computing.
Wikipedia (2014). Data Warehousing. Retrieved from
https://en.wikipedia.org/wiki/Data_warehouse