Unit 4

Basic process in data warehouse

Data warehouse
• A Data Warehouse stores a huge amount of data, which is typically collected
from multiple heterogeneous sources like files, DBMS, etc.
• It includes current and historical data to provide a historical perspective of
information.
• A data warehouse should be:
1. Time-dependent
• There must be a connection between the information in the warehouse and the
time when it was entered.
• This is one of the most important aspects of the warehouse as it relates to data mining,
because information can then be retrieved by time period.
2. Non-Volatile
• Once loaded, data in a warehouse is not updated by end users; it is used only for queries.
• End users who want to update data must use the operational database.
• A data warehouse will always be filled with historical data.
3. Subject Oriented
• Not all the information in the operational database is useful for a data
warehouse.
• A data warehouse should be designed specifically for decision support
and expert systems, holding only the data related to its subjects.
4. Integrated
• In operational data, the same entity is often stored under different
names.
• In a data warehouse, all entities should be integrated and consistent, i.e.
exactly one name must exist to describe each individual entity.
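The time-dependent and integrated properties can be illustrated with a minimal sketch. The field names, the mapping table, and the integrate helper below are hypothetical, chosen only for illustration: two sources use different names for the same customer identifier, and each record is stamped with the date it entered the warehouse.

```python
from datetime import date

# Hypothetical mapping from source-specific field names to the single
# name the warehouse uses for that entity.
CANONICAL_NAMES = {
    "cust_id": "customer_id",
    "customerNo": "customer_id",
    "client_id": "customer_id",
}

def integrate(record, load_date):
    """Rename source fields to their canonical names and stamp the load date."""
    row = {CANONICAL_NAMES.get(key, key): value for key, value in record.items()}
    row["load_date"] = load_date.isoformat()  # time-dependent: when it was entered
    return row

# Two operational sources that name the same entity differently.
billing = {"cust_id": 17, "amount": 250.0}
support = {"customerNo": 17, "tickets": 3}

rows = [integrate(r, date(2024, 1, 31)) for r in (billing, support)]
```

After integration both rows carry the key customer_id, so they can be queried together, and the load_date field lets queries select data by period.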
Architecture of data warehouse
[Figure: data warehouse architecture in four stages. Data Sources (operational DBs, other sources) feed Extract, Transform, Load, and Refresh into Data Storage (data warehouse, data marts, metadata, monitor and integrator), which serves the OLAP Engine (OLAP server), which serves the Front-End Tools (analysis, query, reports, data mining).]
• Data sources
• A data warehouse system draws on heterogeneous sources of data, either
operational databases or external sources.
• Bottom tier
• The bottom tier of the architecture is the data warehouse database server.
• It is a relational database system or a multidimensional model.
• Back-end tools and utilities are used to feed data into the bottom tier.
• These back-end tools and utilities perform the following functions:
• Data Extraction - Involves gathering data from multiple heterogeneous sources.
• Data Cleaning - Involves finding and correcting the errors in data.
• Data Transformation - Involves converting the data from legacy format to warehouse format.
• Data Loading - Involves sorting, summarizing, consolidating, checking integrity, and building
indices and partitions.
• Refreshing - Involves propagating updates from the data sources to the warehouse.
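The back-end functions listed above can be sketched as a small pipeline. This is a minimal illustration using Python's built-in sqlite3 module as a stand-in for the warehouse database server; the source records, table name, and function names are hypothetical, and the load step here covers only integrity checking and index building, not sorting, summarizing, or partitioning.

```python
import sqlite3

# Hypothetical raw records from two heterogeneous sources.
SOURCE_A = [{"id": 1, "name": " Alice ", "sales": "100.5"},
            {"id": 2, "name": "Bob", "sales": "n/a"}]
SOURCE_B = [{"id": 3, "name": "carol", "sales": "80"}]

def extract():
    """Extraction: gather records from every source."""
    return SOURCE_A + SOURCE_B

def clean(records):
    """Cleaning: drop records whose sales value is not a number."""
    cleaned = []
    for rec in records:
        try:
            float(rec["sales"])
        except ValueError:
            continue
        cleaned.append(rec)
    return cleaned

def transform(records):
    """Transformation: convert to warehouse format (tidy names, float sales)."""
    return [(r["id"], r["name"].strip().title(), float(r["sales"])) for r in records]

def load(conn, rows):
    """Loading: insert rows; the primary key enforces integrity, the index aids queries."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_name ON sales (name)")
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

def refresh(conn):
    """Refreshing: re-run the pipeline so changes in the sources reach the warehouse."""
    load(conn, transform(clean(extract())))

conn = sqlite3.connect(":memory:")
refresh(conn)
loaded = conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
```

The record with a non-numeric sales value is rejected during cleaning, so only two rows reach the warehouse table. Calling refresh again after the sources change replaces rows with matching ids instead of duplicating them.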
• Middle tier
• In the middle tier, we have the OLAP Server that can be implemented
in either of the following ways.
• By Relational OLAP (ROLAP), which is an extended relational database
management system. The ROLAP maps the operations on multidimensional
data to standard relational operations.
• By Multidimensional OLAP (MOLAP) model, which directly implements the
multidimensional data and operations.
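The difference between the two approaches can be sketched with a roll-up over one dimension. In this illustration the fact rows, table name, and variable names are hypothetical, and sqlite3 stands in for the relational engine: the ROLAP version expresses the roll-up as a standard GROUP BY, while the MOLAP version stores cells addressed by their dimension coordinates and aggregates over them directly.

```python
import sqlite3
from collections import defaultdict

# Hypothetical fact rows: (region, quarter, amount).
FACTS = [("North", "Q1", 100.0), ("North", "Q2", 150.0),
         ("South", "Q1", 80.0), ("South", "Q2", 120.0)]

# ROLAP style: facts live in a relational table, and rolling up the
# quarter dimension becomes an ordinary relational GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", FACTS)
rolap_rollup = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))

# MOLAP style: cells are stored directly, keyed by dimension coordinates,
# and the roll-up sums the cells that share a region coordinate.
cube = {(region, quarter): amount for region, quarter, amount in FACTS}
molap_rollup = defaultdict(float)
for (region, _quarter), amount in cube.items():
    molap_rollup[region] += amount
```

Both variants produce the same totals per region; they differ only in where the multidimensional structure lives, in SQL over tables or in the storage layout itself.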
• Top tier
• This tier is the front-end client layer.
• This layer holds the query, reporting, analysis, and data
mining tools.
Pros
• Predictions and trend analysis are made easier by storing past data.
• Supports data quality and consistency for trustworthy reporting.
• Can scale as data volumes grow.
• Data is easy to retrieve.
• Can handle large volumes of data.
• Data warehouses employ security protocols to protect the data.
Cons
• Building a data warehouse can be expensive, requiring significant investments in
hardware, software, and personnel.
• Maintenance and management can be costly.
• Data updates are delayed, because the warehouse is refreshed from its sources rather than updated directly.
• Building a data warehouse can take a significant amount of time, requiring
businesses to be patient and committed to the process.
• Data from different sources can be challenging to integrate, requiring significant
effort to ensure consistency and accuracy.
Applications of data warehouse
1. Business Intelligence and Reporting: Facilitates the creation of detailed
reports and dashboards for informed decision-making.
2. Data Mining and Analytics: Supports advanced analytics and predictive
modeling to uncover patterns and insights.
3. Customer Relationship Management (CRM): Enhances customer data
analysis to improve marketing strategies and customer service.
4. Financial Analysis: Provides historical and trend analysis for budgeting,
forecasting, and financial planning.
5. Healthcare Analytics: Improves patient care and operational efficiency
through comprehensive data analysis.
6. Supply Chain Management: Optimizes inventory, logistics, and supplier
relationships by analyzing supply chain data.
7. Retail and Sales Analysis: Analyzes sales trends, customer preferences,
and inventory management.
8. Telecommunications: Manages large volumes of call data for
performance monitoring and customer insights.
9. Government and Public Sector: Supports policy making, resource
allocation, and performance measurement.
10. Education: Analyzes student performance and operational data for
better educational outcomes.
11. Social Media Websites: Social networking websites such as
Facebook, Twitter, and LinkedIn are based on analyzing large data
sets. These sites gather data related to members, groups, locations,
etc., and store it in a single central repository. Because the volume
of data is large, a data warehouse is needed to implement this.
