Unit 4
Data warehouse
• A Data Warehouse stores a huge amount of data, which is typically collected
from multiple heterogeneous sources like files, DBMS, etc.
• It includes current and historical data to provide a historical perspective of
information.
• A data warehouse should be:
1. Time-dependent
• There must be a connection between the information in the warehouse and the
time when it was entered.
• This is one of the most important aspects of the warehouse for data mining,
because information can then be retrieved by time period.
2. Non-Volatile
• Data in a warehouse is never updated once loaded; it is used only for queries.
• End users who want to update data must use the operational database.
• As a result, a data warehouse is always filled with historical data.
3. Subject-Oriented
• Not all the information in the operational database is useful for a data
warehouse.
• A data warehouse should be designed specifically for decision support
and expert systems, and should hold only the data related to those subjects.
4. Integrated
• In operational databases, the same entity is often stored under
different names.
• In a data warehouse, all entities should be integrated and consistent, i.e.
exactly one name must exist to describe each individual entity.
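
The properties above can be illustrated with a small sketch. This is only a toy
model, not how a real warehouse product is built: the Warehouse class, the Fact
row type, the CANONICAL_NAMES table and the sample records are all hypothetical
names invented for the example. Rows carry a load date (time-dependent), are
appended and never modified (non-volatile), and source field names are mapped
to a single name per entity (integrated).

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical mapping: several source-specific names -> one canonical name.
CANONICAL_NAMES = {
    "cust_id": "customer_id",
    "CustomerNo": "customer_id",
    "customer_id": "customer_id",
}


@dataclass(frozen=True)  # frozen: a loaded row cannot be changed afterwards
class Fact:
    customer_id: str
    amount: float
    loaded_on: date  # every row is tied to the time it entered the warehouse


class Warehouse:
    """Toy append-only store: it offers load and query, but no update."""

    def __init__(self):
        self._rows = []

    def load(self, record, loaded_on):
        # Integrated: rename source fields so each entity has one name.
        clean = {CANONICAL_NAMES.get(k, k): v for k, v in record.items()}
        self._rows.append(Fact(clean["customer_id"], clean["amount"], loaded_on))

    def query(self, start, end):
        # Time-dependent: select rows by the period in which they were loaded.
        return [r for r in self._rows if start <= r.loaded_on <= end]


wh = Warehouse()
wh.load({"cust_id": "C1", "amount": 120.0}, date(2023, 1, 15))
wh.load({"CustomerNo": "C1", "amount": 80.0}, date(2023, 2, 3))
wh.load({"customer_id": "C2", "amount": 50.0}, date(2023, 2, 20))

february = wh.query(date(2023, 2, 1), date(2023, 2, 28))
print([(r.customer_id, r.amount) for r in february])
# prints [('C1', 80.0), ('C2', 50.0)]
```

Trying to assign to a field of a loaded row (for example february[0].amount)
raises an error, which mirrors the rule that updates belong in the operational
database rather than in the warehouse.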
Architecture of data warehouse
[Figure: data warehouse architecture. Labels recoverable from the diagram:
Other sources, Monitor & Integrator, Metadata, OLAP Server, Data Marts]