Ch3 Data Warehouse: Dr. Bernard Chen PH.D
Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational database. Officially speaking:
Data Warehouse: Subject-Oriented
Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process
Data Warehouse: Integrated
Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, and on-line transaction records.
Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources.
E.g., hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted.
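The hotel price example can be sketched in code. This is a minimal illustration, assuming a target convention of USD with tax and breakfast included; the `HotelPrice` record, the exchange rates, and the breakfast surcharge are all hypothetical values made up for the example.

```python
from dataclasses import dataclass

# Hypothetical exchange rates and breakfast surcharge (illustration only).
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10}
BREAKFAST_USD = 15.0


@dataclass
class HotelPrice:
    amount: float             # nightly price in the source's own currency
    currency: str
    tax_included: bool
    tax_rate: float           # e.g. 0.10 for 10 percent
    breakfast_included: bool


def to_warehouse_price(p: HotelPrice) -> float:
    """Convert a source record to one assumed convention:
    USD, tax included, breakfast included."""
    usd = p.amount * RATES_TO_USD[p.currency]
    if not p.tax_included:
        usd *= 1 + p.tax_rate
    if not p.breakfast_included:
        usd += BREAKFAST_USD
    return round(usd, 2)


# Two sources that describe "price" with different conventions.
source_a = HotelPrice(100.0, "USD", tax_included=False, tax_rate=0.10,
                      breakfast_included=False)
source_b = HotelPrice(100.0, "EUR", tax_included=True, tax_rate=0.10,
                      breakfast_included=True)

print(to_warehouse_price(source_a))
print(to_warehouse_price(source_b))
```

Both sources quote "100", yet the converted values differ, which is why the conversion has to happen before the data is loaded.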
Data Warehouse: Time-Variant
The time horizon for the data warehouse is significantly longer than that of operational systems.
Data Warehouse: Nonvolatile
A physically separate store of data transformed from the operational environment.
Operational update of data does not occur in the data warehouse environment.
Does not require transaction processing, recovery, and concurrency control mechanisms.
Major task of a traditional relational DBMS: day-to-day operations such as purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
Major task of a data warehouse system: data analysis and decision making.
User and system orientation: customer vs. market
Data contents: current, detailed vs. historical, consolidated
DBMS tuned for OLTP: access methods, indexing, concurrency control, recovery
Warehouse tuned for OLAP: complex OLAP queries, multidimensional view, consolidation
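The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. The `orders` table and its rows are hypothetical; the point is only the shape of each query, not a claim about how a real warehouse is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer TEXT,
    year     INTEGER,
    amount   REAL
);
INSERT INTO orders VALUES
    (1, 'ann', 2022, 50), (2, 'bob', 2022, 70),
    (3, 'ann', 2023, 30), (4, 'cy',  2023, 90);
""")

# OLTP-style work: a short transaction that touches one current record.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (55.0, 1))

# OLAP-style work: a read-only query that consolidates many records.
olap_rows = conn.execute("""
    SELECT year, SUM(amount), COUNT(*)
    FROM orders
    GROUP BY year
    ORDER BY year
""").fetchall()

for row in olap_rows:
    print(row)
```

The update needs concurrency control and recovery to be safe; the summary query instead benefits from structures that make scanning and consolidating large histories cheap.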
A data warehouse is based on a multidimensional data model which views data in the form of a data cube
Data cube
A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions. Suppose AllElectronics creates a sales data warehouse with respect to dimensions
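To make the idea of viewing one measure along several dimensions concrete, here is a small sketch that computes every group-by (cuboid) of a tiny sales table. The three dimensions `quarter`, `item`, and `location`, the rows, and the helper `cuboid` are assumptions for illustration; the slide does not list the actual dimensions.

```python
from collections import defaultdict
from itertools import combinations

# Assumed dimensions; the measure (dollars sold) is the last field of each row.
DIMENSIONS = ("quarter", "item", "location")

# Made-up rows: (quarter, item, location, dollars_sold)
sales = [
    ("Q1", "phone", "Chicago", 100),
    ("Q1", "phone", "Toronto", 150),
    ("Q1", "laptop", "Chicago", 300),
    ("Q2", "phone", "Chicago", 120),
    ("Q2", "laptop", "Toronto", 200),
]


def cuboid(rows, dims):
    """Sum the measure grouped by the given subset of dimensions."""
    idx = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)


# One cuboid per subset of dimensions: 2**3 = 8 in total.
cube = {
    dims: cuboid(sales, dims)
    for k in range(len(DIMENSIONS) + 1)
    for dims in combinations(DIMENSIONS, k)
}

print(cube[()])                       # grand total, no dimensions
print(cube[("quarter",)])             # totals per quarter
print(cube[("item", "location")])     # totals per item and location
```

Each key of `cube` is one way of looking at the same sales facts, from the fully detailed view down to a single grand total.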
Practice Question
Star schema: A fact table in the middle connected to a set of dimension tables
It contains:
A large central table (fact table)
A set of smaller attendant tables (dimension tables), one for each dimension
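A star schema along these lines can be sketched with Python's built-in `sqlite3` module. All table names, column names, and rows below are hypothetical; the fact table holds the measure plus one key per dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one per dimension (names are illustrative).
CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, quarter TEXT);
CREATE TABLE dim_item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Fact table: keys into each dimension plus the measure.
CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    item_key     INTEGER REFERENCES dim_item(item_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    dollars_sold REAL
);

INSERT INTO dim_time     VALUES (1, 'Q1'), (2, 'Q2');
INSERT INTO dim_item     VALUES (1, 'phone', 'Acme'), (2, 'laptop', 'Acme');
INSERT INTO dim_location VALUES (1, 'Chicago', 'USA'), (2, 'Toronto', 'Canada');
INSERT INTO fact_sales   VALUES
    (1, 1, 1, 100), (1, 1, 2, 150), (1, 2, 1, 300),
    (2, 1, 1, 120), (2, 2, 2, 200);
""")

# Each dimension is reached from the fact table with a single join.
star_rows = conn.execute("""
    SELECT t.quarter, l.city, SUM(f.dollars_sold)
    FROM fact_sales f
    JOIN dim_time t     ON f.time_key = t.time_key
    JOIN dim_location l ON f.location_key = l.location_key
    GROUP BY t.quarter, l.city
    ORDER BY t.quarter, l.city
""").fetchall()

for row in star_rows:
    print(row)
```

Note that `dim_location` keeps city and country in the same table, so any location attribute is available after one join.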
Star schema
Snowflake schema: A refinement of the star schema in which some dimensional hierarchies are further split (normalized) into a set of smaller dimension tables, forming a shape similar to a snowflake
However, the snowflake structure can reduce the effectiveness of browsing, since more joins will be needed
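The extra joins can be seen in a small sketch, again using `sqlite3` with hypothetical tables. Here the location dimension is assumed to be normalized into a city table and a country table, so rolling sales up to the country level takes two joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The location hierarchy is split into two normalized tables (illustrative).
CREATE TABLE dim_country (country_key INTEGER PRIMARY KEY, country TEXT);
CREATE TABLE dim_city (
    city_key    INTEGER PRIMARY KEY,
    city        TEXT,
    country_key INTEGER REFERENCES dim_country(country_key)
);
CREATE TABLE fact_sales (
    city_key     INTEGER REFERENCES dim_city(city_key),
    dollars_sold REAL
);

INSERT INTO dim_country VALUES (1, 'USA'), (2, 'Canada');
INSERT INTO dim_city    VALUES (1, 'Chicago', 1), (2, 'New York', 1), (3, 'Toronto', 2);
INSERT INTO fact_sales  VALUES (1, 100), (2, 250), (3, 150), (1, 120);
""")

# Reaching the country attribute now needs fact -> city -> country.
snowflake_rows = conn.execute("""
    SELECT co.country, SUM(f.dollars_sold)
    FROM fact_sales f
    JOIN dim_city ci    ON f.city_key = ci.city_key
    JOIN dim_country co ON ci.country_key = co.country_key
    GROUP BY co.country
    ORDER BY co.country
""").fetchall()

for row in snowflake_rows:
    print(row)
```

The normalized form stores each country name once, at the cost of a longer join path for every query that browses by country.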
Snowflake schema
Fact constellations: Multiple fact tables share dimension tables. This can be viewed as a collection of stars, and is therefore called a galaxy schema or fact constellation
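As a rough sketch of a fact constellation, the example below assumes two hypothetical fact tables, `fact_sales` and `fact_shipping`, that share a single `dim_item` dimension table. Each fact table is aggregated on its own before the results are combined through the shared dimension, which avoids double counting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One shared dimension table (illustrative).
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT);

-- Two fact tables that both reference it.
CREATE TABLE fact_sales (
    item_key   INTEGER REFERENCES dim_item(item_key),
    units_sold INTEGER
);
CREATE TABLE fact_shipping (
    item_key      INTEGER REFERENCES dim_item(item_key),
    units_shipped INTEGER
);

INSERT INTO dim_item      VALUES (1, 'phone'), (2, 'laptop');
INSERT INTO fact_sales    VALUES (1, 10), (1, 5), (2, 3);
INSERT INTO fact_shipping VALUES (1, 12), (2, 3), (2, 1);
""")

# Aggregate each fact table separately, then line the totals up by item.
constellation_rows = conn.execute("""
    SELECT i.item_name, s.sold, sh.shipped
    FROM dim_item i
    JOIN (SELECT item_key, SUM(units_sold) AS sold
          FROM fact_sales GROUP BY item_key) s ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(units_shipped) AS shipped
          FROM fact_shipping GROUP BY item_key) sh ON sh.item_key = i.item_key
    ORDER BY i.item_key
""").fetchall()

for row in constellation_rows:
    print(row)
```

Because both stars use the same item keys, sales and shipping figures can be compared item by item without duplicating the dimension data.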
Fact constellations