101 Introduction to Data Warehousing Fundamentals
Definition of a Data Warehouse
• A data warehouse is an enterprise-wide, structured repository of
  subject-oriented, time-variant data used for information retrieval
  and decision support. It stores both atomic and summary data.
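
To make the difference between atomic and summary data concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (sales_atomic, sales_summary), columns, and sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical atomic table: one row per individual sale.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_atomic (sale_date TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_atomic VALUES (?, ?, ?)",
    [
        ("2024-01-05", "widget", 10.0),
        ("2024-01-20", "widget", 15.0),
        ("2024-02-03", "gadget", 7.5),
    ],
)

# Summary table derived from the atomic rows:
# one total per month (time) and product (subject).
conn.execute(
    """
    CREATE TABLE sales_summary AS
    SELECT substr(sale_date, 1, 7) AS month, product, SUM(amount) AS total
    FROM sales_atomic
    GROUP BY substr(sale_date, 1, 7), product
    """
)

summary = conn.execute(
    "SELECT month, product, total FROM sales_summary ORDER BY month, product"
).fetchall()
print(summary)
```

Keeping the atomic rows means new summaries can be rebuilt later at a different grain, which is one common reason for storing both.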
Typical Data Warehousing Process (iterative)
 Phase I: STRATEGY
   Identify business requirements.
   Define objectives and purpose of DW.
 Phase II: DEFINITION
   Project scoping and planning, using a building-block approach.
 Phase III: ANALYSIS
   Information requirements are defined.
 Phase IV: DESIGN
   Database structures to hold base data and summaries are created.
   Translation mechanisms are designed.
 Phase V: BUILD AND DOCUMENT
   The warehouse is built and documentation is developed.
 Phase VI: POPULATE, TEST, AND TRAIN
   The warehouse is populated and tested. The users are trained on
   the system and tools.
 Phase VII: DISCOVERY AND EVOLUTION
   The warehouse is monitored and adjustments are applied, or future
   extensions are planned.
Data Warehouse Compared to OLTP
Property         OLTP                    Data Warehouse
Activities       Processes               Analysis
Response Time    Subseconds to seconds   Seconds to hours
Operations       DML                     Primarily read-only
Nature of Data   Current                 Snapshots over time
Data Organized   By application          By subject, time
Size             Small to large          Large to very large
Data Sources     Operational, internal   Operational, internal, external
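
The "Nature of Data" row can be illustrated with a small sketch: an OLTP store overwrites the current value in place, while a warehouse appends a dated snapshot. The names oltp_current, dw_snapshots, and record_balance are hypothetical, chosen only for this example.

```python
from datetime import date

# OLTP style: one current value per key, overwritten in place (DML).
oltp_current = {}

# Warehouse style: append-only snapshots, each tagged with its date.
dw_snapshots = []


def record_balance(account, balance, as_of):
    """Apply one change to both stores to contrast their behavior."""
    oltp_current[account] = balance
    dw_snapshots.append((account, as_of.isoformat(), balance))


record_balance("acct-1", 100.0, date(2024, 1, 31))
record_balance("acct-1", 80.0, date(2024, 2, 29))

# The OLTP view knows only the latest balance;
# the warehouse view can answer "what was it in January?".
history = [row for row in dw_snapshots if row[0] == "acct-1"]
print(oltp_current)
print(history)
```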
Data Warehouse Compared with Data Mart
Property              Data Warehouse    Data Mart
Scope                 Enterprise        Department
Subjects              Multiple          Single-subject, line of business (LOB)
Data Source           Many              Few
Size (typical)        See notes below   See notes below
Implementation Time   Months to years   Months
Independent Versus Dependent Marts
[Figure: Independent: sources feed the data marts directly.
 Dependent: sources feed the warehouse, which feeds the data marts.]
Independent Data Mart
[Figure: operational systems, flat files, and external data feed a
 sales or marketing data mart directly.]
Dependent Data Mart
[Figure: operational systems, flat files, and external data feed the
 data warehouse (Marketing, Sales, Finance, Human Resources), which
 in turn feeds the Marketing, Sales, and Finance data marts.]
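
As a rough illustration of a dependent data mart, the sketch below derives a single-subject mart from rows that are already in the warehouse instead of going back to the source systems. The row layout and the build_dependent_mart helper are assumptions made for this example.

```python
# Hypothetical warehouse rows covering several subjects.
warehouse = [
    {"subject": "Sales", "region": "EU", "amount": 120.0},
    {"subject": "Finance", "region": "EU", "amount": 300.0},
    {"subject": "Sales", "region": "US", "amount": 200.0},
]


def build_dependent_mart(warehouse_rows, subject):
    """Return the rows for one subject, dropping the subject column."""
    return [
        {key: value for key, value in row.items() if key != "subject"}
        for row in warehouse_rows
        if row["subject"] == subject
    ]


sales_mart = build_dependent_mart(warehouse, "Sales")
print(sales_mart)
```

Because every mart is cut from the same warehouse rows, the marts stay consistent with one another, which an independent mart loaded straight from the sources does not guarantee.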
Purpose of an Enterprise Model
[Figure, labels by stage:
 Extract (E): flat files, operational RDBMS, external, and server log
   file sources are extracted into staging areas.
 Transform/Load (TL): transformations load the enterprise model
   (atomic data).
 Publish (L): federated data warehouse, dependent data marts.
 Subscribe: access layers, portal.
 Other labels: B2C, B2B, Clickstream, Metadata repository.]
Extract, Transform, Load (ETL) Processes
– Extract source data.
– Transform/clean data.
– Index and summarize.
– Load data into warehouse.
– Detect changes.
– Refresh data.
[Figure: operational systems feed the warehouse through ETL, which is
 carried out by programs, gateways, and tools.]
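
The steps above can be sketched end to end with the Python standard library. This is only an illustration under stated assumptions: the CSV sample, the dim_customer table, and the extract, transform, and load functions are hypothetical, and a real pipeline would read from operational systems, not an inline string.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; row 2 is missing its country.
SOURCE_CSV = """customer_id,name,country
1, Alice ,us
2,Bob,
3,  carol,DE
"""


def extract(text):
    """Extract: read source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform/clean: trim names, normalize country, drop incomplete rows."""
    cleaned = []
    for row in rows:
        country = row["country"].strip().upper()
        if not country:
            continue  # reject rows that lack a required field
        name = row["name"].strip().title()
        cleaned.append((int(row["customer_id"]), name, country))
    return cleaned


def load(conn, rows):
    """Load and index; INSERT OR REPLACE also serves as a simple refresh."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_customer_country ON dim_customer (country)"
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT customer_id, name, country FROM dim_customer ORDER BY customer_id"
).fetchall()
print(loaded)
```

Running load again with a changed row for the same customer_id replaces the stored row, which is one simple way to refresh data.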
ETL Processes
  – Must result in data that is relevant, useful, high-quality,
    accurate, and accessible
  – Require a large proportion of warehouse development time and
    resources
[Figure: ETL cleans up, consolidates, and restructures data from
 operational systems so that warehouse data is relevant, useful,
 quality, accurate, and accessible.]
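
The clean up, consolidate, and restructure steps named in the figure might look like the following sketch. The two source lists (crm_rows, billing_rows) and the rule "the first non-empty phone wins" are assumptions made for illustration, not something the slides prescribe.

```python
# Hypothetical records for the same people from two operational systems.
crm_rows = [{"email": " Ann@Example.com ", "phone": None}]
billing_rows = [
    {"email": "ann@example.com", "phone": "555-0100"},
    {"email": "bo@example.com", "phone": None},
]


def clean_up(row):
    """Clean up: normalize the matching key so duplicates line up."""
    return {"email": row["email"].strip().lower(), "phone": row["phone"]}


def consolidate(*sources):
    """Consolidate: merge rows by email; the first non-empty phone wins."""
    merged = {}
    for source in sources:
        for row in map(clean_up, source):
            current = merged.setdefault(
                row["email"], {"email": row["email"], "phone": None}
            )
            if current["phone"] is None:
                current["phone"] = row["phone"]
    return merged


def restructure(merged):
    """Restructure: flatten into (email, phone) tuples ready for loading."""
    return sorted((row["email"], row["phone"]) for row in merged.values())


customers = restructure(consolidate(crm_rows, billing_rows))
print(customers)
```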
Possible Reasons for ETL Failure
– A missing source file
– A system failure
– Inadequate metadata
– Poor mapping information
– Inadequate storage planning
– A source structural change
– No contingency plan
– Inadequate data validation
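
Several of these failures can be caught before a load starts. The sketch below checks for a missing source file, a source structural change, and invalid values. The preflight function, the expected column list, and the file names are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical expected layout of the source extract.
EXPECTED_COLUMNS = ["order_id", "amount"]


def preflight(path, expected_columns):
    """Return a list of problems found; an empty list means safe to load."""
    if not os.path.exists(path):
        return ["missing source file: " + path]
    problems = []
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames != expected_columns:
            # The header no longer matches: a source structural change.
            return ["source structure changed: %s" % reader.fieldnames]
        # Data rows start on line 2, after the header.
        for line_no, row in enumerate(reader, start=2):
            try:
                float(row["amount"])
            except (TypeError, ValueError):
                problems.append("invalid amount on line %d" % line_no)
    return problems


with tempfile.TemporaryDirectory() as workdir:
    missing = preflight(os.path.join(workdir, "orders.csv"), EXPECTED_COLUMNS)

    changed_path = os.path.join(workdir, "changed.csv")
    with open(changed_path, "w", newline="") as handle:
        handle.write("order_id,total\n1,9.5\n")
    changed = preflight(changed_path, EXPECTED_COLUMNS)

    invalid_path = os.path.join(workdir, "invalid.csv")
    with open(invalid_path, "w", newline="") as handle:
        handle.write("order_id,amount\n1,9.5\n2,abc\n")
    invalid = preflight(invalid_path, EXPECTED_COLUMNS)

print(missing, changed, invalid)
```

Returning a list of problems, instead of raising on the first one, lets a contingency plan decide whether to skip the file, retry, or stop the run.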
Typical Warehousing Development Tasks
Source to staging
  Define source metadata
  Define staging area metadata
  Map source to staging area
  Deploy database structures
  Deploy mappings
  Extract data into staging tables
Staging to warehouse
  Define enterprise model (warehouse) metadata
  Map staging area to enterprise model
  Deploy database structures
  Deploy mappings
  Extract data into the enterprise model
Warehouse to data marts
  Define data mart metadata (cubes, dimensions)
  Map enterprise model to data marts
  Deploy database structures
  Deploy mappings
  Extract data into the data mart
Administration
  Refresh warehouse and data mart
  Maintain warehouse and data mart
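
A "map source to staging area" task can be expressed as data and kept apart from the code that applies it. In this minimal sketch the column names (CUST_NO, CUST_NM, CRT_DT), the staging names, and the apply_mapping helper are all hypothetical.

```python
# Hypothetical mapping: source column -> (staging column, conversion).
SOURCE_TO_STAGING = {
    "CUST_NO": ("customer_id", int),
    "CUST_NM": ("customer_name", str.strip),
    "CRT_DT": ("created_on", str),
}


def apply_mapping(source_row, mapping):
    """Rename and convert each mapped source column into its staging column."""
    return {
        target: convert(source_row[source])
        for source, (target, convert) in mapping.items()
    }


staged = apply_mapping(
    {"CUST_NO": "42", "CUST_NM": " Acme ", "CRT_DT": "2024-03-01"},
    SOURCE_TO_STAGING,
)
print(staged)
```

The same helper could serve the staging-to-warehouse and warehouse-to-data-mart steps with a different mapping dictionary for each.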
Visit more self help tutorials

• Pick a tutorial of your choice and browse
  through it at your own pace.
• The tutorials section is free, self-guiding and
  will not involve any additional support.
• Visit us at www.dataminingtools.net

Oracle: Fundamental Of DW
