9.1-ETL Process
Data Extraction
• This function has to deal with numerous data sources. You have to
employ the appropriate technique for each data source.
• Source data may come from different source machines in diverse data
formats.
• Part of the source data may be in relational database systems.
• Some data may reside in legacy systems built on network and
hierarchical data models.
• Many data sources may still be in flat files.
• You may want to include data from spreadsheets and local
departmental data sets.
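The bullets above say that each kind of source needs its own extraction technique. A minimal Python sketch of that idea, assuming just two source kinds: a relational table read through the standard-library `sqlite3` module, and a flat file (for example a spreadsheet exported as CSV) parsed with `csv`. The function names, the `source` tag column and the sample data are hypothetical; a legacy network or hierarchical source would need its own extractor added to the same dispatch.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterable, List, Tuple

# A record is one row from any source, as column name -> text value.
Record = Dict[str, str]


def extract_from_relational(conn: sqlite3.Connection, query: str) -> List[Record]:
    """Run a SQL query and return each row as a dict keyed by column name."""
    cursor = conn.execute(query)
    columns = [desc[0] for desc in cursor.description]
    return [dict(zip(columns, (str(value) for value in row))) for row in cursor]


def extract_from_flat_file(text: str, delimiter: str = ",") -> List[Record]:
    """Parse delimited flat-file text, e.g. a CSV export of a departmental spreadsheet."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)]


def extract_all(sources: Iterable[Tuple[str, str, object]]) -> List[Record]:
    """Dispatch each (name, kind, payload) source to the matching extractor.

    Every record is tagged with the name of the source it came from, so later
    steps can apply source-specific rules.
    """
    records: List[Record] = []
    for name, kind, payload in sources:
        if kind == "relational":
            conn, query = payload
            rows = extract_from_relational(conn, query)
        elif kind == "flat_file":
            rows = extract_from_flat_file(payload)
        else:
            raise ValueError(f"no extractor for source kind {kind!r}")
        for row in rows:
            row["source"] = name
        records.extend(rows)
    return records


# Usage with made-up sample data: one relational source, one flat file.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE customer (cust_id TEXT, name TEXT)")
orders_db.execute("INSERT INTO customer VALUES ('C1', 'Ann Lee')")
dept_csv = "cust_id,name\nC2,Bob Roy\n"

records = extract_all([
    ("orders_db", "relational", (orders_db, "SELECT cust_id, name FROM customer")),
    ("sales_dept_sheet", "flat_file", dept_csv),
])
```

Both extractors return the same shape (a list of dicts), so everything downstream can ignore where a record came from except through the `source` tag.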
Data Extraction
• Data extraction may become quite complex, because each source can
require its own access method and format conversion.
• More often, data warehouse implementation teams extract the source
data into a separate physical environment, from which moving the data
into the data warehouse is easier.
• In the separate environment, you may extract the source data into a
group of flat files or into a data-staging relational database.
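Landing extracted data in a separate staging environment can be sketched as follows, assuming the data-staging relational database is an SQLite database and that every column is staged as TEXT, so rows arrive exactly as extracted and are only converted later during transformation. `stage_records`, the `stg_customer` table and the sample rows are hypothetical. Table and column names are interpolated into the SQL text, so they must come from trusted configuration, never from the source data itself.

```python
import sqlite3
from typing import Dict, List


def stage_records(staging: sqlite3.Connection, table: str,
                  records: List[Dict[str, str]]) -> int:
    """Land extracted records unchanged in a staging table; return the row count.

    Assumes all records share the columns of the first record.
    """
    if not records:
        return 0
    columns = list(records[0].keys())
    # Every staged column is TEXT: no conversion happens at this stage.
    col_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    staging.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_defs})')
    col_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    staging.executemany(
        f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})',
        [tuple(r.get(c) for c in columns) for r in records],
    )
    staging.commit()
    return len(records)


# Usage: stage two extracted records, then read them back for transformation.
staging = sqlite3.connect(":memory:")
extracted = [
    {"cust_id": "C1", "name": "Ann Lee", "source": "orders_db"},
    {"cust_id": "C2", "name": "Bob Roy", "source": "sales_dept_sheet"},
]
staged = stage_records(staging, "stg_customer", extracted)
rows = staging.execute(
    'SELECT cust_id, name, source FROM "stg_customer" ORDER BY cust_id'
).fetchall()
```

Writing the same records to a group of flat files instead would only swap the SQL for `csv.writer`; the point of either form is that the warehouse load reads from one prepared place rather than from every source directly.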
Data Transformation
• You perform a number of individual tasks as part of data
transformation.
• First, you clean the data extracted from each source.
• Cleaning may be as simple as correcting misspellings, or it may
involve providing default values for missing data elements, or
eliminating duplicates when the same data arrives from multiple source
systems.
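The three cleaning tasks named above (correcting misspellings, supplying default values for missing elements, and eliminating duplicates that arrive from several source systems) can be sketched in Python as below. The correction table, the default values and the choice of `cust_id` as the duplicate key are all hypothetical. Real warehouses often need fuzzy matching to detect duplicates; this sketch only drops records whose key fields are identical after cleaning, keeping the first occurrence.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical lookup of known misspellings -> corrected value.
SPELLING_FIXES = {"Chicgo": "Chicago", "Bostn": "Boston"}
# Hypothetical default values for missing data elements.
DEFAULTS = {"city": "UNKNOWN", "segment": "RETAIL"}


def clean_record(record: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Trim values, fill missing ones with defaults and correct known misspellings."""
    cleaned: Dict[str, str] = {}
    for field, value in record.items():
        if value is None or value.strip() == "":
            value = DEFAULTS.get(field, "")
        value = value.strip()
        cleaned[field] = SPELLING_FIXES.get(value, value)
    # Fields that are absent altogether also receive their default.
    for field, default in DEFAULTS.items():
        cleaned.setdefault(field, default)
    return cleaned


def clean(records: List[Dict[str, Optional[str]]],
          key_fields: Sequence[str]) -> List[Dict[str, str]]:
    """Clean every record, then drop later records whose key repeats an earlier one."""
    seen = set()
    result: List[Dict[str, str]] = []
    for record in records:
        cleaned = clean_record(record)
        key = tuple(cleaned[f] for f in key_fields)
        if key in seen:
            continue  # same entity already brought in from another source
        seen.add(key)
        result.append(cleaned)
    return result


raw = [
    {"cust_id": "C1", "name": "Ann Lee", "city": "Chicgo "},
    {"cust_id": "C2", "name": "Bob Roy", "city": None},
    {"cust_id": "C1", "name": "Ann Lee", "city": "Chicago"},  # C1 again, from a second source
]
cleaned = clean(raw, key_fields=["cust_id"])
```

Deduplicating after cleaning matters: "Chicgo " and "Chicago" only compare equal once the spelling fix and trimming have run.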
Data Transformation
• Standardization of data elements forms a large part of data
transformation. You standardize the data types and field lengths for
the same data elements retrieved from the various sources.
• When the data transformation function ends, you have a collection of
integrated data that is cleaned, standardized, and summarized.
• You now have data ready to load into each data set in your data
warehouse.
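As an illustration of standardization followed by summarization, suppose (hypothetically) that one source stores sale amounts as integer cents with ISO dates, while a departmental spreadsheet stores formatted amounts such as "1,000.00" with US-style dates. The sketch converts both to one representation (`Decimal` amounts with two decimal places, `datetime.date` values, upper-case IDs, names cut to a fixed field length) and then summarizes sales per customer per month. The field names, the 20-character name length and the monthly grain are assumptions, not rules from the text.

```python
from collections import defaultdict
from datetime import date, datetime
from decimal import Decimal
from typing import Dict, List, Tuple

# Hypothetical target definitions for the warehouse.
NAME_LENGTH = 20
DATE_FORMATS = {"orders_db": "%Y-%m-%d", "sales_dept_sheet": "%m/%d/%Y"}


def standardize(record: Dict[str, str]) -> Dict[str, object]:
    """Map one source-specific record onto a single type and length per element."""
    source = record["source"]
    sale_date: date = datetime.strptime(record["sale_date"], DATE_FORMATS[source]).date()
    if source == "orders_db":
        amount = Decimal(record["amount_cents"]) / 100      # integer cents -> currency units
    else:
        amount = Decimal(record["amount"].replace(",", ""))  # "1,000.00" -> 1000.00
    return {
        "cust_id": record["cust_id"].strip().upper(),
        "cust_name": record["cust_name"].strip()[:NAME_LENGTH],  # fixed field length
        "sale_date": sale_date,
        "amount": amount.quantize(Decimal("0.01")),
    }


def summarize_monthly(rows: List[Dict[str, object]]) -> Dict[Tuple[str, str], Decimal]:
    """Total the standardized amounts per (customer, year-month)."""
    totals: Dict[Tuple[str, str], Decimal] = defaultdict(Decimal)
    for row in rows:
        month = row["sale_date"].strftime("%Y-%m")
        totals[(row["cust_id"], month)] += row["amount"]
    return dict(totals)


raw = [
    {"source": "orders_db", "cust_id": "c1", "cust_name": "Ann Lee",
     "sale_date": "2024-03-05", "amount_cents": "125050"},
    {"source": "sales_dept_sheet", "cust_id": "C1 ", "cust_name": "Ann Lee",
     "sale_date": "03/20/2024", "amount": "1,000.00"},
    {"source": "sales_dept_sheet", "cust_id": "C2", "cust_name": "Bob Roy",
     "sale_date": "04/02/2024", "amount": "75.5"},
]
standardized = [standardize(r) for r in raw]
summary = summarize_monthly(standardized)
```

`Decimal` is used rather than `float` so that currency totals do not pick up binary rounding errors during summarization.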
Data Loading
• Two distinct groups of tasks form the data loading function: the
initial load and the ongoing incremental loads.
• When you complete the design and construction of the data
warehouse and go live for the first time, you do the initial loading of
the data into the data warehouse storage.
• The initial load moves large volumes of data and takes a substantial
amount of time.
• As the data warehouse starts functioning, you continue to extract the
changes to the source data and load these incremental changes into the
warehouse on an ongoing basis.
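The two groups of loading tasks can be sketched with SQLite standing in for both the source system and the warehouse. The sketch assumes each source row carries an ISO 8601 `updated_at` timestamp, which is only one way to detect changes (others include reading database logs, triggers, or comparing snapshots). The initial load copies every row and records the highest timestamp seen; each incremental load then pulls only rows changed after that high-water mark and upserts them. The table names, column names and functions are hypothetical.

```python
import sqlite3


def create_warehouse_table(wh: sqlite3.Connection) -> None:
    wh.execute("CREATE TABLE IF NOT EXISTS dim_customer "
               "(cust_id TEXT PRIMARY KEY, name TEXT, updated_at TEXT)")


def initial_load(src: sqlite3.Connection, wh: sqlite3.Connection) -> str:
    """Full load at go-live: copy every source row, return the high-water mark."""
    create_warehouse_table(wh)
    rows = src.execute("SELECT cust_id, name, updated_at FROM customer").fetchall()
    wh.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    wh.commit()
    return max((r[2] for r in rows), default="")


def incremental_load(src: sqlite3.Connection, wh: sqlite3.Connection,
                     watermark: str) -> str:
    """Load only rows changed after `watermark`; return the new watermark."""
    rows = src.execute(
        "SELECT cust_id, name, updated_at FROM customer "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    # INSERT OR REPLACE overwrites an existing row with the same primary key.
    wh.executemany("INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?)", rows)
    wh.commit()
    return rows[-1][2] if rows else watermark


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE customer (cust_id TEXT PRIMARY KEY, name TEXT, updated_at TEXT)")
src.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("C1", "Ann Lee", "2024-03-01T09:00:00"),
    ("C2", "Bob Roy", "2024-03-01T10:00:00"),
])
wh = sqlite3.connect(":memory:")
mark = initial_load(src, wh)  # go-live: full copy

# Later the source changes: one customer is updated, one is added.
src.execute("UPDATE customer SET name = 'Ann Lee-Park', "
            "updated_at = '2024-03-02T08:00:00' WHERE cust_id = 'C1'")
src.execute("INSERT INTO customer VALUES ('C3', 'Cy Dow', '2024-03-02T09:30:00')")
mark = incremental_load(src, wh, mark)
warehouse_rows = wh.execute("SELECT cust_id, name FROM dim_customer ORDER BY cust_id").fetchall()
```

Because the timestamps are ISO 8601 strings, comparing them as text gives chronological order. A timestamp-based extract does not see rows deleted at the source, so deletions need a separate mechanism.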
Data Movements to the Data Warehouse