Data Extraction, Cleanup and Transformation Tools
T.R.Lekhaa
AP-IT
SNSCE

UNIT -1 DATA EXTRACTION, CLEANUP AND TRANSFORMATION TOOLS
7/16/2019
Tool Requirements
• Data transformation from one format to another based on
differences between source and target platforms
• Data transformation & calculation based on application of
business rules that force certain transformations
• Data consolidation & integration – combining several source records into a single record to be loaded into the DW
• Metadata synchronization & management – includes storing and updating metadata definitions about source data files
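The consolidation and format-conversion requirements can be illustrated with a minimal Python sketch. It assumes two hypothetical sources (a billing system and a CRM system) that describe the same customer with different field names and date formats; the record layouts, the upper-casing business rule and the function names are illustrative only, not taken from any particular tool.

```python
from datetime import datetime

# Hypothetical extracts: billing stores dates as DD/MM/YYYY,
# the CRM system stores them as YYYYMMDD (a platform format difference).
billing = [{"cust_id": 101, "name": "A. Rao", "since": "05/03/2015"}]
crm = [{"cust_id": 101, "phone": "9840012345", "signup": "20150305", "segment": "r"}]


def to_iso(value: str, fmt: str) -> str:
    """Convert a source-specific date string to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(value, fmt).date().isoformat()


def consolidate(billing_rows, crm_rows):
    """Combine the records for each customer into one warehouse row."""
    merged = {}
    for row in billing_rows:
        merged[row["cust_id"]] = {
            "cust_id": row["cust_id"],
            "name": row["name"],
            "customer_since": to_iso(row["since"], "%d/%m/%Y"),
        }
    for row in crm_rows:
        target = merged.setdefault(row["cust_id"], {"cust_id": row["cust_id"]})
        target["phone"] = row["phone"]
        # Hypothetical business rule: segment codes are stored in upper case
        target["segment"] = row["segment"].upper()
        # Prefer the billing date; fall back to the CRM date if billing had none
        target.setdefault("customer_since", to_iso(row["signup"], "%Y%m%d"))
    return list(merged.values())


print(consolidate(billing, crm))
```

Keying the merge on a shared customer identifier is the simplest case; when sources do not share a key, the records first have to be matched, which is the cleanup problem addressed by tools such as Vality's Integrity later in this unit.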

Criteria that affect a tool's ability to transform, consolidate, integrate and repair data
• Ability to identify data in the data source environment
• Support for flat files and indexed files
• Capability to merge data from multiple data stores
• Ability to read information from data dictionaries
• Capability to create summarization, aggregation and derivation records and fields
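As a rough illustration of flat-file support combined with derivation and aggregation, the sketch below reads a small CSV extract, derives a revenue field for each line item and rolls the lines up into one summary record per region. The column names and sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict

# A flat-file source; the CSV text stands in for a file on disk
FLAT_FILE = """region,product,qty,unit_price
South,Pen,10,2.50
South,Book,2,40.00
North,Pen,4,2.50
"""


def summarize(flat_text: str) -> dict:
    """Derive a revenue field per line item, then aggregate it per region."""
    totals = defaultdict(lambda: {"line_items": 0, "revenue": 0.0})
    for row in csv.DictReader(io.StringIO(flat_text)):
        # Derivation: a new field computed from existing source fields
        revenue = int(row["qty"]) * float(row["unit_price"])
        # Aggregation: one summary record per region
        summary = totals[row["region"]]
        summary["line_items"] += 1
        summary["revenue"] += revenue
    return dict(totals)


print(summarize(FLAT_FILE))
```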

Vendor Approaches
• The task of capturing data from a source system, cleaning and transforming it & then loading it into a target system is carried out either by separate products or by a single integrated solution. The main approaches are:
– Code generators
– Database data replication
– Rule-driven dynamic transformation engines or data mart
builders

Vendor Approaches – Code Generators
• Create transformation programs based on
– source and target data definitions
– data transformation and enhancement rules defined by the developer
• Employ DML statements to capture a set of data from the source system
• Capture changes to source data by processing the recovery log files of the source system
• Used for data conversion projects and for building an enterprise-wide data warehouse
– when a significant amount of data transformation has to be done across a variety of flat-file, non-relational and relational data sources
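A code generator turns source/target definitions plus developer-defined rules into a program that performs the transformation. The sketch below shows the idea in miniature, assuming a hypothetical mapping table (`MAPPING`) and using a single generated SQL `INSERT ... SELECT` statement as the "program"; the products in this unit generate full programs (for example, COBOL in the case of PASSPORT), which this sketch does not attempt to reproduce. An in-memory SQLite database stands in for both source and target.

```python
import sqlite3

# Hypothetical mapping metadata: target column -> SQL expression over the source
SOURCE_TABLE = "src_orders"
TARGET_TABLE = "dw_orders"
MAPPING = {
    "order_id": "id",
    "customer": "UPPER(TRIM(cust_name))",  # cleanup rule
    "amount": "qty * unit_price",          # derivation rule
}


def generate_program(source: str, target: str, mapping: dict) -> str:
    """Emit an INSERT ... SELECT statement from the definitions and rules."""
    columns = ", ".join(mapping)
    expressions = ", ".join(mapping.values())
    return f"INSERT INTO {target} ({columns}) SELECT {expressions} FROM {source}"


# Throwaway in-memory database standing in for source and target systems
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_orders (id INTEGER, cust_name TEXT, qty INTEGER, unit_price REAL)")
conn.execute("CREATE TABLE dw_orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.execute("INSERT INTO src_orders VALUES (1, '  ravi ', 3, 20.0)")

sql = generate_program(SOURCE_TABLE, TARGET_TABLE, MAPPING)
conn.execute(sql)  # run the generated program
print(sql)
print(conn.execute("SELECT * FROM dw_orders").fetchall())
```

Keeping the rules as data and generating code from them means a changed rule only requires regenerating the program, which is the main appeal of this vendor approach.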

Vendor Approaches – Database data replication tools
• Employ the recovery log to capture changes to a single data source on one system and apply the changes to a copy of the source data located on a different system.
• These point-to-point tools are used for
– disaster recovery
– building an operational data store, a data warehouse, or a data mart when the number of data sources involved is small and only a limited amount of data transformation and enhancement is required.
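Log-based replication can be sketched as replaying recovery-log records onto a copy of the data. The log format below (sequence number, operation, key, row image) is a hypothetical simplification; real DBMS recovery logs use product-specific formats. The high-water mark returned by the hypothetical `apply_changes` function lets the next run skip records that were already applied.

```python
from typing import Dict, List, Tuple

# Hypothetical log record: (log sequence number, operation, key, row image)
LogRecord = Tuple[int, str, int, dict]

source_log: List[LogRecord] = [
    (1, "INSERT", 10, {"name": "Pen", "stock": 100}),
    (2, "INSERT", 11, {"name": "Ink", "stock": 40}),
    (3, "UPDATE", 10, {"name": "Pen", "stock": 85}),
    (4, "DELETE", 11, {}),
]


def apply_changes(replica: Dict[int, dict], log: List[LogRecord], last_applied: int) -> int:
    """Replay log records newer than last_applied onto the replica.

    Returns the new high-water mark (the last sequence number applied).
    """
    for lsn, op, key, image in sorted(log, key=lambda rec: rec[0]):
        if lsn <= last_applied:
            continue  # already replicated in an earlier run
        if op in ("INSERT", "UPDATE"):
            replica[key] = dict(image)  # store a copy of the after-image
        elif op == "DELETE":
            replica.pop(key, None)
        last_applied = lsn
    return last_applied


replica: Dict[int, dict] = {}
high_water = apply_changes(replica, source_log, 0)
print(replica, high_water)
```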

Rule-driven Dynamic Transformation Engines (Data Mart Builders)
• Capture data from a source system at user-defined intervals, transform it, and then send and load the results into a target environment, typically a data mart.
• The data to be captured from the source system is usually defined using query language statements, and data transformation and enhancement are done by script or function logic defined to the tool.
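The capture-transform-load cycle of a rule-driven engine might look roughly like this. Assumptions: the capture step is a SQL query filtered on a hypothetical `updated_at` column, the transformation rules are Python functions in a hypothetical `RULES` list, and in-memory SQLite databases stand in for the source system and the data mart. A real engine would invoke `run_cycle` on its user-defined schedule; here it is called directly.

```python
import sqlite3

# Hypothetical transformation rules: (target field, function of a source row)
RULES = [
    ("sale_id", lambda r: r["id"]),
    ("region", lambda r: r["region"].strip().title()),
    ("net", lambda r: round(r["gross"] * (1 - r["discount"]), 2)),
]

# Capture is defined as a query; :since is the end of the previous capture
CAPTURE_SQL = ("SELECT id, region, gross, discount, updated_at "
               "FROM sales WHERE updated_at > :since")


def run_cycle(source, mart, since: str) -> str:
    """One capture-transform-load cycle; returns the new watermark."""
    for row in source.execute(CAPTURE_SQL, {"since": since}).fetchall():
        values = [rule(row) for _field, rule in RULES]
        mart.execute("INSERT INTO mart_sales VALUES (?, ?, ?)", values)
        since = max(since, row["updated_at"])
    mart.commit()
    return since


source = sqlite3.connect(":memory:")
source.row_factory = sqlite3.Row  # lets rules address columns by name
source.execute("CREATE TABLE sales (id INTEGER, region TEXT, gross REAL, "
               "discount REAL, updated_at TEXT)")
source.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
    (1, " south ", 100.0, 0.1, "2019-07-01T10:00"),
    (2, "north", 50.0, 0.0, "2019-07-01T11:00"),
])
mart = sqlite3.connect(":memory:")
mart.execute("CREATE TABLE mart_sales (sale_id INTEGER, region TEXT, net REAL)")

watermark = run_cycle(source, mart, "1970-01-01T00:00")
print(watermark, mart.execute("SELECT * FROM mart_sales").fetchall())
```

Because each cycle only captures rows changed since the previous watermark, repeated runs at a fixed interval load each change once.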

Access to legacy data
• To manage the interaction between the new applications & the growing DW -> organizations use middleware solutions
• Apertus Corporation -> developed Enterprise/Access
– Provides a 3-tier architecture -> defines how applications are partitioned to meet integration & migration objectives
– Designed for scalability and manageability in DW
– Development tool -> enables rapid development of transparent, production-quality client/server interfaces to legacy applications

Access to legacy data – 3 tier architecture
Data Layer -> provides data access & transformation services for
management of corporate data assets
Manages data & enforces the business rules for data integrity

Process Layer -> provides services to manage automation & support for current business processes

User Layer -> manages user interaction with process and data layer
services
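The separation of responsibilities among the three layers can be sketched as three classes, each calling only the layer beneath it. The account/transfer domain and the "no negative balance" integrity rule are hypothetical examples chosen for brevity, not part of Enterprise/Access.

```python
class DataLayer:
    """Data access plus integrity rules (hypothetical rule: no negative balance)."""

    def __init__(self):
        self._accounts = {}

    def get_balance(self, acct: str) -> float:
        return self._accounts.get(acct, 0.0)

    def set_balance(self, acct: str, amount: float) -> None:
        if amount < 0:
            raise ValueError(f"integrity rule violated: {acct} would go negative")
        self._accounts[acct] = amount


class ProcessLayer:
    """Automates a business process using only data-layer services."""

    def __init__(self, data: DataLayer):
        self.data = data

    def transfer(self, src: str, dst: str, amount: float) -> None:
        # The debit is validated by the data layer before anything is changed
        self.data.set_balance(src, self.data.get_balance(src) - amount)
        self.data.set_balance(dst, self.data.get_balance(dst) + amount)


class UserLayer:
    """Handles user interaction; never touches stored data directly."""

    def __init__(self, process: ProcessLayer):
        self.process = process

    def request_transfer(self, src: str, dst: str, amount: float) -> str:
        try:
            self.process.transfer(src, dst, amount)
            return "Transfer complete"
        except ValueError as err:
            return f"Rejected: {err}"


data = DataLayer()
data.set_balance("A", 100.0)
ui = UserLayer(ProcessLayer(data))
print(ui.request_transfer("A", "B", 30.0))
print(ui.request_transfer("A", "B", 500.0))
```

Because the integrity rule lives only in the data layer, every process built on top of it is subject to the same rule without repeating it.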

Two virtual database models for Enterprise/Access are applicable to the development of a DW
1. Enterprise/Access acts as a virtual DB
– Host interfaces & business transactions are defined entirely in Enterprise/Access
– This model is used when building a generalized architecture & modernizing applications
2. Enterprise/Access & Open Gateway use SQL Server as a virtual DB
– Host interfaces are located in Enterprise/Access, while business logic is implemented via stored procedures
– This model is used when migrating legacy system functionality to a relational DB such as SYBASE

Enterprise/Access
• Enterprise/Access developers build services that communicate with legacy applications & map application-specific messages, such as terminal screens or reports, into a client/server interface.
• These services are stored in a central repository -> so they can be shared simultaneously by multiple client applications & easily reused or enhanced.
• Provides a consistent API -> allows developers to access relational DBs such as SYBASE and ORACLE -> used to integrate new DB systems & to interchange DB systems without changes
• Provides broad communication support, enabling access to virtually any legacy application or database
• Enterprise/Access services can be accessed via any tool that supports the SYBASE SQL Server API or Microsoft ODBC, such as Microsoft Access, VB and Microsoft Visual C++.
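Enterprise/Access maps application-specific messages such as terminal screens into a client/server interface. The slides do not describe how that mapping is defined, so the following is only a hypothetical screen-scraping sketch: a captured screen is treated as a list of text rows, and a screen map (field name, row, column range, type converter) turns it into a structured record that a client application could consume. The screen content and `SCREEN_MAP` are illustrative.

```python
# Hypothetical screen map: field -> (row, first column, end column, converter)
SCREEN_MAP = {
    "account": (1, 12, 24, str),
    "name": (2, 12, 40, str),
    "balance": (3, 12, 24, float),
}

# A captured terminal screen, one string per row (illustrative content)
SCREEN = [
    "CUSTOMER INQUIRY            SCREEN CI01",
    "ACCOUNT NO: 00012345",
    "NAME      : RAVI KUMAR",
    "BALANCE   :    1520.75",
]


def map_screen(rows, screen_map):
    """Turn a captured screen into a structured record for client applications."""
    record = {}
    for field, (row, start, end, convert) in screen_map.items():
        record[field] = convert(rows[row][start:end].strip())
    return record


print(map_screen(SCREEN, SCREEN_MAP))
```

Storing screen maps like this centrally is what lets several client applications reuse one mapping, as the slide on the central repository describes.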

Vendor Solutions
• Prism Solutions
• SAS Institute
• Carleton Corporation’s PASSPORT & Metacenter
• Vality Corporation
• Evolutionary Technologies (ETI) – Extract Tool Suite
• EDA/SQL (Enterprise Data Access/SQL) from Information Builders

Vendor Solutions - Prism Solutions
• Provides a solution for DW by mapping source data to a target DBMS to be used as a warehouse.
• Its Warehouse Manager generates code to extract & integrate data, create & manage metadata, and build a subject-oriented, historical base.

Vendor Solutions – SAS Institute
• Its data repository function can be used to build an informational DB.

Carleton Corporation’s PASSPORT & Metacenter
• PASSPORT & Metacenter provide a solution for migrating data to a new DW
• PASSPORT:
– Consists of 2 components:
1. Mainframe based – collects file, record & table layouts for the required inputs & outputs and converts them to PDL
2. Workstation based – used to create a metadata directory from which it builds COBOL programs that actually create the extracts
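PASSPORT's workstation component builds COBOL extract programs from a metadata directory of record layouts. The Python sketch below illustrates the same layout-driven idea by interpreting a layout at run time instead of generating code. The layout format, field names and the COBOL `PIC` analogies in the comments are illustrative assumptions, and only unsigned display-format numbers are handled (no signs or packed decimal).

```python
from decimal import Decimal

# Hypothetical layout metadata for one record type:
# (field name, length in characters, kind, implied decimal places)
EMPLOYEE_LAYOUT = [
    ("emp_no", 6, "num", 0),      # comparable to PIC 9(6)
    ("emp_name", 20, "char", 0),  # comparable to PIC X(20)
    ("salary", 8, "num", 2),      # comparable to PIC 9(6)V99
]

# Fixed-width input records as they might arrive from a mainframe file
RECORDS = [
    "000042" + "ANITHA S".ljust(20) + "00125000",
    "000043" + "KARTHIK R".ljust(20) + "00098050",
]


def extract(lines, layout):
    """Yield one dict per fixed-width record, decoded according to the layout."""
    for line in lines:
        pos, record = 0, {}
        for name, length, kind, scale in layout:
            raw = line[pos:pos + length]
            pos += length
            if kind == "num":
                # Apply the implied decimal point without float rounding
                record[name] = Decimal(int(raw)).scaleb(-scale)
            else:
                record[name] = raw.rstrip()
        yield record


for rec in extract(RECORDS, EMPLOYEE_LAYOUT):
    print(rec)
```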

PASSPORT highlights
• Data Access – data-dictionary driven
• Data Analysis & auditing – provides audit reports
• Language & design – supports predefined
calculations, arrays, loops
• PDL – free-form command structure with English-like command syntax
• Run time environment – supports dynamic fields
• Report writing – supports an unlimited number of line formats

Metacenter
• Designed to put the user in control of the DW
• Capabilities of Metacenter:
– Data extraction & transformation
– Event management & notification
– Data mart subscription

Vality Corporation
• Integrity – data reengineering tool used to investigate, standardize, transform & integrate data from multiple operational systems
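The investigate/standardize/integrate steps can be illustrated with a small standardization-and-matching sketch. The abbreviation table, field names and sample records are hypothetical, and matching here is exact on the standardized keys; this is a simplification, not a description of how Integrity itself matches records.

```python
import re
from collections import defaultdict

# Hypothetical standardization table: abbreviation -> standard form
ABBREVIATIONS = {"ST": "STREET", "RD": "ROAD", "CO": "COMPANY", "LTD": "LIMITED"}

CUSTOMERS = [
    {"source_id": "billing:17", "name": "Sri Ram & Co.", "address": "12, Anna Rd."},
    {"source_id": "crm:903", "name": "SRI RAM & COMPANY", "address": "12 Anna Road"},
    {"source_id": "crm:904", "name": "Lakshmi Ltd", "address": "4 Main St"},
]


def standardize(text: str) -> str:
    """Upper-case, replace punctuation with spaces, expand known abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", text.upper()).split()
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def match_groups(records):
    """Group source records whose standardized name and address agree exactly."""
    groups = defaultdict(list)
    for rec in records:
        key = (standardize(rec["name"]), standardize(rec["address"]))
        groups[key].append(rec["source_id"])
    return dict(groups)


for key, ids in match_groups(CUSTOMERS).items():
    print(key, ids)
```

Standardizing before matching is what allows "Sri Ram & Co." in one system and "SRI RAM & COMPANY" in another to be integrated into a single record.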

Evolutionary Technologies – ETI
• Tool for extraction and transformation
• Supports data collection, conversion & migration from a variety of platforms, OSs and DBMSs
• Automatically generates & executes programs in the appropriate languages for the source & target platforms
• Provides a powerful metadata facility that allows users to track information about stored data
• Provides a graphical interface that allows users to indicate how to move data

ETI – 2 productivity tools
• Master toolset – a set of interactive editors that allows a system programmer (master user) to define the meta store database
• Data conversion toolset – a set of tools that provides a graphical point & click interface for defining the mapping of data between various source data systems & target data systems

Assessment
