ETL Testing or Data Warehouse Testing Tutorial

ETL Testing

Uploaded by

Asad Hussain
Copyright
© © All Rights Reserved
1/10/2019 ETL Testing or Data Warehouse Testing Tutorial

(https://www.guru99.com/)

ETL Testing or Data Warehouse Testing Tutorial


Before we learn anything about ETL testing, it is important to learn about Business
Intelligence and data warehousing. Let's get started.

What is BI?
Business Intelligence is the process of collecting raw business data and turning it into
information that is useful and more meaningful. The raw data is the record of an
organization's daily transactions, such as interactions with customers, administration of
finance, management of employees and so on. This data is used for reporting, analysis,
data mining, data quality and interpretation, and predictive analysis.

What is a Data Warehouse?

A data warehouse is a database that is designed for query and analysis rather than for
transaction processing. The data warehouse is constructed by integrating data from
multiple heterogeneous sources. It enables the company or organization to consolidate data
from several sources and separates the analysis workload from the transaction workload.
Data is turned into high-quality information to meet the enterprise reporting requirements
of all levels of users.

What is ETL?
ETL stands for Extract-Transform-Load. It is the process by which data is loaded from the
source system into the data warehouse. Data is extracted from an OLTP database,
transformed to match the data warehouse schema and loaded into the data warehouse
database. Many data warehouses also incorporate data from non-OLTP systems such as text
files, legacy systems and spreadsheets.

Let's see how it works.

For example, consider a retail store that has different departments such as sales,
marketing and logistics. Each of them handles customer information independently, and the
way they store that data is quite different. The sales department stores it by customer
name, while the marketing department stores it by customer ID.

Now if they want to check the history of a customer and find out which products he or she
bought in response to different marketing campaigns, it would be very tedious.

The solution is to use a data warehouse to store information from the different sources in
a uniform structure using ETL. ETL can transform dissimilar data sets into a unified
structure. BI tools can later be used to derive meaningful insights and reports from this
data.

The following diagram gives you the road map of the ETL process.

(/images/ETL_Testing/ETLTesting_2.png)

1. Extract

- Extract relevant data

2. Transform

- Transform data to DW (Data Warehouse) format
- Build keys: A key is one or more data attributes that uniquely identify an
  entity. The various types of keys are primary key, alternate key, foreign key,
  composite key and surrogate key. The data warehouse owns these keys and
  never allows any other entity to assign them.
- Cleansing of data: After the data is extracted, it moves into the next
  phase, cleaning and conforming. Cleaning handles omissions in the data and
  identifies and fixes errors. Conforming means resolving conflicts between
  data that is incompatible, so that it can be used in an enterprise data
  warehouse. In addition, this stage creates metadata that is used to diagnose
  source system problems and improve data quality.

3. Load

- Load data into DW (Data Warehouse)
- Build aggregates: Creating an aggregate means summarizing and storing data
  that is available in the fact table in order to improve the performance of
  end-user queries.
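The three steps above can be sketched in a few lines of Python using the standard-library sqlite3 module. This is only a minimal illustration and not a production pipeline: the table names (dim_customer, fact_sales, agg_sales_by_customer), the sample rows and the cleansing rule (trim and title-case names, reject rows with a missing amount) are all assumptions made for the example.

```python
import sqlite3

# Hypothetical source rows (customer name, amount), as they might come from an OLTP system.
SOURCE_ROWS = [
    ("  alice smith ", 120.0),
    ("BOB JONES", 80.5),
    ("alice smith", 30.0),
    ("carol white", None),  # missing amount: rejected during cleansing
]


def extract():
    """Extract: return the relevant source rows."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: conform the names and drop rows with a missing amount."""
    cleaned = []
    for name, amount in rows:
        if amount is None:
            continue
        cleaned.append((name.strip().title(), amount))
    return cleaned


def load(conn, rows):
    """Load: assign surrogate keys, fill the fact table and build an aggregate."""
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key owned by the warehouse
            customer_name TEXT UNIQUE
        );
        CREATE TABLE fact_sales (customer_key INTEGER, amount REAL);
    """)
    for name, amount in rows:
        conn.execute("INSERT OR IGNORE INTO dim_customer (customer_name) VALUES (?)", (name,))
        key = conn.execute(
            "SELECT customer_key FROM dim_customer WHERE customer_name = ?", (name,)
        ).fetchone()[0]
        conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (key, amount))
    # Aggregate: summarize the fact table so end-user queries do not have to.
    conn.execute("""
        CREATE TABLE agg_sales_by_customer AS
        SELECT customer_key, SUM(amount) AS total_amount
        FROM fact_sales GROUP BY customer_key
    """)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
totals = dict(conn.execute("""
    SELECT d.customer_name, a.total_amount
    FROM agg_sales_by_customer a JOIN dim_customer d USING (customer_key)
"""))
print(totals)
```

The two differently formatted spellings of the same customer end up under one surrogate key, which is the kind of conforming the Transform step describes.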

What is ETL Testing?

ETL testing is done to ensure that the data that has been loaded from a source to the
destination after business transformation is accurate. It also involves the verification of
data at the various intermediate stages that are used between source and destination. ETL
stands for Extract-Transform-Load.

ETL Testing Process

Like other testing processes, ETL testing goes through different phases. The phases of the
ETL testing process are as follows.


(/images/ETL_Testing/ETLTesting_3.png)

ETL testing is performed in five stages:

1. Identifying data sources and requirements
2. Data acquisition
3. Implementing business logic and dimensional modelling
4. Building and populating data
5. Building reports

(/images/ETL_Testing/ETLTesting_4.jpg)

Types of ETL Testing

Each type of testing below is followed by its testing process.

Production Validation Testing: Also known as “table balancing” or “production
reconciliation”, this type of ETL testing is done on data as it is being moved into
production systems. To support your business decisions, the data in your production
systems has to be in the correct order. Informatica Data Validation Option provides ETL
testing automation and management capabilities to ensure that production systems are not
compromised by the data.

Source to Target Testing (Validation Testing): This type of testing is carried out to
validate whether the transformed data values are the expected data values.

Application Upgrades: This type of ETL testing can be automatically generated, saving
substantial test development time. It checks whether the data extracted from an older
application or repository is exactly the same as the data in the new application or
repository.

Metadata Testing: Metadata testing includes data type checks, data length checks and
index/constraint checks.

Data Completeness Testing: Data completeness testing is done to verify that all the
expected data is loaded into the target from the source. Some of the tests that can be run
compare and validate counts, aggregates and actual data between the source and target for
columns with simple or no transformation.

Data Accuracy Testing: This testing is done to ensure that the data is accurately loaded
and transformed as expected.

Data Transformation Testing: Data transformation is tested separately because in many
cases it cannot be verified by writing one source SQL query and comparing the output with
the target. Multiple SQL queries may need to be run for each row to verify the
transformation rules.

Data Quality Testing: Data quality tests include syntax and reference tests. Data quality
testing is done to avoid errors caused by dates or order numbers during business
processes. Syntax tests report dirty data based on invalid characters, character patterns,
incorrect upper or lower case order and so on. Reference tests check the data against the
data model, for example the customer ID. Data quality testing includes number check, date
check, precision check, data check, null check and so on.

Incremental ETL Testing: This testing is done to check the data integrity of old and new
data when new data is added. Incremental testing verifies that inserts and updates are
processed as expected during the incremental ETL process.

GUI/Navigation Testing: This testing is done to check the navigation or GUI aspects of the
front-end reports.
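As an illustration of the syntax-style data quality tests listed above (null check, number check, date check, invalid characters), here is a small sketch in plain Python. The record layout, the field names and the specific rules (numeric customer ID, dates in YYYY-MM-DD form, names limited to letters, spaces, hyphens and apostrophes) are assumptions chosen for the example; real rules would come from the data model and the business requirements.

```python
import re
from datetime import datetime

# Hypothetical rule: names may contain only letters, spaces, hyphens and apostrophes.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z '\-]*$")


def check_record(record):
    """Return a list of data quality problems found in one record (empty if clean)."""
    problems = []
    # Null check: every field is assumed to be mandatory.
    for field in ("customer_id", "name", "order_date"):
        if record.get(field) in (None, ""):
            problems.append(f"{field}: null or empty")
    # Number check: customer_id must consist of digits only.
    cid = record.get("customer_id")
    if cid not in (None, "") and not str(cid).isdigit():
        problems.append("customer_id: not a number")
    # Syntax check: report invalid characters in the name.
    name = record.get("name")
    if name and not NAME_PATTERN.match(name):
        problems.append("name: invalid characters")
    # Date check: all dates must follow the same format (assumed YYYY-MM-DD).
    date = record.get("order_date")
    if date:
        try:
            datetime.strptime(date, "%Y-%m-%d")
        except ValueError:
            problems.append("order_date: bad date format")
    return problems


records = [
    {"customer_id": "1001", "name": "Alice Smith", "order_date": "2019-01-10"},
    {"customer_id": "10A1", "name": "B0b", "order_date": "10/01/2019"},
    {"customer_id": None, "name": "Carol", "order_date": "2019-02-30"},
]
for rec in records:
    print(rec["customer_id"], check_record(rec))
```

Returning a list of problems per record, rather than stopping at the first one, makes it easier to report all the dirty data found in a single run.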

How to create ETL Test Case

ETL testing is a concept that can be applied to different tools and databases in the
information management industry. The objective of ETL testing is to assure that the data
that has been loaded from a source to the destination after business transformation is
accurate. It also involves the verification of data at the various intermediate stages
that are used between source and destination.

While performing ETL testing, two documents that will always be used by an ETL tester are:

1. ETL mapping sheets: An ETL mapping sheet contains all the information about the
source and destination tables, including every column and its lookup in reference
tables. ETL testers need to be comfortable with SQL queries, as ETL testing may
involve writing large queries with multiple joins to validate data at any stage of
ETL. ETL mapping sheets are a significant help when writing queries for data
verification.
2. DB schema of source and target: It should be kept handy to verify any detail in the
mapping sheets.
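One way to picture how these two documents work together is to hold the mapping sheet as data and check it against the actual target schema. The sketch below uses Python's sqlite3 module. The mapping entries, the dw_customer table and its columns are hypothetical, and SQLite's PRAGMA table_info is used only because it is a convenient way to read column names and declared types here; other databases expose the same information through their own catalog views.

```python
import sqlite3

# Hypothetical mapping sheet: source column -> (target column, expected target type).
MAPPING_SHEET = {
    "cust_id": ("customer_id", "INTEGER"),
    "cust_nm": ("customer_name", "VARCHAR(50)"),
    "signup_dt": ("signup_date", "DATE"),
}


def target_schema(conn, table):
    """Read the column names and declared types of a table from SQLite's catalog."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return {row[1]: row[2].upper() for row in conn.execute(f"PRAGMA table_info({table})")}


def validate_mapping(conn, table, mapping):
    """Return the mismatches between the mapping sheet and the actual target table."""
    schema = target_schema(conn, table)
    issues = []
    for source_col, (target_col, expected_type) in mapping.items():
        if target_col not in schema:
            issues.append(f"{source_col} -> {target_col}: column missing in target")
        elif schema[target_col] != expected_type:
            issues.append(
                f"{source_col} -> {target_col}: expected {expected_type}, "
                f"found {schema[target_col]}"
            )
    return issues


conn = sqlite3.connect(":memory:")
# Hypothetical target table that deviates from the mapping sheet in two places.
conn.execute("CREATE TABLE dw_customer (customer_id INTEGER, customer_name VARCHAR(30))")
issues = validate_mapping(conn, "dw_customer", MAPPING_SHEET)
for issue in issues:
    print(issue)
```

In this made-up case the check reports a shorter-than-expected customer_name column and a missing signup_date column.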

ETL Test Scenarios and Test Cases



Each test scenario below is followed by its test cases.

Mapping doc validation: Verify whether the corresponding ETL information is provided in
the mapping doc. A change log should be maintained in every mapping doc.

Validation:
1. Validate the source and target table structure against the corresponding mapping doc.
2. The source data type and target data type should be the same.
3. The length of data types in both source and target should be equal.
4. Verify that data field types and formats are specified.
5. The source data type length should not be less than the target data type length.
6. Validate the names of the columns in the table against the mapping doc.

Constraint validation: Ensure that the constraints are defined for the specific table as
expected.

Data consistency issues:
1. The data type and length for a particular attribute may vary in files or tables though
the semantic definition is the same.
2. Misuse of integrity constraints.

Completeness issues:
1. Ensure that all expected data is loaded into the target table.
2. Compare record counts between source and target.
3. Check for any rejected records.
4. Check that data is not truncated in the columns of the target tables.
5. Check boundary value analysis.
6. Compare unique values of key fields between the data loaded to the warehouse and the
source data.

Correctness issues:
1. Data that is misspelled or inaccurately recorded.
2. Null, non-unique or out-of-range data.

Transformation: Transformation.

Data quality:
1. Number check: Check the numbers and validate them.
2. Date check: Dates have to follow the date format, and the format should be the same
across all records.
3. Precision check.
4. Data check.
5. Null check.

Null validate: Verify that there are no null values where “Not Null” is specified for a
column.

Duplicate check:
1. Validate that the unique key, the primary key and any other column that should be
unique as per the business requirements do not have any duplicate rows.
2. Check whether any duplicate values exist in a column that is populated by extracting
from multiple columns in the source and combining them into one column.
3. As per the client requirements, ensure that there are no duplicates in the combination
of multiple columns within the target only.

Date validation: Date values are used in many areas of ETL development:
1. To know the row creation date.
2. To identify active records from the ETL development perspective.
3. To identify active records from the business requirements perspective.
4. Sometimes the updates and inserts are generated based on the date values.

Complete data validation:
1. To validate the complete data set in the source and target tables, a minus query is
the best solution.
2. Run both source minus target and target minus source.
3. If a minus query returns any rows, those should be considered mismatching rows.
4. Match the rows between source and target using an intersect statement.
5. The count returned by intersect should match the individual counts of the source and
target tables.
6. If the minus queries return no rows and the intersect count is less than the source
count or the target count, then duplicate rows exist.

Data cleanness: Unnecessary columns should be deleted before loading into the staging
area.
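The minus and intersect checks from the "Complete data validation" scenario, together with a simple duplicate check, can be sketched with Python's sqlite3 module. SQLite spells the minus operator EXCEPT (Oracle calls it MINUS). The src and tgt tables and their rows are made up for the example, and the GROUP BY ... HAVING query is just one common way to list duplicates directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (id INTEGER, name TEXT);
    CREATE TABLE tgt (id INTEGER, name TEXT);
    INSERT INTO src VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    -- The target has a changed value for id 2, is missing id 3 and holds id 1 twice.
    INSERT INTO tgt VALUES (1, 'Alice'), (1, 'Alice'), (2, 'Bobby');
""")


def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Source minus target: rows that were lost or changed on the way to the target.
source_minus_target = rows(
    "SELECT id, name FROM src EXCEPT SELECT id, name FROM tgt ORDER BY id"
)
# Target minus source: rows in the target that have no exact match in the source.
target_minus_source = rows(
    "SELECT id, name FROM tgt EXCEPT SELECT id, name FROM src ORDER BY id"
)
# Intersect: rows that match exactly on both sides.
matching = rows("SELECT id, name FROM src INTERSECT SELECT id, name FROM tgt")
# Duplicate check: any (id, name) combination that occurs more than once in the target.
duplicates = rows(
    "SELECT id, name, COUNT(*) FROM tgt GROUP BY id, name HAVING COUNT(*) > 1"
)

source_count = rows("SELECT COUNT(*) FROM src")[0][0]
target_count = rows("SELECT COUNT(*) FROM tgt")[0][0]

print("source minus target:", source_minus_target)
print("target minus source:", target_minus_source)
print("intersect count:", len(matching), "of", source_count, "source /", target_count, "target rows")
print("duplicates in target:", duplicates)
```

Because EXCEPT and INTERSECT work on distinct rows, a duplicated row never shows up in the minus results. That is why the intersect count has to be compared with the table counts, or the duplicates queried directly as in the last statement.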

Types of ETL Bugs

(/images/ETL_Testing/ETLTesting_5.png)

Each type of bug below is followed by its description.

User interface bugs/cosmetic bugs:
- Related to the GUI of the application
- Font style, font size, colors, alignment, spelling mistakes, navigation and so on

Boundary Value Analysis (BVA) related bugs:
- Minimum and maximum values

Equivalence Class Partitioning (ECP) related bugs:
- Valid and invalid type

Input/Output bugs:
- Valid values not accepted
- Invalid values accepted

Calculation bugs:
- Mathematical errors
- Final output is wrong

Load condition bugs:
- Does not allow multiple users
- Does not allow the customer's expected load

Race condition bugs:
- System crash and hang
- System cannot run on client platforms

Version control bugs:
- No logo matching
- No version information available
- These usually occur in regression testing

H/W bugs:
- Device is not responding to the application

Help source bugs:
- Mistakes in help documents

Difference between Database Testing and ETL Testing

ETL testing:
- Verifies whether data is moved as expected
- Verifies whether the counts in the source and target match
- Verifies whether the transformed data is as per expectation
- Verifies that the foreign-primary key relations are preserved during the ETL
- Verifies for duplication in loaded data

Database testing:
- The primary goal is to check whether the data follows the rules/standards defined in
  the data model
- Verifies that there are no orphan records and that foreign-primary key relations are
  maintained
- Verifies that there are no redundant tables and that the database is optimally
  normalized
- Verifies whether data is missing in columns where it is required
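The orphan record check mentioned above looks for child rows whose foreign key has no matching primary key in the parent table. A minimal sketch with Python's sqlite3 module follows; the dim_customer and fact_sales tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, customer_key INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob');
    -- Sale 12 points at customer_key 99, which does not exist in dim_customer.
    INSERT INTO fact_sales VALUES (10, 1, 50.0), (11, 2, 20.0), (12, 99, 5.0);
""")

# Orphan records: fact rows whose foreign key has no matching primary key in the dimension.
orphans = conn.execute("""
    SELECT f.sale_id, f.customer_key
    FROM fact_sales f
    LEFT JOIN dim_customer d ON d.customer_key = f.customer_key
    WHERE d.customer_key IS NULL
""").fetchall()
print(orphans)
```

The same query can be run against the target after a load to confirm that the foreign-primary key relations were preserved during the ETL.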

Responsibilities of an ETL tester

Key responsibilities of an ETL tester are segregated into three categories:

- Stage table/SFS or MFS
- Business transformation logic applied
- Target table loading from stage file or table after applying a transformation

Some of the responsibilities of an ETL tester are:

- Test ETL software
- Test components of the ETL data warehouse
- Execute backend data-driven tests
- Create, design and execute test cases, test plans and test harnesses
- Identify problems and provide solutions for potential issues
- Approve requirements and design specifications
- Test data transfers and flat files
- Write SQL queries for various scenarios, such as count tests

ETL Performance Testing and Tuning

ETL performance testing is a confirmation test to ensure that an ETL system can handle the
load of multiple users and transactions. The goal of performance tuning is to optimize
session performance by eliminating performance bottlenecks. To tune or improve the
performance of a session, you have to identify the performance bottlenecks and eliminate
them. Performance bottlenecks can be found in the source and target databases, the
mapping, the session and the system. Informatica is one of the tools used for performance
testing.
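A very small sketch of the idea of measuring a load against a time budget is shown below, using Python's sqlite3 and time modules. It is not a substitute for a real performance test with concurrent users: the fact_sales table, the generated rows and the five-second budget (MAX_SECONDS) are all assumptions made for illustration.

```python
import sqlite3
import time


def timed_load(row_count):
    """Load row_count generated rows into an in-memory table; return (rows_loaded, seconds)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_sales (sale_id INTEGER, amount REAL)")
    rows = ((i, i * 1.5) for i in range(row_count))
    start = time.perf_counter()
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
    conn.commit()
    elapsed = time.perf_counter() - start
    loaded = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
    conn.close()
    return loaded, elapsed


MAX_SECONDS = 5.0  # hypothetical time budget for this load
loaded, elapsed = timed_load(10_000)
print(f"loaded {loaded} rows in {elapsed:.3f}s; within budget: {elapsed <= MAX_SECONDS}")
```

Timing the same load at several row counts gives a rough picture of how the load time scales, which can help to decide where to look for bottlenecks.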

Automation of ETL Testing

The general methodology of ETL testing is to use SQL scripting or to do “eyeballing” of
data. These approaches to ETL testing are time-consuming, error-prone and seldom provide
complete test coverage. To accelerate testing, improve coverage, reduce costs and improve
the defect detection ratio of ETL testing in production and development environments,
automation is the need of the hour. One such tool is Informatica.

Best Practices for ETL Testing

1. Make sure data is transformed correctly
2. Projected data should be loaded into the data warehouse without any data loss or
truncation
3. Ensure that the ETL application appropriately rejects invalid data, replaces it with
default values and reports it
4. Ensure that data is loaded into the data warehouse within the prescribed and expected
time frames to confirm scalability and performance
5. All methods should have appropriate unit tests regardless of visibility
6. All unit tests should use appropriate coverage techniques to measure their effectiveness
7. Strive for one assertion per test case
8. Create unit tests that target exceptions
© Copyright - Guru99 2019

