
Ab Initio - DQE and Its Inclusion With MDHub v1.0


Ab Initio
DQE and Its Inclusion with MDHub
Author: Ankit Jain
Contents

Data Quality – Concepts
Express IT & DQE
Metadata Hub – Concepts
Importing Results from Express IT DQE into MDHub

Capgemini Insights & Data – Ab Initio CoE | Jan 2018 © Capgemini 2017. All rights reserved | 2
Data Quality

Definition: The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use.

Aspects of data quality include:

• Accuracy
• Completeness
• Update status
• Relevance
• Consistency across data sources
• Reliability
• Appropriate presentation
• Accessibility

Ab Initio Data Quality

Ab Initio’s Data Quality Environment is an integrated data quality solution that is essential to enterprise-level data processing and data management systems.

Functional details
Using the Ab Initio Data Quality Engine (DQE), business users can do the following:
• Access data sources from files or databases, join data sources for subsequent data quality analysis, and compile lookup files for use in data quality tests.
• Write validation tests that can detect null or blank values, valid and invalid values, data patterns, invalid data relationships, and the uniqueness of key values.
• Run the data quality application to compile lists of issues in the data source, compute data quality metrics, profile the input data, and publish the results to the Metadata Hub.
• Unload reference data (domain code sets and other information) from the Metadata Hub.
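To make the idea of a "null or blank" validation test concrete, the following is a minimal sketch in plain shell and awk, not DQE syntax. It assumes a pipe-delimited input file, and the function name check_not_blank and its output layout are hypothetical.

```shell
# Hypothetical emulation of a "not null / not blank" validation test.
# This is plain awk on a pipe-delimited file, not DQE rule syntax.
# Usage: check_not_blank FILE FIELD_NUMBER
# Prints "record_number|field_number|BLANK" for each failing record.
check_not_blank() {
  awk -F'|' -v f="$2" '
    # A value counts as blank if it is empty or contains only spaces/tabs;
    # a missing field (short record) is also reported as blank.
    $f ~ /^[ \t]*$/ { print NR "|" f "|BLANK" }
  ' "$1"
}
```

For example, `check_not_blank customers.dat 2` (hypothetical file) lists every record whose second field is empty or whitespace-only, which is the kind of issue list DQE compiles for this test.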

Architecture

Figure: Data In → Detect DQ Problems → Data Out; detected problems feed a DQ Reporting System and a DQ Issue Archive.


DQ Processing Workflow

Figure: DQ processing workflow
IN → 1. Validation Rules → 2. Clean Up Rules → OUT
3. Compute DQ Stats / Profiles (stats and profiles stored in the EME) → ALERT
4. Problem Records → Archive / Other Workflows
5. Compute History → ALERT
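The first four steps of this workflow can be sketched with awk on a pipe-delimited file. This is a hypothetical emulation, not a DQE graph: the validation rule (field 2 must not be blank), the clean-up rule (trim spaces from field 2), the file arguments and the function name run_dq_flow are all assumptions chosen for illustration.

```shell
# Hypothetical sketch of the DQ processing workflow in plain awk:
#   step 1 (validation rule): field 2 must not be blank
#   step 2 (clean-up rule):   trim leading/trailing spaces in field 2
#   step 3 (DQ stats):        count total, passed and failed records
#   step 4 (problem records): write failing records to a separate file
# Usage: run_dq_flow IN OUT PROBLEMS STATS
run_dq_flow() {
  : > "$2"; : > "$3"                     # create output files even if unused
  awk -F'|' -v OFS='|' -v out="$2" -v bad="$3" -v stats="$4" '
    {
      total++
      if ($2 ~ /^[ \t]*$/) {             # step 1: validation rule fails
        print > bad; failed++; next      # step 4: route to problem records
      }
      gsub(/^[ \t]+|[ \t]+$/, "", $2)    # step 2: clean-up rule
      print > out; passed++
    }
    END {                                # step 3: compute DQ stats
      print "total|" total + 0 > stats
      print "passed|" passed + 0 > stats
      print "failed|" failed + 0 > stats
    }
  ' "$1"
}
```

Keeping problem records in their own file mirrors the "Problem Records → Archive / Other Workflows" branch: clean data flows on to OUT while issues remain available for later review.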

Hands-on with DQE in Express IT

• Express IT details:
  • Open Express IT with your UNIX ID and password: http://10.102.22.111:6561/appconf
  • Your private project should include the public projects STDENV and DataQuality during checkout.

• GDE details:
  • Current APP_HUB path: /usr/local/abinitio/abinitio-app-hub
  • Order of project checkout: stdenv -> common_io -> dataquality -> dq-examples -> private project (DQE_trn201609)

• Connection screenshots:

Validation Rules

Pattern Search – S*<required pattern>, e.g. a valid PIN code starting with "4": S"4…."
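The exact DQE pattern grammar is not shown on the slide, so the following is only a plain POSIX shell emulation of the intent. It assumes the rule accepts a six-digit PIN code whose first digit is 4. The function name check_pincode is hypothetical.

```shell
# Hypothetical emulation of a pattern-search rule in POSIX shell.
# Assumption: a valid value is a six-digit PIN code starting with 4.
# Prints VALID or INVALID for the value given as the first argument.
check_pincode() {
  case "$1" in
    4[0-9][0-9][0-9][0-9][0-9]) echo VALID ;;   # "4" followed by five digits
    *) echo INVALID ;;                          # anything else fails the rule
  esac
}
```

For example, `check_pincode 400001` prints VALID, while `check_pincode 560001` prints INVALID because the first digit is not 4.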

Lookup Match – First use "Create Lookup" to create a lookup file configuration and publish it. Then reference it as L"<lookup configuration name>", with the name inside the quotes, e.g. L"create_cust_lkp".
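Conceptually, a lookup match flags any value that is not present in the published lookup. The sketch below emulates that with awk, assuming pipe-delimited files with the lookup key in field 1 of the lookup file. The function name check_lookup and the output layout are hypothetical, and this is not how DQE evaluates L"..." rules internally.

```shell
# Hypothetical emulation of a lookup-match rule with awk.
# Assumption: both files are pipe-delimited; the lookup key is field 1
# of LOOKUP_FILE; FIELD_NUMBER selects the field to check in INPUT_FILE.
# Usage: check_lookup LOOKUP_FILE INPUT_FILE FIELD_NUMBER
# Prints "record_number|value|NOT_IN_LOOKUP" for each unmatched record.
check_lookup() {
  awk -F'|' -v f="$3" '
    FILENAME == ARGV[1] { keys[$1] = 1; next }             # load lookup keys
    !($f in keys) { print FNR "|" $f "|NOT_IN_LOOKUP" }    # test input values
  ' "$1" "$2"
}
```

Comparing `FILENAME` with `ARGV[1]` (rather than the common `NR == FNR` idiom) keeps the sketch correct even when the lookup file is empty.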

Format Check – Check a date format, e.g. date("YYMMDD", century="1900")
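The sketch below shows what a YYMMDD format check has to verify: six digits, a month from 01 to 12, and a day that exists in that month, including leap years. It assumes two-digit years are read as 19YY, which is only our reading of century="1900". The function name check_yymmdd is hypothetical, and this is not the DML date() implementation.

```shell
# Hypothetical emulation of a YYMMDD date-format check.
# Assumption: two-digit years are interpreted as 19YY (our reading of
# century="1900"). Prints VALID or INVALID.
check_yymmdd() {
  case "$1" in
    [0-9][0-9][0-9][0-9][0-9][0-9]) ;;   # must be exactly six digits
    *) echo INVALID; return ;;
  esac
  printf '%s\n' "$1" | awk '{
    y = 1900 + substr($0, 1, 2)          # assumed century: 19YY
    m = substr($0, 3, 2) + 0
    d = substr($0, 5, 2) + 0
    split("31 28 31 30 31 30 31 31 30 31 30 31", mdays, " ")
    # Gregorian leap-year rule for February
    if (y % 4 == 0 && (y % 100 != 0 || y % 400 == 0)) mdays[2] = 29
    ok = (m >= 1 && m <= 12 && d >= 1 && d <= mdays[m])
    print (ok ? "VALID" : "INVALID")
  }'
}
```

For example, `check_yymmdd 960229` prints VALID (1996 is a leap year), while `check_yymmdd 000229` prints INVALID because 1900 is not a leap year.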

Rollup – Enable (check) "Validate Dataset Using Rollup", then navigate to "Validate Dataset Using Rollup" and create variables. Next, open "Rollup Computations". The available rollup functions are listed under Keywords and Functions -> Rollup Functions.
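A rollup validation works on the dataset as a whole rather than record by record. As one illustration, the sketch below rolls records up by key to test the uniqueness of key values, which is one of the test types listed earlier. It assumes a pipe-delimited file keyed on field 1. The function name check_unique_keys is hypothetical and does not use DQE's rollup functions.

```shell
# Hypothetical emulation of a dataset-level (rollup) validation with awk:
# roll records up by key (field 1) and report keys occurring more than once.
# Usage: check_unique_keys FILE
# Prints "key|count" for each duplicated key, sorted by key.
check_unique_keys() {
  awk -F'|' '
    { count[$1]++ }                                    # rollup: count per key
    END { for (k in count) if (count[k] > 1) print k "|" count[k] }
  ' "$1" | sort
}
```

Because the check needs every record before it can decide, it runs in the `END` block. This is the same reason a rollup, rather than a per-record rule, is needed for dataset-level tests.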

Validation Rule Set

Create a user-defined validation rule set with the following attributes:

• Rule
• Disposition
• Issue Code
• Details
• Field Value
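To show how the five attributes fit together in a single issue, here is a hypothetical awk sketch of one rule that emits an issue record containing the rule, disposition, issue code, details and field value. The rule name, the disposition value "reject", the issue code INVALID_PINCODE, the pipe-delimited layout and the function name apply_pincode_rule are all illustrative assumptions, not DQE defaults.

```shell
# Hypothetical sketch: one user-defined rule that emits issue records with
# five attributes (rule | disposition | issue code | details | field value).
# All attribute values are illustrative, not DQE defaults.
# Usage: apply_pincode_rule FILE   (PIN code assumed to be in field 2)
apply_pincode_rule() {
  awk -F'|' '
    BEGIN { OFS = "|" }
    $2 !~ /^4[0-9][0-9][0-9][0-9][0-9]$/ {
      # rule | disposition | issue code | details | field value
      print "pincode_pattern", "reject", "INVALID_PINCODE", "record " NR, $2
    }
  ' "$1"
}
```

Carrying the offending field value in every issue record lets a reviewer see why a record failed without going back to the source data.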

Validation Rule Value
The metrics for each rule are stored for tracking purposes.
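A per-rule metric can be as simple as an issue count plus a score. The sketch below rolls issue records up into one line per rule. It assumes the issue records are pipe-delimited with the rule name in field 1, and that a rule raises at most one issue per record, so score = 100 × (1 − issues / total records). The function name rule_metrics and the score definition are hypothetical and are not DQE's metric formulas.

```shell
# Hypothetical sketch: roll issue records (rule name in field 1) up into one
# metric line per rule: "rule|issue_count|score".
# Assumes at most one issue per record per rule, so
#   score = 100 * (1 - issue_count / total_records).
# Usage: rule_metrics ISSUE_FILE TOTAL_RECORDS   (TOTAL_RECORDS must be > 0)
rule_metrics() {
  awk -F'|' -v total="$2" '
    { issues[$1]++ }                                   # count issues per rule
    END {
      for (r in issues)
        printf "%s|%d|%.1f\n", r, issues[r], 100 * (1 - issues[r] / total)
    }
  ' "$1" | sort
}
```

Storing one such line per rule per run is what makes the later trend and history reports possible.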

Inclusion with MD Hub

Integrating issue counts, metric scores and dataset information into MDHub

Log in to UNIX using PuTTY and go to the MHUB config directory.

Source import.profile.

Run the following command:


# Assumes OUTPUT_DIRECTORY, MHUB_URL, USERNAME and PASSWORD are set in the environment
mh-import dq-load \
  -issue-counts-file "$OUTPUT_DIRECTORY/dq-issue-count.dat" \
  -metric-scores-file "$OUTPUT_DIRECTORY/dq-metric-score.dat" \
  -ds-info-file "$OUTPUT_DIRECTORY/dq-dataset-info.dat" \
  -a "$MHUB_URL" -u "$USERNAME" -p "$PASSWORD"
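Because mh-import dq-load depends on four variables and three DQ output files, a small pre-flight check can catch a missing setting before the import is attempted. The sketch below is a hypothetical helper, not part of Ab Initio. It checks only that the variables used in the command are non-empty and that the three files exist.

```shell
# Hypothetical pre-flight helper (not an Ab Initio tool): checks that the
# variables used by "mh-import dq-load" are set and that the three DQ
# output files exist before the import is attempted.
# Usage: dq_load_preflight && mh-import dq-load ...
dq_load_preflight() {
  for v in OUTPUT_DIRECTORY MHUB_URL USERNAME PASSWORD; do
    eval "val=\${$v:-}"                  # indirect lookup of variable $v
    if [ -z "$val" ]; then
      echo "missing variable: $v" >&2; return 1
    fi
  done
  for f in dq-issue-count.dat dq-metric-score.dat dq-dataset-info.dat; do
    if [ ! -f "$OUTPUT_DIRECTORY/$f" ]; then
      echo "missing file: $OUTPUT_DIRECTORY/$f" >&2; return 1
    fi
  done
  echo "preflight OK"
}
```

Chaining it with `&&` means the import runs only when every prerequisite is in place, and the error message names the variable or file that is missing.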

How to View DQ in MDHub

Log in to the Metadata Hub portal: http://10.102.22.111:6261/COE

Click Data Quality in the portal.


Expand “Reports” and then click on “Datasets with Data Quality”


Select the dataset whose DQ results you want to see.


Click the respective dataset and then click “Data Quality”.


View the data quality metric trends for the selected dataset.

Variations of Reports in MD Hub
1. DQ Detection and Reporting
• Analyze DQ using filters such as errors, issues and fields.
• Create graphs, pie charts, etc. to visualize the results.

Variations of Reports in MD Hub (Cont’d.)
2. DQ Metrics
• Analyze the input data based on metrics such as stability and accuracy.
• Create graphs, pie charts, etc. to visualize the results.

Variations of Reports in MD Hub (Cont’d.)

3. DQ Aggregated Metrics
• Build a history of metrics over time to gain more insight into the data.

Reporting - Lineage in MD Hub with EME

Expanded lineage diagram in the EME

Reporting – Data Profiler

Reporting – Data Profiler @ Field Level


Questions?


Thank You…

