Ab Initio - DQE and Its Inclusion With MDHub v1.0 PDF
Ab Initio DQE and its inclusion with MDHub
Author: Ankit Jain
Contents
Capgemini Insights & Data – Ab Initio CoE | Jan 2018 © Capgemini 2017. All rights reserved | 2
Data Quality
Accuracy
Completeness
Update status
Relevance
Consistency across data sources
Reliability
Appropriate presentation
Accessibility
Ab Initio Data Quality
Functional details
Using the Ab Initio Data Quality Engine (DQE), business users can do the following:
Access data sources from files or databases, join data sources for subsequent data quality analysis, and
compile lookup files for use in data quality tests
Write validation tests that can detect null or blank values, valid and invalid values, data patterns, invalid
data relationships, and the uniqueness of key values
Run the data quality application to compile lists of issues in the data source, compute data quality
metrics, profile the input data, and publish the results to the Metadata Hub
Unload reference data — domain code sets and other information — from the Metadata Hub.
Architecture
[Figure: Detect DQ Problems. Labels: Data In, Data Out, DQ Reporting System, DQ Issue Archive.]
DQ Processing Workflow
[Figure: DQ processing workflow between IN and OUT. Numbered stages: 1 Validation Rules, 2 Clean Up Rules, 3 Compute DQ Stats / Profiles, 4 Problem Records Archive / Other Workflows, 5 Compute History. Other labels: Stats in EME, Profiles in EME, Alert (shown twice), connectors A, B, E.]
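The five numbered stages of the processing workflow can be sketched in plain Python. This is an illustration under assumptions, not how DQE is implemented: the function process_batch, the sample validation rule, the clean-up step, and the alert threshold are hypothetical, the routing between stages is a guess from the diagram labels, and the in-memory lists merely stand in for the EME and the problem-record archive.

```python
def process_batch(records, history, alert_threshold=0.2):
    """Run one batch through the five stages; returns (clean_out, archive, stats)."""
    clean_out, archive = [], []
    for rec in records:
        # 1. Validation rules (hypothetical rule: 'amount' must be non-negative)
        valid = rec.get("amount") is not None and rec["amount"] >= 0
        if valid:
            # 2. Clean-up rules (hypothetical: trim whitespace in 'name')
            clean_out.append({**rec, "name": rec.get("name", "").strip()})
        else:
            # 4. Problem records go to an archive / other workflows
            archive.append(rec)
    # 3. Compute DQ stats for this run and raise an alert flag if needed
    total = len(records)
    stats = {
        "total": total,
        "failed": len(archive),
        "error_rate": len(archive) / total if total else 0.0,
    }
    stats["alert"] = stats["error_rate"] > alert_threshold
    # 5. Compute history: append this run's stats to those of earlier runs
    history.append(stats)
    return clean_out, archive, stats


history = []
batch = [
    {"name": " Asha ", "amount": 10},
    {"name": "Ravi", "amount": -5},
    {"name": "Meena", "amount": None},
    {"name": "John", "amount": 3},
]
clean_out, archive, stats = process_batch(batch, history)
```

Passing the same history list to later calls accumulates one stats entry per run, which is the raw material for trend reporting.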
Hands on with DQE in Express IT
Express IT details:
Open Express IT and log in with your UNIX ID and password: http://10.102.22.111:6561/appconf
The private project should include the public projects STDENV and DataQuality at checkout.
GDE details:
Current APP_HUB path: /usr/local/abinitio/abinitio-app-hub
Order of project checkout: stdenv -> common_io -> dataquality -> dq-examples -> private project (DQE_trn201609)
Validation Rules
Pattern Search – S*<required pattern>, e.g. a valid pincode starting with "4": S"4…."
Lookup Match – Using "Create Lookup", first create a lookup file configuration and publish it. Then reference it within the rule as L"Lookup Configuration Name", e.g. L"create_cust_lkp"
Rollup – Check "Validate Dataset Using Rollup". Then navigate to "Validate Dataset Using Rollup" and create variables. Once done, open "Rollup Computations". The available rollup functions are listed under Keywords and Functions -> Rollup Functions.
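For readers who want a feel for what the three rule types check, here is a rough Python analogue. It is not DQE syntax: the regular expression assumes a six-digit pincode, and CUSTOMER_LOOKUP, check_record, and rollup_check are hypothetical names used only for this sketch.

```python
import re

PINCODE_PATTERN = re.compile(r"4\d{5}")   # assumption: six digits, first digit 4
CUSTOMER_LOOKUP = {"C001", "C002"}        # stands in for a published lookup file


def check_record(rec):
    """Return the names of the record-level rules that this record fails."""
    failed = []
    # Pattern search: the whole value must match the required pattern
    if not PINCODE_PATTERN.fullmatch(rec["pincode"]):
        failed.append("pattern_search")
    # Lookup match: the key must exist in the lookup
    if rec["cust_id"] not in CUSTOMER_LOOKUP:
        failed.append("lookup_match")
    return failed


def rollup_check(records, max_total):
    """Dataset-level rule: the summed amount must not exceed max_total."""
    total = sum(r["amount"] for r in records)
    return total <= max_total


rows = [
    {"cust_id": "C001", "pincode": "411001", "amount": 100},
    {"cust_id": "C009", "pincode": "560001", "amount": 250},
]
failures = [check_record(r) for r in rows]
dataset_ok = rollup_check(rows, max_total=500)
```

The first two checks run per record, while the rollup check looks at an aggregate over the whole dataset, which is why it is configured separately in the rule editor.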
Validation Rule set
Validation Rule Value
The metrics for each rule are stored for tracking purposes.
Inclusion with MD Hub
Source import.profile
How to see DQ in MD Hub?
Variations of Reports in MD Hub
1. DQ Detection and Reporting –
Analyze DQ using filters such as Errors, Issues, and Fields.
Create graphs, pie charts, etc. for visual presentation.
Variations of Reports in MD Hub (Cont’d.)
2. DQ Metrics –
Analyze the input data using metrics such as Stability and Accuracy.
Create graphs, pie charts, etc. for visual presentation.
Variations of Reports in MD Hub (Cont’d.)
3. DQ Aggregated Metrics –
Build up a history of metrics and, over time, gain more insight into the data.
Reporting - Lineage in MD Hub with EME
Reporting – Data Profiler
Reporting – Data Profiler @ Field Level
Questions?
Thank You…