Informatica MDM Training 2
Management (MDM)
Topic 4: Load Process
Objectives
Trust
[Figure: trust comparison for record 100. Recoverable labels: "Doug McDougal Grp", "McDougal Group", 1-555-901-4670, 201-10810; trust scores 62, 56, 71, 37. Caption: Winners Survive]
Base Object cells are only updated where new data has a higher trust weighting:
ROWID_OBJECT  Name        Phone
100           DMcD Group  1-555-901-4670
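The cell-level rule above can be sketched in a few lines of Python. This is an illustration only, not how MDM implements trust: the `Cell` type, the `apply_update` function, and every value and trust score below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    trust: int  # hypothetical trust score, 0-100


def apply_update(base, incoming):
    """Return a new record in which each cell keeps the more trusted value."""
    merged = dict(base)
    for column, new_cell in incoming.items():
        current = merged.get(column)
        # Overwrite a cell only when the incoming value is trusted more.
        if current is None or new_cell.trust > current.trust:
            merged[column] = new_cell
    return merged


# Hypothetical data: the incoming Name wins, the incoming Phone loses.
base = {"Name": Cell("Old Name Ltd", 50), "Phone": Cell("1-555-000-0001", 62)}
incoming = {"Name": Cell("New Name Ltd", 75), "Phone": Cell("1-555-000-0002", 37)}
result = apply_update(base, incoming)
```

Because the comparison is made per column, one incoming record can win on some cells and lose on others.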
Trust Demo
Validation Rules
endif;
• A validation check can be done on any column in a base object, and a downgrade can be
applied to any other column in the base object
middle_name is null
• Downgrade trust on Address Line 1, City, State, Zip, and Valid Address Ind if
valid_address_ind = ‘False’
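A minimal sketch of the address rule above, assuming trust is downgraded by a percentage. The snake_case column names, the `downgrade_trust` function, and the 50% figure are assumptions made for illustration, not values from the course material.

```python
# Columns named in the rule above, written here in assumed snake_case form.
ADDRESS_COLUMNS = ["address_line_1", "city", "state", "zip", "valid_address_ind"]


def downgrade_trust(record, trust, percent):
    """If the address is flagged invalid, reduce trust on the address
    columns by `percent` (a hypothetical downgrade percentage)."""
    adjusted = dict(trust)
    if record.get("valid_address_ind") == "False":
        for column in ADDRESS_COLUMNS:
            if column in adjusted:
                adjusted[column] = adjusted[column] * (1 - percent / 100)
    return adjusted


record = {"address_line_1": "1 Main St", "city": "Springfield", "valid_address_ind": "False"}
trust = {"address_line_1": 80.0, "city": 80.0, "full_name": 90.0}
adjusted = downgrade_trust(record, trust, percent=50)
```

The check is made on one column (`valid_address_ind`) while the downgrade lands on several others, and columns outside the rule (`full_name` here) keep their trust.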
Relationships
One-to-many Relationship
• One table (the child) contains a foreign key column, which matches a unique key
column of another table (the parent)
• One-to-many relationships are always defined from the child table in the relationship
(i.e. the referencing table rather than the referenced table).
Many-to-many Relationship
• A base object acts as an intersection table between two other base objects
• The intersection table has a one-to-many relationship with each of the other two base objects
Lookups
Automatic Lookups
• MRM automatically handles lookups when loading or updating the primary key of a Base Object
[Figure: lookup from the Staging Table for Address data from the CRM System (C_STG_CRM_ADDR) to the Customer Base Object through the Customer Cross-Reference. Column labels shown: ROWID_OBJECT, FULL_NAME, PKEY_SRC_OBJECT, CRM_ID]
[Figure: lookup example for CUST_ID 10810 (JOHN J HANCOCK), resolved through PKEY_SRC_OBJECT. Address Cross-Reference column labels shown: ROWID_OBJECT, ROWID_SYSTEM, PKEY_SRC_OBJECT, CUST_ID, S_CUST_ID, ADDRESS]
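The lookup idea can be sketched as a dictionary keyed on source system and source primary key. This is a simplified illustration, not MRM's implementation; the `customer_xref` contents, the `"CRM"` system name, and the ROWID value `"100"` are hypothetical.

```python
from typing import Optional

# Hypothetical customer cross-reference:
# (source system, PKEY_SRC_OBJECT) -> ROWID_OBJECT of the customer
customer_xref = {("CRM", "10810"): "100"}


def lookup_rowid(xref, system, pkey_src_object) -> Optional[str]:
    """Resolve a source-system key to the parent's ROWID_OBJECT, or None if unknown."""
    return xref.get((system, pkey_src_object))


# A staged address row carries the source system's customer key.
staged_address = {"CUST_ID": "10810", "ADDRESS": "1 Main St"}
parent_rowid = lookup_rowid(customer_xref, "CRM", staged_address["CUST_ID"])
```

The staged row never needs to know the parent's ROWID_OBJECT; it is derived from the source key at load time.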
Load Process
[Flowchart: LOAD job -> Process Inserts -> if STRIP_ON_LOAD_IND = 1: Tokenize -> End; if STRIP_ON_LOAD_IND = 0: End]
Updates
• Load job applies updates for existing records whose
Inserts
• Load job applies inserts for records that do not exist in the XREF table
• ROWID_OBJECT values are generated for the new records
• New records are inserted into base object and XREF with CONSOLIDATION_IND = 4
• If history flag is switched on for the Base Object, then the insert process writes to the
history tables of Base Object and XREF
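The insert steps above can be sketched as follows. The in-memory lists, the `load_insert` function, and the counter used to generate ROWID_OBJECT values are assumptions for illustration. The sketch records CONSOLIDATION_IND on the base object row only and does not model the update path.

```python
import itertools

rowid_sequence = itertools.count(1)  # hypothetical ROWID_OBJECT generator
base_object, xref, history = [], [], []


def load_insert(staged, system, history_enabled):
    """Insert one staged record unless its source key is already in the XREF."""
    already_known = any(
        x["ROWID_SYSTEM"] == system and x["PKEY_SRC_OBJECT"] == staged["pkey"]
        for x in xref
    )
    if already_known:
        return None  # existing record: the update path is not modelled here

    rowid = str(next(rowid_sequence))
    bo_row = {"ROWID_OBJECT": rowid, "CONSOLIDATION_IND": 4, **staged["data"]}
    xref_row = {"ROWID_OBJECT": rowid, "ROWID_SYSTEM": system,
                "PKEY_SRC_OBJECT": staged["pkey"], **staged["data"]}
    base_object.append(bo_row)
    xref.append(xref_row)
    if history_enabled:
        # History tables receive a copy of both new rows.
        history.append(("BASE_OBJECT", dict(bo_row)))
        history.append(("XREF", dict(xref_row)))
    return rowid


first = load_insert({"pkey": "10810", "data": {"FULL_NAME": "JOHN J HANCOCK"}}, "CRM", True)
second = load_insert({"pkey": "10810", "data": {"FULL_NAME": "JOHN J HANCOCK"}}, "CRM", True)
```

The second call returns `None` because the source key is already present in the XREF, so it is not an insert.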
Rejects
• Referential Integrity (RI) is maintained among base objects in the consolidated data model
• Rejects will occur in the load process if any records violate the RI constraint, for example when:
  • Parent records do not exist
  • Child records are loaded before the parent records
  • Lookup has been defined incorrectly
• Rejected records are written to the reject table staging_table_name_REJ
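A minimal sketch of how a referential integrity violation turns into a reject, assuming a child row is rejected whenever its foreign key cannot be resolved to a parent. The `load_children` function, the `REJECT_REASON` and `CUSTOMER_ROWID` column names, and the sample data are hypothetical.

```python
def load_children(staged_rows, parent_xref, system):
    """Split staged child rows into loaded rows and rejects.

    A row is rejected when its foreign key cannot be resolved to a parent,
    which is the case when the parent is missing or has not been loaded yet.
    """
    loaded, rejects = [], []
    for row in staged_rows:
        parent_rowid = parent_xref.get((system, row["CUST_ID"]))
        if parent_rowid is None:
            rejects.append({**row, "REJECT_REASON": "parent not found"})
        else:
            loaded.append({**row, "CUSTOMER_ROWID": parent_rowid})
    return loaded, rejects


parent_xref = {("CRM", "10810"): "100"}
staged = [
    {"CUST_ID": "10810", "ADDRESS": "1 Main St"},
    {"CUST_ID": "20000", "ADDRESS": "2 Side St"},  # no such parent
]
loaded, rejects = load_children(staged, parent_xref, "CRM")
```

Loading parents before children, and defining the lookup against the right cross-reference, keeps the reject list empty.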
Topic 5: Match Process
Objectives
Match & Search Strategy
Match Process
Match & Merge Overview
• To merge or link records, MRM needs to know which records are likely duplicates of
each other
• Match rules tell MRM how to identify likely duplicates
• Match rules also tell MRM if two matching records are similar enough to automatically
merge/link, or if they should be reviewed by a data steward
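The two roles of a match rule described above (identify likely duplicates, then decide between automatic merge and steward review) can be sketched like this. The `MatchRule` type, the `evaluate` function, and both sample rules are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MatchRule:
    name: str
    matches: Callable[[dict, dict], bool]
    auto_merge: bool  # False: queue the pair for data steward review


def evaluate(rules, a, b) -> Optional[str]:
    """Return 'auto-merge', 'manual-review', or None, using the first rule that fires."""
    for rule in rules:
        if rule.matches(a, b):
            return "auto-merge" if rule.auto_merge else "manual-review"
    return None


rules = [
    MatchRule("same name and phone",
              lambda a, b: a["name"] == b["name"] and a["phone"] == b["phone"], True),
    MatchRule("same phone only",
              lambda a, b: a["phone"] == b["phone"], False),
]

r1 = {"name": "Doug McDougal", "phone": "1-555-901-4670"}
r2 = {"name": "Doug McDougal", "phone": "1-555-901-4670"}
r3 = {"name": "D McDougall", "phone": "1-555-901-4670"}
r4 = {"name": "John Hancock", "phone": "1-555-000-0000"}
```

Here `evaluate(rules, r1, r2)` gives `"auto-merge"`, `evaluate(rules, r1, r3)` gives `"manual-review"`, and `evaluate(rules, r1, r4)` gives `None`.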
Match/Search Strategy
Exact
• Does not allow for any variations in the data in the match columns
• Very simple match process, therefore fast
Fuzzy
• Allows for variations in spelling, formats, word order, nicknames, synonyms, etc.
• More complex match process, therefore slower
[Flowchart residue. Recoverable labels: Register MATCH job; Fuzzy or Exact?; Generate Keys and Search for Match Candidates (Fuzzy branch); Exact; End]
Match Path
• A Match Path identifies the base objects that provide data for matching
• It traverses the hierarchy between records across multiple base objects or within a single
base object
• Foreign key relationships between tables are used to traverse the hierarchy
• Parent-to-child or child-to-parent relationships can be specified
• By default, MDM does an inner join between the base objects defined in the Match
Path
• The inner join excludes rows that don’t have corresponding rows in the joined
tables
• To include those records, switch on “Check for Missing Children”; MDM will then do
an outer join instead of an inner join
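The difference between the default inner join and the outer join can be sketched with two small lists. The `join_match_path` function, its `check_missing_children` parameter, and the sample rows are hypothetical stand-ins, not MDM's API.

```python
def join_match_path(parents, children, check_missing_children=False):
    """Join parent rows to child rows on CUSTOMER_ROWID.

    Inner join by default. With check_missing_children=True, parents that
    have no children are kept and paired with None (outer join).
    """
    rows = []
    for parent in parents:
        matched = [c for c in children if c["CUSTOMER_ROWID"] == parent["ROWID_OBJECT"]]
        if matched:
            rows.extend((parent, child) for child in matched)
        elif check_missing_children:
            rows.append((parent, None))
    return rows


customers = [{"ROWID_OBJECT": "100", "NAME": "Doug McDougal"},
             {"ROWID_OBJECT": "200", "NAME": "John J Hancock"}]
addresses = [{"CUSTOMER_ROWID": "100", "ADDRESS": "1 Main St"}]

inner = join_match_path(customers, addresses)
outer = join_match_path(customers, addresses, check_missing_children=True)
```

With the inner join, customer 200 drops out of matching because it has no address row; with the outer join it is kept with an empty child.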
Match Column
[Figure labels: match columns Generation, Address, Phone]
• Provider columns are the base object columns that provide the data for the match
column:
  • Can be a single column or a concatenation of columns
  • Columns must be VARCHAR or CHAR to be concatenated
  • Date columns are also supported for matching
Exact Match/Search Strategy
Match Columns
• A match column contains an identifying characteristic of the base object record to be
consolidated
• Exact match columns:
  • Make no allowance for variations in data content
  • Records match only if they have identical values in the match columns used in the
  match rules
• NULL Matches non-NULL: Use this flag to specify the match columns in a match rule that
should be regarded as matches when one of the values being compared is NULL and the
other is not
• In the above example the effects of Null Matching on the Generation column are shown
would/would NOT have matched without Non-equal match
550  IND  Doug McDougal  Y
560  IND  D McDougall    Y
570  ORG  Doug McDougal
• If using non-equal match, then MUST switch on Validate Matches property in Base
Object Advanced Properties
• If non-equal match is used on the CRM_FLAG column to prevent 2 records from the CRM
system from matching each other, then:
  • NULL=Y is a match
  • NULL=NULL is a match
  • Y=Y is not a match
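The three outcomes listed above follow from a comparison that treats NULL as matching anything and otherwise requires the values to differ. A minimal sketch, with the function name `non_equal_match` chosen for illustration and `None` standing in for NULL:

```python
from typing import Optional


def non_equal_match(a: Optional[str], b: Optional[str]) -> bool:
    """Non-equal (anti-match) comparison in which NULL matches anything."""
    if a is None or b is None:
        return True
    return a != b


null_vs_y = non_equal_match(None, "Y")
null_vs_null = non_equal_match(None, None)
y_vs_y = non_equal_match("Y", "Y")
```

Applied to the CRM_FLAG column, two records that both carry `Y` can never match each other, while a record without the flag can still match either of them.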
Match Rule
Fuzzy Match/Search Strategy
Population
• Population is intended to address the name distribution problem
• Common family names in each population skew the data and query performance,
e.g. Smith and Williams in English-speaking populations
• Each population also has a large number of uncommon names that tend to have the
most error and variability
• Match needs to account for both of these situations in the way that the keys are built,
to give optimal search performance for both
• A population defines how to identify matches within a particular population and language
• It defines how to build keys and perform searches on name and address
• It supports a specific set of match purposes
Match Key
• Match key is used to search for match candidates
• It is a fixed-length, compressed, and encoded value
• Built from a combination of the words and numbers in a name or address
• For one name or address, multiple SSA match keys are generated
• Match Key Properties:
• Key Type
• Key Width
• Path Component
• Match Column Contents
Determines the degree of variance that will be supported in the key values
Represents a tradeoff between match precision and the space used by match key
records
Key Width  Description
Standard   Aims for balance between Limited and Extended, i.e. balance between disk usage/performance and search completeness
Match Column
• A match column contains an identifying characteristic of the base object record to be
consolidated
• Can be a fuzzy column or an exact column
• Fuzzy Match Column
• The column name you choose defines the type of data that the match expects that
column to contain
• Examples: Person Name, Address Part 1, Address Part 2, etc.
Typical    The appropriate search level for typical data sets
Match Rule
• Determines what constitutes a match during the match process
• Fuzzy Match Rule Properties:
• Match Purpose
• Match Level
• Accept Limit Adjustment
Symbol                 Description
Column_1 (Fuzzy)       Indicates that Column_1 is a fuzzy match column
Column_1 (Fuzzy) (+2)  Indicates that the fuzzy column, Column_1, has had its weighting in the rule manually increased
Column_4 (≠)           Indicates that non-equal match (anti-match) is switched on for Column_4. Can be combined with null match: Column_4 (≠ Ø)
Match Server Architecture
Topic 6: Merge Process
Objectives
Merge Process
Merge
• Consolidation process of two matched records in the Base Object
• Merge can be Auto-Merge or Manual-Merge depending on the degree of matching
Distinct Systems
• Records from a source system marked as Distinct will not merge among themselves
Un-Merge Process
• By default, unmerging parent records does not unmerge associated child records
• Unmerge Child When Parent Unmerges option allows you to specify what happens if
records in the parent base object are unmerged
• Prerequisites for enabling this option are:
• The parent-child relationship must already be configured in the child base object
• The foreign key column in the child base object must be a match-enabled column
Topic 7: Batch Process
Objectives
Batch Process
Batch Viewer
• Shows job completion status (Success / Failure / Warning) with associated message
• Useful for starting the run of a single job, or running jobs that don’t often need to run
(e.g. Synchronize Trust job after changing Trust settings)
Executing Stored Procedures
Stored Procedures
• All public MRM batch processes can be executed through stored procedures
• Can easily be integrated with any job scheduling software – Tivoli, CA Unicenter etc.
• The full list of public batch processes per user-defined object can be found in
C_REPOS_TABLE_OBJECT_V
Job Status & Job Statistics
Scheduling Considerations
Stage Jobs
• If the cleanse server machine has enough CPU and memory to handle multiple cleanse
servers, then parallelize stage jobs
Load Jobs
• The easiest way to schedule Load jobs is serially
• If a large number of Loads must run in a short batch window, then load separate
targets in parallel and check all dependencies before each Load starts
Match/Merge Jobs
• Determine whether to run match-merge once per object per batch window, or after
every source load
• Consider whether to tokenize after load. The STRIP_ON_LOAD indicator can be switched
off so that the strip process does not run as part of the load
Batch Group
• A batch group is a collection of individual batch jobs (e.g. Stage, Load, Match, etc.) that
can be executed with a single command
• Each batch job in a group can be executed sequentially or in parallel with other jobs
• Group Levels – Jobs in a particular Group Level are executed in parallel
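The group-level idea can be sketched with a thread pool. This is an illustration, not the MDM batch engine: it assumes that levels run one after another while jobs inside a level run in parallel, and the `run_batch_group` and `make_job` helpers and the job names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch_group(levels):
    """Run each level in order; jobs inside one level run in parallel."""
    results = []
    for level in levels:
        with ThreadPoolExecutor(max_workers=len(level)) as pool:
            # map() returns results in the order the jobs were listed.
            results.append(list(pool.map(lambda job: job(), level)))
    return results


def make_job(name):
    """Build a stand-in job that only reports its own name."""
    return lambda: f"{name}: done"


levels = [
    [make_job("Stage CRM"), make_job("Stage ERP")],  # level 1: parallel
    [make_job("Load Customer")],                     # level 2
    [make_job("Match Customer")],                    # level 3
]
results = run_batch_group(levels)
```

Leaving the `with` block waits for every job in the level to finish, so a later level never starts before the previous one is complete.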