Data Migration Roadmap Guidance: Version 3.2 7/9/2019 Final
Data Migration Roadmap Guidance Document Version Control
Table of Contents
SECTION 1. INTRODUCTION
1.1. BACKGROUND AND PURPOSE
1.2. BENEFITS OF A DATA MIGRATION ROADMAP
1.3. INTENDED AUDIENCE
1.4. REFERENCE DOCUMENTS / APPLICABLE PROJECT DOCUMENTS
SECTION 2. EXECUTIVE SUMMARY
SECTION 3. DATA MIGRATION ROADMAP
3.1. INTRODUCTION
3.1.1. Data Migration Project Lifecycle
3.2. DATA MIGRATION PLANNING PHASE
3.2.1. Planning Overview
3.2.2. Data Migration Planning Tasks & Subtasks
3.2.3. Planning Data Migration Project
3.2.3.1. Establish Scope
3.2.3.2. Identify Risk / Constraints / Dependencies / Assumptions
3.2.3.3. Develop Data Migration Risk Mitigation Plan
3.2.3.4. Develop Data Migration Communications Plan
3.2.3.5. Critical Success Factors for Data Migration Planning
3.2.4. Determine Data Migration Requirements
3.2.4.1. Determine Business Requirements and Expectations
3.2.4.2. Determine Technology and IT Infrastructure Requirements
3.2.4.3. Identify Relevant Best Practices
3.2.5. Assess Current Environment
3.2.5.1. Identify and Collect Existing Data related Architecture
3.2.5.2. Blueprint Current State of the Data Architecture
3.2.5.3. Determine Data Migration Technology
3.2.6. Develop Data Migration Plan
3.2.6.1. Determine Data Migration Method
3.2.6.2. Determine Data Conversion Plan
3.2.6.3. Determine Data Integration Plan
3.2.6.4. Plan Parallel Operation
3.2.6.5. Develop Migration Data Quality Plan
3.2.6.6. Develop Data Archival Strategy
3.2.6.7. Develop Data Migration Test Plan
3.2.7. Define and Assign Roles & Responsibilities
3.2.7.1. Define Migration Roles and Responsibilities
3.2.8. Data Migration Planning Deliverables
3.2.9. Data Migration Planning Checklist
3.3. DATA MIGRATION ANALYSIS AND DESIGN
3.3.1. Analysis and Design Overview
3.3.2. Data Migration Analysis and Design Tasks and Subtasks
3.3.3. Perform Data Migration Analysis
3.3.3.1. Analyze Current Environment
3.3.3.2. Evaluate Data Migration Technology
3.3.3.3. Evaluate Data Quality
3.3.3.4. Perform Data Profiling
3.3.3.5. Critical Success Factors for Data Migration Analysis & Design
3.3.4. Determine Data Security Controls
3.3.4.1. Determine Enterprise Management and Operational Security Controls
3.3.5. Design Data Migration Environments
3.3.5.1. Design Staging Area
3.3.5.2. Design Target Data Architecture
3.3.5.3. Correlate Migration Data (Source/Staging/Target)
3.3.6. Design Data Migration Procedures
3.3.6.1. Design Data Staging Procedures
3.3.6.2. Design Data Cleansing Procedures
3.3.6.3. Design Data Conversion Procedures
List of Figures
Figure 2-1: Data Migration Project Lifecycle
List of Tables
Table 1-1: Reference Documents
Table 3-1: Data Migration Lifecycle with High-Level Tasks Identified
Table 3-2: Data Migration Planning
Table 3-3: Assumptions, Constraints, and Risks
Table 3-4: Data Migration Risk Probability and Impact Levels
Table 3-5: Data Migration Risk Mitigation Matrix
Table 3-6: Data Migration Roles and Responsibilities
Table 3-7: Data Migration Analysis and Design
Table 3-8: Data Migration Implementation
Table 3-9: Data Migration Closeout
Section 1. Introduction
1.1. Background and Purpose
Federal Student Aid is engaged in a long-term effort to integrate its processes, data and systems. To
better support these business objectives and to emphasize data as an enterprise asset, Federal Student
Aid has established the Enterprise Data Services (EDS) team. The goal of the EDS is to consistently
define data and make standardized data available across the enterprise by providing information services
and data technology expertise to business owners, project managers and architects.
This document outlines a roadmap and provides checklists to assist with mitigating some of these
challenges. Comments or suggestions for improvement to this roadmap are encouraged and should be
reported back to the Project Manager for Enterprise Data Management.
DOCUMENT ID | DOCUMENT TITLE | DOCUMENT VERSION
 | Enterprise Data Management (Operations) Statement of Work | August 25, 2006
 | Burry, Christopher & Mancusi, David. “How to plan for data migration.” ComputerWorld | May 21, 2004
 | Softek, Inc. The Hidden Costs of Data Migration (White Paper). KnowledgeStorm.com: Softek, Inc. | April 1, 2006
 | Peipert, Glenn & Cohen, Lori. Strategic Approach to Data Migration (White Paper). WWW: Conversion Services, International, Inc. | July 2005
In addition, research found that the most successful projects were ones that maximized opportunities and
mitigated risks. The following critical success factors were identified:
• Perform data migration as an independent project 1
• Establish and manage expectations throughout the process
• Understand current and future data and business requirements
• Identify individuals with expertise regarding legacy data. 2
1 Microsoft CRM Data Migration Framework, page 6
2 Microsoft CRM Data Migration Framework, page 8
This document is organized according to the four primary phases (Data Migration Planning, Data
Migration Analysis and Design, Data Migration Implementation, and Data Migration Closeout) and
contains a detailed description of each phase, including tasks and subtasks. In addition, common pitfalls
are identified and described. Finally, this document contains a Data Migration Review Checklist, which
serves as a tool to help launch and manage data migration projects.
3 Microsoft CRM Data Migration Framework, page 7
4 Strategic Approach to Data Migration, page 3
It is common to use a staging area as an interim data store to facilitate testing and validation of these
modifications/transformations. In addition, a staging area can serve as a storage area for integration
projects, which pull data from multiple source systems.
A review of best practices produced the following two principles inherent in successful data migration:
Perform data migration as a project dedicated to the unique objective of establishing a new (target) data
store.
DATA MIGRATION PLANNING PHASE | DATA MIGRATION ANALYSIS & DESIGN PHASE | DATA MIGRATION IMPLEMENTATION PHASE | DATA MIGRATION CLOSEOUT PHASE
[...] Requirements | [...] | [...] | [...] Learned
Assess Current Environment | Design Data Environment | Cleanse Data | Perform Knowledge Transfer
Develop Data Migration Plan | Design Migration Procedures | Convert/Transform Data (as needed) | Communicate Data Migration Results
Define and Assign Team Roles and Responsibilities | Validate Data Quality | Migrate Data (trial/deployment) |
 | | Validate Migration Results (iterative) |
 | | Validate Post-Migration Results |
Table 3-1: Data Migration Lifecycle with High-Level Tasks Identified
During the lifecycle of a data migration project, the team moves the data through the activities shown in
Figure 2-4.
The team will repeat these data management activities as needed to ensure a successful data load to the
new target data store.
To ensure that both the data migration project and the larger development project are successful, it is
good practice to execute data migration as an independent project. Thorough planning is the foundation
for consistent success in any process, and data migration is no exception. Also, a successful data
migration effort requires the mitigation of issues and risks to the business/organization.
The Data Migration Plan details the information that should be included for each step in the plan. Other
results, such as risks and/or critical success factors, may simply be documented in the plan. All steps
within subsequent phases of the migration are included in the plan. Also included is the way in which
the steps should be performed with respect to rules, parameters, procedures, and so forth.
In addition, the development program describes the general project deliverables, to which all projects
must adhere, such as the Quality Plan, Change Management Plan, and Communications Plan.
DATA MIGRATION PLANNING ARTIFACTS
DATA MIGRATION PLANNING CHECKLIST
Table 3-2: Data Migration Planning
ASSUMPTIONS/DEPENDENCIES | CONSTRAINTS | RISKS
Sufficient resources are available for all aspects of data migration. | Time/Schedule dictates what must be completed and when. | Unexpected delay and/or downtime might occur.
Sufficient expertise is available for all aspects of data migration. | Funding might limit access to resources that can be devoted to the effort. | The team might encounter complex: i. Processes; ii. Environments; iii. Configuration issues related to data volumes
All environments (legacy, staging, and target) are fully documented, available, and accessible as planned during necessary steps of migration. | Personnel and equipment might be limited or unavailable. |
The team has access to: Subject matter experts for current source system and data | Data requirements and definitions might require clarification by subject matter experts. | Misunderstanding or misinterpretation of requirements might result in a flawed design.
Note that this Risk Mitigation Plan needs to align with the Risk Mitigation Plan of the overall development
project.
A successful data migration effort requires the mitigation of issues and risks to the business/organization.
Identifying these challenges (such as dependencies on other teams within Federal Student Aid, minimal
migration expertise within the organization, or insufficient understanding of data and source systems) and
opportunities (such as the identification of the most appropriate data migration method) early in the
project allows for proper management and less disruption later. To decrease risks, project leads
should consider the following actions:
5 The Hidden Costs of Data Migration, page 6
RISK | PROBABILITY | IMPACT | MITIGATION
Data quality issues are not identified until late in the project, thus causing delays and cost overruns. | Medium | Critical | Quality review sessions will be conducted throughout each release so that data quality issues may be identified early and addressed accordingly.
Necessary database personnel are not available during migration. | Low | Critical | In case access to Federal Student Aid resources is very limited, the contractor should consider hiring a short-term consultant to develop the databases to support the target data.
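The probability and impact ratings shown for each risk can be used to order risks for review. The following is a minimal sketch only: the numeric weights, the level names other than Low, Medium, and Critical, and the function name rank_risks are assumptions made for illustration and are not defined by this roadmap.

```python
# Hypothetical numeric weights; this roadmap does not define a scoring scale.
PROBABILITY = {"Low": 1, "Medium": 2, "High": 3}
IMPACT = {"Minor": 1, "Moderate": 2, "Critical": 3}


def rank_risks(risks):
    """Return risks sorted from highest to lowest (probability x impact) score."""
    def score(risk):
        return PROBABILITY[risk["probability"]] * IMPACT[risk["impact"]]
    return sorted(risks, key=score, reverse=True)


risks = [
    {"name": "Database personnel unavailable", "probability": "Low", "impact": "Critical"},
    {"name": "Data quality issues found late", "probability": "Medium", "impact": "Critical"},
]
ranked = rank_risks(risks)
```

With these assumed weights, the data quality risk is reviewed first because its probability is higher at the same impact level.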
Issues arising during the life cycle of the data migration project need to be reported, documented, and
resolved as soon as they arise.
The Data Migration Risk Management Plan should be reviewed on a regular basis to ensure appropriate
monitoring of risks.
3.2.3.4. Develop Data Migration Communications Plan
The Data Migration Communications Plan identifies all data management aspects (what, who, when,
where, how, about) of the data migration project to stakeholders, Data Migration Team members, and (if
needed) external personnel. This plan outlines the recipient, title of communication, content, format, and
schedule of each document prepared and shared as a result of the data migration project. Only
communications relevant to the migration effort are discussed. The Data Migration Communications Plan
should cover the following:
• Status reports (weekly, monthly)
• Deliverables and their distribution list including approval authority
• Escalation procedures
• Data profiling findings
6 A Strategic Approach to Data Migration, page 1
• Document and discuss any anticipated issues or risks with stakeholders and/or business
owners.
• Perform migration as an independent project. 7
• Establish and manage expectations throughout the process.
• Understand current and future data and business requirements.
• Identify individuals with expertise regarding legacy data. 8
• Collect available documentation regarding legacy system(s).
• Clearly define data migration project roles & responsibilities. 9
• Prepare a comprehensive overview of data content, data quality, and data
structure. 10
• Determine the importance of business data and data quality with business owners
and stakeholders.
7 Microsoft CRM Data Migration Framework, page 6
8 Microsoft CRM Data Migration Framework, page 8
9 Microsoft CRM Data Migration Framework, page 7
10 Strategic Approach to Data Migration, page 3
11 Microsoft CRM Data Migration Framework, page 6
12 How to Plan for Data Migration (ComputerWorld, May 21, 2004)
15 Information Systems Management in Practice
13 Taking the Pain Out of Data Migration, page 1
For example, ETL software might execute slightly differently against an Oracle database under a UNIX
operating system than it does against a Microsoft SQL Server database under a Windows operating
system. Such software might not be compatible with the architecture of the staging area. For these and
many other reasons, a description of the technology involved in any migration should be prepared.
3.2.4.3. Identify Relevant Best Practices
Best practices often include an activity that is technically outside the scope of the actual data migration.
Considering the requirements of the target data store with regard to longevity and future activity can
significantly affect decisions made about and during the migration. It is generally reasonable for the MMT
to consider future requirements of the target data store such as durability, migratability, reusability,
scalability, and future anticipated capacity.
14 2006 Best Practices for Data Migration, page 7
Determine Data Security and Privacy Requirements
The MMT must review and follow the processes and roadmaps outlined in the Handbook for Information
Assurance Security Policy Information Assurance Program March 31, 2006 for protection of all data at
each source, as well as during the migration of the data between sources.
• Information about any known issues or concerns regarding the quality of the available
documentation, such as whether the documentation is outdated (e.g., documentation was
prepared when the original project started 5 years ago). This information can only be
collected through interviews.
• Information about any known and identified gaps/missing information that should be resolved
after the data migration, which may or may not be documented as a business or technical
requirement. This information can only be collected through interviews.
The MMT will work closely with the DMT to identify the IT infrastructure required to implement the data
migration efforts, and any infrastructure affected by the execution of the proposed data migration.
Existing technology, such as the source-data storage devices and software, is already in place and
requires no decision; it must, however, be captured as part of the Legacy (or baseline 15) Data Architecture.
However, the tools in place to access the legacy data during the migration (such as ETL software or
custom programming 16) should be evaluated to ensure migration requirements are met. All technical
aspects of the staging area (if used), target environment, actual movement, and validation of the data
must be defined and documented by the TMT.
As part of this analysis, the DMT will determine whether the IT infrastructure in place will support the
planned data migration effort and, if not, what solution to recommend to resolve any shortcomings.
15 A Practical Guide to Federal Enterprise Architecture, February 2001 (CIO Council), page 5
16 The Complete Data Migration Methodology, page 6
17 How to Plan for Data Migration (ComputerWorld, May 21, 2004)
the application must be taken out of service (offline) during the actual migration. The basic methods
are 18:
• Offline: backup and restore; restore from backup tapes; FTP transfer
• Online: array-based replication; volume management or replication; host-based mirroring
In many cases, a hybrid of these methods is required to satisfy the requirements of a major migration
effort. The Data Migration Project Manager, in close collaboration with the Federal Student Aid EDS
Team, must determine the method and tools to be used to perform the activities of the data migration.
The method can differ based on the legacy systems involved. The method and tools chosen, and the
factors contributing to the determination, should be included in the Data Migration Plan. Such factors
may include (but are not limited to):
• Distribution (location) of data stores
• Funding constraints
• Available expertise in current and target storage environment (e.g., whether planning is
limited to specific options simply because of available expertise)
• Performance (qualitative and quantitative) of procedures/tools
• Source data protection/recovery
• Homogeneous versus heterogeneous storage requirements
• Multi-vendor environment
• Dependencies on external business partners
• Allowable downtime
• Time (schedule) constraints
• Volume of data
• Personnel constraints (availability)
• Complexity of storage and processing environment
• Physical re-location
• Data storage format incompatibilities (DBMS/DBMS, DBMS/OS)
• Configuration issues related to data volume
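Two of the factors listed above, volume of data and allowable downtime, can be combined in a rough estimate when weighing an offline method. The sketch below is illustrative only; the function names and the sample figures (volume, throughput, and downtime window) are assumptions, and a real estimate would be based on measured throughput in the actual environment.

```python
def estimated_transfer_hours(volume_gb, throughput_gb_per_hour):
    """Estimate how long an offline copy would take at a sustained throughput."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return volume_gb / throughput_gb_per_hour


def offline_fits_window(volume_gb, throughput_gb_per_hour, allowable_downtime_hours):
    """True if the estimated offline transfer fits inside the allowable downtime."""
    hours = estimated_transfer_hours(volume_gb, throughput_gb_per_hour)
    return hours <= allowable_downtime_hours


# Illustrative numbers only: 2,000 GB at 250 GB/hour against a 12-hour window.
hours = estimated_transfer_hours(2000, 250)
fits = offline_fits_window(2000, 250, 12)
```

If the estimate does not fit the window, an online or hybrid method may need to be considered, subject to the other factors in the list.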
3.2.6.2. Determine Data Conversion Plan
Migration requirements may require a change to the legacy data during the migration process. There
may be changes to form, value, or volume. A strategic approach to data migration that analyzes legacy
data at the source will mitigate this risk by allowing analysis both at the source and at each step of the
migration process. The MMT, and specifically the data stewards, must define the form and business
function of the target data. Transformation of the data values, constraints, and/or format occurs during
the migration process, through thoroughly tested rules and procedures.
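Conversion rules of this kind can be expressed as a small table that maps each field to a transformation, which makes the rules easy to review and test. The following is a sketch under stated assumptions: the field names, formats, and the convert_record function are hypothetical and are not taken from any Federal Student Aid system.

```python
from datetime import datetime

# Hypothetical legacy-to-target rules; field names and formats are illustrative.
CONVERSION_RULES = {
    "last_name": lambda v: v.strip().upper(),
    "birth_date": lambda v: datetime.strptime(v, "%m/%d/%Y").strftime("%Y-%m-%d"),
    "state": lambda v: v.strip().upper()[:2],
}


def convert_record(legacy_record):
    """Apply each rule to its field; fields without a rule pass through unchanged."""
    return {
        field: CONVERSION_RULES.get(field, lambda v: v)(value)
        for field, value in legacy_record.items()
    }


converted = convert_record(
    {"last_name": " smith ", "birth_date": "07/09/1985", "state": "va", "id": "42"}
)
```

Keeping the rules in one structure allows each rule to be tested in isolation before it is applied during a trial migration.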
3.2.6.3. Determine Data Integration Plan
Data migration may require drawing data from more than one legacy data source. The Integration Plan
describes how conflicts and duplication in source data and data structures will be resolved. The plan also
determines how to move the data from the source system(s) to the target system. There are two options:
• Load the data sources sequentially into the staging area until all source data has been
loaded. Then, perform the integration of all source data in the staging area. Finally, move
the integrated data to the target data store.
• In some cases, the volume of data or time restrictions may not support the above-described
option, and may result in sequential individual data migrations (one for each source system).
The staging area could serve as an integration environment to simulate loading the new data
set into an environment already populated with operational data.
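The first option can be illustrated with a short sketch in which each source is loaded into a staging list and the staged records are then merged by key. The conflict rule used here (the earlier source wins and later sources only fill empty fields) is an assumption chosen for illustration; the actual rule would be defined in the Integration Plan. All names and sample values are hypothetical.

```python
def load_into_staging(staging, source_name, records):
    """Append one source's records to the staging area, tagged with their origin."""
    for record in records:
        staging.append({**record, "_source": source_name})


def integrate(staging, key):
    """Merge staged records sharing a key; earlier sources win, later ones fill gaps."""
    merged = {}
    for record in staging:
        target = merged.setdefault(record[key], {})
        for field, value in record.items():
            if field != "_source" and target.get(field) in (None, ""):
                target[field] = value
    return list(merged.values())


staging = []
load_into_staging(staging, "system_a", [{"id": 1, "name": "Ann Lee", "email": ""}])
load_into_staging(
    staging,
    "system_b",
    [
        {"id": 1, "name": "A. Lee", "email": "ann@example.org"},
        {"id": 2, "name": "Bo Kim", "email": "bo@example.org"},
    ],
)
target_rows = integrate(staging, key="id")
```

In this example the duplicate record for id 1 keeps the name from the first source and takes the email address from the second.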
18 Simplifying Technology Refresh with Data Migration Software, page 6
Staging areas are an optional interim data source, which can serve the purpose of mirroring the ultimate
target system. Best practices demonstrate the benefits of establishing a staging area. It allows
validation, cleansing, and/or conversion of the integrated data prior to movement into the target location.
These trial migrations can be repeated multiple times until the data migration procedures are perfected
without affecting the configuration and readiness of the final target system.
The need to integrate multiple legacy-data sources mandates such a staging area.
3.2.6.4. Plan Parallel Operation
Migrating financial systems often requires the old and the new system to run in parallel for a pre-defined
period of time to ensure the reliability and accuracy of the newly implemented target system. Federal
Student Aid follows this principle when planning whether the legacy systems should continue operation
for a set time after a successful migration. The legacy system may even serve as a long-term data
source for the target system (which is often the case when migrating data from an operational, or
transactional, system). However, some legacy systems may be scheduled for complete shutdown upon
successful migration of the data to a target system. Others may already be out of operation, which may
be the leading factor facilitating the migration.
While different purposes are served by shutting down or continuing operation of legacy systems, the two
scenarios have one issue in common: both require that the data contained in the source and target
systems remain synchronized to some degree as long as both systems are in operation. In the case of a
transactional system feeding a data warehouse, the source data is often derived and/or aggregated over
a particular time period when being moved into the warehouse. These rules and algorithms shall be
developed as part of the procedures for populating the target data store.
A third scenario involves maintaining the legacy data store for a period of time while the operation of the
new system is validated. While this is generally considered a post-migration task, the full operation of the
new system may reveal errors in the migrated data, requiring revisions to some part of the data migration
(data quality remediation, data migration procedures, etc.). Roadmaps for operating the two systems in
parallel, monitoring and comparing the performance of each system, and resolving issues as they arise
should be established and included in the Data Migration Plan.
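Comparing the two systems during parallel operation can be partly automated by reconciling records on a shared key. The following is a minimal sketch that assumes both systems can export records in a comparable form; the function names and sample records are hypothetical.

```python
import hashlib
import json


def fingerprint(record):
    """Stable hash of a record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(legacy_rows, target_rows, key):
    """Report keys missing from either side and keys whose contents differ."""
    legacy = {row[key]: fingerprint(row) for row in legacy_rows}
    target = {row[key]: fingerprint(row) for row in target_rows}
    shared = legacy.keys() & target.keys()
    return {
        "missing_in_target": sorted(legacy.keys() - target.keys()),
        "missing_in_legacy": sorted(target.keys() - legacy.keys()),
        "mismatched": sorted(k for k in shared if legacy[k] != target[k]),
    }


report = reconcile(
    [{"id": 1, "balance": 100}, {"id": 2, "balance": 250}, {"id": 3, "balance": 75}],
    [{"id": 1, "balance": 100}, {"id": 2, "balance": 205}],
    key="id",
)
```

Where the target holds derived or aggregated data, the same derivation rules would need to be applied to the legacy records before such a comparison is meaningful.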
3.2.6.5. Develop Migration Data Quality Plan
The Data Quality Plan concentrates on the quality of the legacy data. It requires multiple efforts that can
be performed in parallel. All outcomes will determine the overall data quality of the source system(s) to
be migrated. In addition, the Data Quality Metrics and Data Loss Tolerance information will be used as
benchmarks to determine the fitness of the data for deployment.
Define Data Quality Metrics: A Proof of Concept, which simulates a full data migration by operating on a
sampling of data supporting a single event, such as a single transaction or single concept, 19 may be
performed if a commercial data migration tool (such as ETL software) or data profiling tool is used. The Proof
of Concept provides a field-level and/or record-level view of the legacy data and helps identify anomalies.
The document will validate the compatibility of the technology selected to perform the data migration, and
will provide data quality metrics based on the sample data that may be used to project the level of effort
required to perform full data remediation. If custom software or procedures are planned for the data
migration, then data metrics must still be established for measuring the quality and integrity of the data
before, during, and after each migration stage.
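Field-level metrics on a sample can be computed with very little code. The sketch below measures completeness and validity for each field; the validity rules, field names, and sample records are assumptions for illustration, and real rules would come from the data requirements.

```python
import re

# Hypothetical validity rules for a sample; real rules come from the data requirements.
VALIDATORS = {
    "student_id": lambda v: re.fullmatch(r"\d{9}", v) is not None,
    "zip": lambda v: re.fullmatch(r"\d{5}(-\d{4})?", v) is not None,
}


def quality_metrics(sample, validators):
    """Per-field completeness and validity rates over a non-empty list of records."""
    if not sample:
        raise ValueError("sample must contain at least one record")
    metrics = {}
    for field, is_valid in validators.items():
        values = [row.get(field) for row in sample]
        present = [v for v in values if v not in (None, "")]
        valid = [v for v in present if is_valid(v)]
        metrics[field] = {
            "completeness": len(present) / len(values),
            "validity": len(valid) / len(present) if present else 0.0,
        }
    return metrics


sample = [
    {"student_id": "123456789", "zip": "20202"},
    {"student_id": "12345", "zip": "20202-1234"},
    {"student_id": "", "zip": "2020"},
    {"student_id": "987654321", "zip": None},
]
metrics = quality_metrics(sample, VALIDATORS)
```

Rates measured on the sample can then be applied to the full record count to approximate how many records may need remediation.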
Define Data Loss Tolerance: If all data stores that participate in a data migration effort (legacy, staging,
and target) have the same basic specifications (e.g., a relational database using version X of RDBMS Y
on operating system Z, etc.), it is reasonable to expect that all data will transfer without loss. However, on
occasion obsolete data structures or formats may not translate 100% into a modern environment. In this
scenario, the MMT must consult the business stakeholders to determine the tolerance level for data loss.
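Once a tolerance has been agreed with the business stakeholders, each migration run can be checked against it. This is a minimal sketch; the function names, the record counts, and the tolerance value of 0.001 are assumptions used only to show the arithmetic.

```python
def loss_rate(source_count, target_count):
    """Fraction of source records that did not arrive in the target."""
    if source_count <= 0:
        raise ValueError("source_count must be positive")
    return max(source_count - target_count, 0) / source_count


def within_tolerance(source_count, target_count, tolerance):
    """True if the observed loss does not exceed the agreed tolerance."""
    return loss_rate(source_count, target_count) <= tolerance


# Illustrative figures: 50 of 100,000 records failed to translate.
rate = loss_rate(100_000, 99_950)
ok = within_tolerance(100_000, 99_950, tolerance=0.001)
```

A record count is the simplest measure; a tolerance could equally be defined per field or per data structure if the stakeholders require it.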
Establish Data Quality Remediation Plan: The creation of a “zero-defect” data quality policy is optimal
prior to data migration. Such a policy can be put into place by performing error correction, either through
passive remediation (at the source) or through active remediation, which corrects data errors during the
migration process. If this cannot be accomplished, then fixing known or discovered errors in the legacy
data should be the first post-migration step 20. Once metrics are established, the MMT must determine at
what stage of the migration, and by what means, the identified data quality issues shall be remedied, and
lay these decisions out in the Data Quality Remediation Plan.
19 A Strategic Approach to Data Migration, page 3
If a passive remediation plan is chosen that affects the content of the source (legacy) data, notification to
dependent systems of changes must be included in the Communications Plan.
3.2.6.6. Develop Data Archival Strategy
The plan for managing data once it is no longer necessary for immediate access and use is called the
Data Archival Strategy. The MMT must interview stakeholders and formulate strategies regarding what
data to retain, for how long, where, and how. A thorough architecture of the legacy system may already
include a Data Archival Strategy, but it likely only covers the retention and management of data during the
operational life of the legacy system.
If the strategy in place for the legacy system is sufficient to address the retention of data once the system
is removed from operation, then the full strategy may be adopted as part of the Data Archival Strategy
within the Data Migration Plan. If, however, the strategy does not address system shutdown or does not
exist at all, then the MMT must establish a strategy for retaining and retrieving the legacy data after the
legacy system is taken out of operation.
3.2.6.7. Develop Data Migration Test Plan
The MMT must establish a plan for testing the migration procedures at each step. All data movement
procedures, transformation/conversion procedures, data cleansing procedures, and data validation
procedures must be accounted for in the context of the Migration Data Architecture. The data migration
procedures must be able to successfully satisfy the requirements set forth in the data requirements (as
demonstrated in the test plan) before proceeding to the full migration.
20 A Strategic Approach to Data Migration, page 1
21 Microsoft CRM Data Migration Framework, page 7
22 PMP In Depth, page 8
23 2006 Best Practices for Data Migration, page 6
• Content profiling: assessment and examination of the data content. The results of this
assessment reflect the quality of the content of the data captured, and identify issues that will
be resolved through data cleansing.
The assessment is a process whereby the team examines the data available in an existing database and
collects statistics and information about that data. The purpose of these statistics is to:
• Give metrics on data quality, including whether the data conforms to company standards
• Assess the risk involved in integrating data for new applications,
• Monitor and track data quality
• Assess whether metadata accurately describes the actual values in the source database
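The statistics listed above can be gathered with very little tooling. The following is a minimal sketch in Python, not a prescribed Federal Student Aid procedure. The function names, the sample values, and the five-digit ZIP rule standing in for a "company standard" are all assumptions made for illustration.

```python
import re
from collections import Counter


def profile_column(values):
    """Collect basic content statistics for one column of source data."""
    # Treat None and blank strings as missing content.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "row_count": len(values),
        "missing_count": len(values) - len(present),
        "distinct_count": len(counts),
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


def is_five_digit_zip(value):
    """Hypothetical company standard: a ZIP value is exactly five digits."""
    return re.fullmatch(r"\d{5}", str(value)) is not None


def conformance_rate(values, is_valid):
    """Share of present (non-missing) values that satisfy the given standard."""
    present = [v for v in values if v is not None and str(v).strip() != ""]
    if not present:
        return 1.0
    return sum(1 for v in present if is_valid(v)) / len(present)
```

Running `profile_column` on each column of a legacy table, and `conformance_rate` with one rule per documented standard, yields figures that can be tracked over time as data quality metrics.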
Profiling activities should follow these three steps in the order presented:
24 Wikipedia: Data Profiling
be foreign keys (but that might not have constraints to enforce integrity), and to identify other
areas of data redundancy. Example: redundancy analysis could provide the analyst with the
fact that 80% of the time, the ZIP field in table A contained the same values as the
ZIP_CODE field in table B.
Column profiling provides critical metadata, which is required in order to perform dependency profiling,
and as such must be executed before dependency profiling. Similarly, dependency profiling must be
performed before redundancy profiling.
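As an illustration of the redundancy example above, the overlap between two columns can be measured as the share of values in one column that also occur in the other. This is a minimal sketch under stated assumptions: the function name and the sample ZIP values are hypothetical, and an automated profiling tool would typically compute a comparable figure.

```python
def value_overlap(column_a, column_b):
    """Fraction of non-null values in column_a that also appear in column_b."""
    known_b = {v for v in column_b if v is not None}
    values_a = [v for v in column_a if v is not None]
    if not values_a:
        return 0.0
    matches = sum(1 for v in values_a if v in known_b)
    return matches / len(values_a)


# Hypothetical sample: ZIP from table A compared with ZIP_CODE from table B.
zip_a = ["20001", "20002", "20003", "20004", "99999"]
zip_code_b = ["20001", "20002", "20003", "20004"]
overlap = value_overlap(zip_a, zip_code_b)  # 4 of 5 values match
```

A high overlap suggests a candidate foreign key or a redundant column. It does not prove a relationship, so the analyst should still confirm the finding against the business rules.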
The use of automated data profiling tools is a Best Practice in the data profiling step in data migration 25.
This step partially overlaps with blueprinting the state of the legacy architecture. Automated profiling tools
might be used to facilitate the procedures, but the procedures might also be done manually in the
absence of automated software. Once completed, a Data Profile contributes to Data Conversion and
Data Quality Remediation. The documented results of the analysis become a resource for the design of
the staging and target architectures. The Data Profile Assessment needs to be distributed and presented
for approval as outlined in the Communications Plan to discuss and determine the criticality of the findings.
In addition, Federal Student Aid must analyze which of the identified Data Quality issues can be resolved
through Data Cleansing. The questions below help determine the most appropriate Data Cleansing
approach and responsibilities:
• Where do the identified data issues originate?
There are two possibilities: They were introduced by Federal Student Aid applications or
through data received from their business partners as part of the data exchange.
It might be possible to implement strong validation rules at the front end (GUI) and prevent
the entry of invalid data at the point of data entry; or validation rules can be implemented at
the database level.
• Do the data issues refer to historical data only (e.g. data older than 5-10 years)?
It is possible that improved business and validation rules have been implemented after
detection of these data issues and that newer data is in better condition. A decision needs to
be made as to whether it is worthwhile and/or necessary to repair the historical data records.
• Are the identified data issues caused by missing validation rules?
Design and implementation of proper business rules could repair these issues.
• Who owns the data and is responsible for the data quality?
Usually, the business owners are responsible for ensuring high data quality from a business
perspective. The business owners, in collaboration with the Data Steward(s), should define the
necessary steps for data cleansing and prepare a plan and timeline for implementation.
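To illustrate the option of enforcing validation rules at the database level, the sketch below uses Python's built-in sqlite3 module with a CHECK constraint. The table name `borrower`, its columns, and the five-digit ZIP rule are hypothetical. A production RDBMS would express the same idea in its own constraint syntax.

```python
import sqlite3


def create_validated_table(conn):
    """Create a table whose constraints reject malformed ZIP values."""
    conn.execute(
        """
        CREATE TABLE borrower (
            id  INTEGER PRIMARY KEY,
            zip TEXT NOT NULL
                CHECK (length(zip) = 5 AND zip NOT GLOB '*[^0-9]*')
        )
        """
    )


def try_insert(conn, borrower_id, zip_value):
    """Return True if the row is accepted, False if a rule rejects it."""
    try:
        conn.execute(
            "INSERT INTO borrower (id, zip) VALUES (?, ?)",
            (borrower_id, zip_value),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The database refused the row, so invalid data never enters the store.
        conn.rollback()
        return False
```

With this constraint in place, a value such as "2000A" or a missing ZIP is rejected at the point of entry, regardless of which application submits it.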
3.3.3.5. Critical Success Factors for Data Migration Analysis & Design
Best practices recommend establishing the following goals while designing target data structures
and data migration procedures:
• Understand data requirements (architecture and business rules),
• Design comprehensive data migration procedures based on an understanding of data
content, quality, and structure 26, and
• Leverage standardized data structures (Enterprise Data Architecture).
25 A Strategic Approach to Data Migration, page 2
26 Strategic Approach to Data Migration, page 3
employees will perform the data cleansing. This situation should be treated as a potential risk with
respect to the timely completion of the task.
3.3.6.3. Design Data Conversion Procedures
The TMT designs the necessary procedures to convert the source data to the proper values and formats
required in the staging and target data store. Sources of input for this task include:
• Blueprint of the existing data architecture
• Design of the staging area and the target data store
• Data correlation report
• Any identified rules, translations, or transformations that the data must undergo to meet the
staging/target data structure requirements
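One way to keep identified rules, translations, and transformations reviewable is to record them as data rather than burying them in procedural code. The sketch below is only an illustration: the field names (`ZIP`, `ENROLL_DT`, `zip_code`, `enrolled_on`) and both rules are hypothetical and do not come from any Federal Student Aid system.

```python
from datetime import datetime

# Each target field maps to (source field, transformation to apply).
CONVERSION_RULES = {
    # Trim whitespace and restore leading zeros lost in the legacy format.
    "zip_code": ("ZIP", lambda v: v.strip().zfill(5)),
    # Convert MM/DD/YYYY text into ISO 8601 (YYYY-MM-DD).
    "enrolled_on": (
        "ENROLL_DT",
        lambda v: datetime.strptime(v, "%m/%d/%Y").date().isoformat(),
    ),
}


def convert_record(source_record, rules=CONVERSION_RULES):
    """Build a target record by applying each rule to its source field."""
    return {
        target_field: transform(source_record[source_field])
        for target_field, (source_field, transform) in rules.items()
    }
```

For example, `convert_record({"ZIP": " 2001", "ENROLL_DT": "07/09/2019"})` produces a record with `zip_code` of "02001" and `enrolled_on` of "2019-07-09". Because the rules live in one table, they can be checked directly against the data correlation report.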
3.3.6.4. Design Target Data Migration Procedures
The TMT designs the necessary procedures to extract the data from the staging or source data stores as
appropriate, transport the data to the target data store, and in case no staging area is used, perform any
necessary translations or transformations prior to populating the target data store.
3.3.6.5. Design Data Validation Procedures
The TMT designs the necessary procedures to validate the integrity of the data content at each stage of
the migration. Validation procedures must also support validation of the completeness and accuracy of
the migrated data.
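A common way to check completeness and accuracy between two stages is to compare record keys and per-record fingerprints. The following is a minimal sketch, assuming both stores can be read as rows with a shared key in the same column. The function names are hypothetical, and where a stage converts data, the comparison should be made against the expected converted values rather than the raw source.

```python
import hashlib


def row_fingerprint(row):
    """Hash one row so that any change in content changes the fingerprint."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    # Note that None and the empty string are treated as the same value.
    joined = "\x1f".join("" if value is None else str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compare_stores(source_rows, target_rows, key_index=0):
    """Report keys that are missing, unexpected, or changed in the target."""
    source = {row[key_index]: row_fingerprint(row) for row in source_rows}
    target = {row[key_index]: row_fingerprint(row) for row in target_rows}
    return {
        # Completeness: every source key should arrive in the target.
        "missing": sorted(k for k in source if k not in target),
        "unexpected": sorted(k for k in target if k not in source),
        # Accuracy: rows present in both stores should have identical content.
        "changed": sorted(
            k for k in source if k in target and source[k] != target[k]
        ),
    }
```

An empty result in all three lists indicates that the stage moved every record without altering it.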
3.3.6.6. Design Data Quality Remediation Procedures
The TMT designs the procedures to be used to remediate the data quality issues identified through data
profiling. These procedures will be developed in close collaboration with Federal Student Aid business
owners to meet current (and future) business requirements, and in collaboration with the EDS Team with
regard to compliance with Federal Student Aid data standards.
3.3.6.7. Refine Data Migration Test Plan
All individual test cases must be consolidated into the overall Data Migration Test Plan. The TMT uses
this detailed information to update and refine the original plan through sequencing the execution of these
test cases and identification of any dependencies.
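Sequencing test cases by their dependencies amounts to a topological sort. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The test case names are hypothetical, and the mapping of each test case to its prerequisites is an assumed way of recording the dependencies.

```python
from graphlib import TopologicalSorter


def sequence_test_cases(prerequisites):
    """Order test cases so that each one runs after its prerequisites.

    `prerequisites` maps a test case name to the set of test cases that
    must complete first. A circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(prerequisites).static_order())


# Hypothetical test cases for the staging step.
staging_tests = {
    "create_staging_area": set(),
    "populate_staging_area": {"create_staging_area"},
    "validate_staged_data": {"populate_staging_area"},
}
execution_order = sequence_test_cases(staging_tests)
```

Detecting a cycle at planning time is useful in itself, because it reveals test cases whose stated dependencies cannot all be satisfied.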
3.4. Implementation
3.4.1. Implementation Overview
The steps of Implementation and Validation are logically interdependent, much like the steps of Planning
and Analysis and Design. Although not every activity may occur for every migration, the steps that do
occur should generally follow the sequence shown below. There is one exception: As discussed in
several Best Practice articles, a decision should be made during Planning as to the most efficient and
effective time to do Data Cleansing and Data Conversion.
If the data migration covers multiple source systems, repeat the steps of extracting the data and loading it
into the staging area until all source data has been loaded. The staging area supports the integration of
all data. Figure 2-5 shows the high-level process flow of the Implementation Phase.
DEVELOP DATA MIGRATION PROCEDURES
• Configure Resources
• Develop and Test Data Migration Procedures
• Develop and Test Validation Procedures
• Develop and Test Data Cleansing Procedures
• Develop and Test Data Conversion/Transformation Procedures
• Establish Access to Staging and Target Area
• Critical Success Factors for Implementation
STAGE DATA
• Create Staging Area
• Populate Staging Area
• Integrate Staged Data
• Validate Staged Data
CLEANSE DATA
• Cleanse Data according to Data Remediation Plan
• Validate Cleansed Data
CONVERT/TRANSFORM DATA
• Convert/Transform Data
• Validate Converted/Transformed Data
MIGRATE DATA
• Perform Trial Migration
• Validate Results of Trial Migration
• Obtain Approval for Full Migration into Production Environment
• Perform Full Migration (Deployment)
• Validate Results of Full Migration
POST-MIGRATION
• Operate Legacy and Target Environment in Parallel
• Validate Parallel Operation
• Release Data Environments
Table 3-8: Data Migration Implementation
Using the Data Migration Data Quality Plan (specifically the Data Loss Tolerance and Data Quality Metrics)
as a benchmark, the TMT must execute the validation procedures to measure the success of each stage
of migration, and must also validate any stage of the implementation that changes the data (such as
format, content, or location). Whenever the resulting data fails validation, the procedures of that
migration stage should reverse, or “roll back,” the data so that any needed corrections can be made. The trial
migrations should be repeated until acceptance and/or success criteria are met and the migration
procedures and environments are ready for deployment.
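Where the target store supports transactions, the validate-or-roll-back behavior described above can be sketched as follows. This is an illustration only, using Python's built-in sqlite3 module. The function name and the idea of passing the validation procedure as a callable are assumptions, and a real migration would use the transaction facilities of its own RDBMS or ETL tool.

```python
import sqlite3


def run_stage(conn, statements, validate):
    """Apply one migration stage and keep it only if validation passes.

    `statements` is a list of (sql, parameters) pairs that change data.
    `validate` is a callable that inspects the connection and returns
    True when the stage meets its acceptance criteria.
    """
    try:
        for sql, params in statements:
            conn.execute(sql, params)
        if not validate(conn):
            # Validation failed: undo every change made by this stage.
            conn.rollback()
            return False
        conn.commit()
        return True
    except sqlite3.Error:
        # A failed statement also leaves the data as it was before the stage.
        conn.rollback()
        raise
```

A trial migration can then call `run_stage` repeatedly, correcting the procedures after each `False` result, until the acceptance criteria are met.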
• Request or acquire the data access required by the technical team to implement the various
procedures (if not already done)
The following technical steps must now occur:
• Establish test environments
• Make test environments available to the technical team
• Install and/or configure all software involved in the migration 27
These steps carry a potential risk and dependency because both the creation of the test environment and
the configuration and implementation of the software will most likely be performed by staff outside the
immediate project.
3.4.3.2. Develop and Test Data Migration Procedures
The TMT develops and tests the data migration procedures in accordance with the Data Migration Plan,
the Data Migration Test Plan, and the documented Migration Data Architecture.
3.4.3.3. Develop and Test Data Validation Procedures
The TMT (with input from stakeholders and business users) develops and tests the data validation
procedures in accordance with the Data Migration Plan, the Data Migration Test Plan, and the Migration
Data Architecture, as well as in compliance with all applicable business rules and processes. These
procedures must be capable of confirming that data loss and accuracy are within allowable parameters
during each stage of the migration. If 100% migration of data between different environments is not
feasible, the MMT must consult with business stakeholders, revise the tolerances for data loss, and
record the revised tolerances in the Data Migration Plan. The validation
procedures at each stage must take these tolerances into account.
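A tolerance check of this kind reduces to simple arithmetic on record counts. The sketch below assumes that data loss is measured as the share of source records that did not arrive in the target. The function names and the 1% tolerance in the usage note are hypothetical, since the actual tolerance is whatever the Data Migration Plan records.

```python
def data_loss_rate(source_count, migrated_count):
    """Share of source records that did not arrive in the target."""
    if source_count == 0:
        return 0.0
    return (source_count - migrated_count) / source_count


def within_tolerance(source_count, migrated_count, tolerance):
    """True when the loss rate is between zero and the agreed tolerance.

    A negative rate means the target holds more records than the source,
    which is treated as a failure because it points to duplication.
    """
    rate = data_loss_rate(source_count, migrated_count)
    return 0.0 <= rate <= tolerance
```

With a tolerance of 0.01, migrating 995 of 1,000 records passes (0.5% loss), while migrating 980 of 1,000 fails (2% loss).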
3.4.3.4. Develop and Test Data Cleansing Procedures
The TMT develops and tests the data cleansing procedures in accordance with the Data Migration Plan
and the Data Migration Test Plan, specifically the Data Quality Metrics and Data Loss Tolerance. If the
procedures designed for data cleansing cannot be scripted or saved (possibly because they are steps to
follow in a commercial software package or require significant manual operation), then they should be
thoroughly documented and tested at this stage.
3.4.3.5. Develop and Test Data Conversion/Transformation Procedures
The TMT develops and tests the data conversion and/or transformation procedures in accordance with the Data
Migration Plan and the Data Migration Test Plan, specifically the Data Conversion Plan. As mentioned
before, if the procedures designed for data conversion cannot be scripted or saved (possibly because
they are steps to follow in a commercial software package or require significant manual operation), then
they should be thoroughly documented and tested at this stage.
3.4.3.6. Establish Access to Staging and Target Areas
The TMT must coordinate with the proper points of contact at Federal Student Aid to gain all necessary
access to the software and establish the user accounts that are required to perform all operations of the
data migration. Omitting or delaying this task can result in a delay of the data migration project.
There is a potential risk and dependency because Federal Student Aid staff outside of the immediate
project will perform this task.
3.4.3.7. Critical Success Factors for Implementation
Best Practices recommend the following activities based on critical success factors during migration of
data from one or multiple source data stores to a target data store:
• Execution of a thorough and detailed Migration Test Plan
27 Microsoft CRM Data Migration Framework, page 8
28 Microsoft CRM Data Migration Framework, page 8
29 Strategic Approach to Data Migration, page 3
This step carries both potential risk and dependency, as Federal Student Aid staff will most likely perform
the data cleansing.
3.4.5.2. Validate Cleansed Data
The TMT must execute the validation procedures to measure the success of the cleansing stage of the
data migration.
3.4.8. Post-Migration
3.4.8.1. Operate Legacy & Target Environments in Parallel
Once the full migration has been executed, validated, and approved, the business area stakeholders may
desire that the legacy system and the target system operate in parallel for a period of time. This decision,
the duration of parallel operation, and the criteria for concluding the parallel operation should be captured
in the Data Migration Plan, specifically in the Parallel Operation Plan.
3.4.8.2. Validate Parallel Operation
During the “trial” period of parallel operation, if errors are detected in the operation of the target system,
then (1) the legacy system may continue to operate according to original requirements, and (2) the data
migration may be revisited, repaired, and repeated.
3.4.8.3. Release Data Environments
Upon confirmation that the target environment is operating in a satisfactory manner, the source data
environment may be released. If the source environment is to be retired from operations, retirement must
follow a system retirement plan and may occur at this time.
Any staging area, if used, would generally be taken out of operation at this point in accordance with the
archival strategy developed as part of the overall Data Migration Plan. However, it is important that the
environment not be purged entirely, in case errors in the data migration are discovered. It may be
possible to restore the staging environment and re-populate the target environment without having to
repeat the data migration procedures.
If the staging area is part of a long-term solution – such as an interim data store between an operational
data store and a data warehouse – it should be approved and placed into full-time operation. The target
environment should also be ready to begin operation as a full-time application data store.
30 IPM Data Management Plan
31 2006 Best Practices for Data Migration, page 8
• A complete set of Data Migration Documents and Artifacts in electronic and/or paper format.
32 2006 Best Practices for Data Migration, page 8
Appendix B - Glossary
TERM DEFINITION
Best Practice 33: A management idea asserting a technique, method, process, activity, incentive or reward
as more effective at delivering a particular outcome than any other technique, method, process, etc.
Column: A set of data values of the same type collected and stored in the rows of a table.
Database: A set of table spaces and index spaces.
Data Conversion 34: The [transition] of one form of computer data to another.
Data Element: A generic term for an entity/class, table, attribute, or column in a conceptual, logical, and
physical data model.
Data Migration 35: The transferring of data between storage types, formats, or computer systems.
Enterprise Conceptual Data Model (ECDM): One of the initial components of Enterprise Data Architecture.
The first enterprise level data model developed. The ECDM identifies groupings of data important to Lines of
Business, Conceptual Entities, and defines their general relationships. The ECDM provides a picture of the
data the enterprise needs to conduct its business. (Reference: U.S. Department of Education Enterprise Data
Architecture – Enterprise Data Standards and Roadmaps.)
Enterprise Data Dictionary (EDD): One of the initial components of Enterprise Data Architecture. The EDD
lists metadata objects and a complete description of the object at a sufficient level of detail to ensure that
they are discrete and clearly understood. Such descriptions shall include, at a minimum, labels (names, titles,
etc.) and definitions (or text descriptions), but may include additional descriptive metadata such as object
type, classifications, content data type, rules (business, validation, etc.), valid and default values, etc. The
EDD is the definitive source for the meaning of metadata objects. (Reference: FSA-EDM)
Enterprise Logical Data Model (ELDM): A component of a maturing Enterprise Data Architecture. The second
enterprise level data model developed. It is the result of merging application level data model information
into the existing Enterprise Conceptual Data Model (ECDM). The ELDM extends the ECDM level of detail.
(Reference: U.S. Department of Education Enterprise Data Architecture – Enterprise Data Standards and
Roadmaps)
Enterprise Data Standards and Roadmaps (EDSG): A component of a maturing Enterprise Data Architecture;
rules and recommendations for the creation and updating of metadata objects and structures as well as for
creating conceptual and physical models and schemas at both the enterprise and application level.
(Reference: FSA-EDM)
Management Idea 36: (a.k.a. “Management fad”) A change in philosophy or operations that sweeps through
businesses and institutions, and then disappears when enthusiasm for it wanes.
Operational Data Store (ODS): An operational data store (ODS) is a type of database often used as an interim
area for a data warehouse. Unlike a data warehouse, which contains static data, the contents of the ODS are
updated through the course of business operations. An ODS is designed to quickly perform relatively simple
queries on small amounts of data (such as finding the status of a customer order), rather than the complex
queries on large amounts of data typical of the data warehouse.
33 Derived from Wikipedia “Best Practice” (http://en.wikipedia.org/wiki/Best_practice)
34 Derived from Wikipedia “Data Conversion” (http://en.wikipedia.org/wiki/Data_conversion)
35 Derived from Wikipedia “Data Migration” (http://en.wikipedia.org/wiki/Data_Migration)
36 Derived from Wikipedia “Management fad” (http://en.wikipedia.org/wiki/Management_fad)
Schema (Data): Any diagram or textual description of a structure for representing data. (Reference: FSA-EDM)
Table: A set of related columns and rows in a relational database.
Table Space: A portion of a database reserved for where a table will go. Table structure is the mapping
of tables into table spaces.
Target Data Store: The data store where the migrated and/or transformed data will be moved.
Uniform Resource Identifier (URI): The addressing technology for identifying resources on the Internet or a
private intranet.
Table B-1: Glossary
P C A N/A
1. Overall
2.2.3. Determine Technology
2.2.3.1. Data Storage Distribution
2.2.3.2. Physical re-location Requirements
2.2.3.3. Target Hardware Configuration
2.2.3.4. Target Software Configuration
2.2.3.5. Homogeneous vs. Heterogeneous Storage
2.2.3.6. Multi-vendor Storage Environment
2.2.3.7. Target Data Capacity
2.2.3.8. Other
2.2.4. Determine Stakeholder Requirements
2.2.5. Determine User Expectations
2.2.6. Determine Data Security Requirements
2.2.6.1. Source Data Protection (Recoverability)
2.2.6.2. Access to Migrating Data
2.2.6.3. Access to Migration Environment
2.2.6.4. Access to Documentation
2.2.6.5. Access to Legacy Environment
2.2.6.6. Access to Interim Environment
2.2.6.7. Access to Target Environment
2.2.6.8. Other
2.2.7. Consider Future Requirements
2.2.7.1. Future Capacity Growth
2.2.7.2. Durability of Target Data Storage
2.2.7.3. Migratability of Target Data Storage
2.2.7.4. Re-usability of Target Data Storage
2.2.7.5. Scalability of Target Data Storage
2.2.7.6. Other
2.3 Current Environment
2.3.1 Existing Data Related Artifacts
2.3.1.1 Availability of Data Architecture Documents
2.3.2 Blueprint Current Data Architecture
2.3.2.1 Availability of Current Data Storage
2.3.3 Profile Legacy Data
2.3.4 Determine IT Infrastructure Requirements
2.4 Develop Data Migration Plan
2.4.1 Identify Technology Options
2.4.2 Determine Data Migration Method
2.4.3 Plan Data Content Management Strategy
2.4.3.1 Data Conversion Plan
2.4.3.2 Data Quality Metrics
2.4.3.3 Data Loss Tolerance
2.4.3.4 Data Quality Remediation Plan
2.4.3.5 Data Integration / Reconciliation Plan
2.4.3.6 Data Archival Strategy
2.4.3.7 Other
2.4.4 Plan Parallel Operation
2.4.5 Plan Data Security Strategy
2.4.6 Develop Data Migration Plan
3. Analysis & Design
3.1 Analysis
3.1.1 Analyze Current Environment
3.1.2 Evaluate Data Migration Technology
3.1.3 Evaluate Data Quality (Data Profiling)
3.1.4 Correlate Data to Business Processes
3.1.5 Identify Critical Success Factors for Analysis & Design Phase
3.2 Determine Security Controls
3.2.1 Design Enterprise Management Controls
3.2.2 Design Operational Security Controls
3.2.2.1 Access to Migration Environment
3.2.2.2 Access to Documentation
3.2.3 Design Technical Security Controls
3.2.3.1 Access to Migrating Data
3.2.3.2 Access to Legacy Environment
3.2.3.3 Access to Interim Environment
3.2.3.4 Access to Target Environment
3.3 Design Data Environment
3.3.1 Design Security Data Architecture
3.3.2 Design Staging Area
3.3.3 Design Target Data Architecture
3.3.4 Correlate Migration Data (Source / Stage / Target)
3.3.4.1 Legacy to Staging
3.3.4.2 Legacy to Target
3.3.4.3 Staging to Target
3.3.5 Correlate Migration Data to Procedures
3.3.6 Determine Technology Configuration (ETL, RDBMS, etc.)
3.3.7 Design Migration IT Infrastructure
3.4 Design Migration Procedures
3.4.1 Design Data Security Procedures
3.4.2 Design Data Staging Procedures
3.4.3 Design Data Cleansing Procedures
3.4.4 Design Data Conversion Procedures
3.4.5 Design Target Data Migration Procedures
3.4.6 Design Data Validation Procedures
3.4.7 Design Data Quality Remediation Procedures
3.4.8 Refine Data Migration Test Plan
3.4.9 Design Data Quality Reports and Process for Reconciliation
Implementation
Procedures
3.5.6 Establish Access to Staging and Target Area
3.5.7 Identify Critical Success Factors for Implementation Phase
3.6 Stage Data
3.6.1 Create Staging Area
3.6.2 Populate Staging Area
3.6.3 Integrate Staged Data
3.6.4 Validate Staged Data
3.7 Cleanse Data
3.7.1 Cleanse Data
3.7.2 Validate Cleansed Data
3.8 Convert / Transform Data
3.8.1 Convert / Transform Data
3.8.2 Validate Converted / Transformed Data
3.9 Migrate Data
3.9.1 Perform Trial Migration
3.9.2 Validate Results of Trial Migration
3.9.3 Obtain Approval for Full Migration
3.9.4 Perform Full Migration
3.9.5 Validate Full Migration
3.10 Post-Migration
3.10.1 Operate Legacy and Target Environment in Parallel
3.10.2 Validate Parallel Operation and Data
3.10.3 Release Data Environments
4. Data Migration Closeout
4.1 Document Data Migration Results
4.2 Identify Critical Success Factors for Data Migration Closeout Phase
4.3 Document Data Migration Lessons Learned
4.4 Perform Knowledge Transfer
4.5 Communicate Data Migration and Lessons Learned
Legend: P – Planned
C – Completed
A – Accepted
N/A – Not applicable
1.2. Data Migration Plan: Plan outlining how the data migration from the data source to the target system is
planned for. It includes the plan for ensuring that post-migration data content satisfies the requirements
of the target data environment. This document supplements the overall Project Plan and consists of
multiple deliverables:
1.2.1. Data Conversion Plan: The plan for converting the form and content of source data into satisfactory
target data.
1.2.2. Data Quality Metrics: Documentation describing the standards of measurement to be applied to the
content data before, during, and after migration. This document includes information regarding the
acceptable level(s), if any, of lost data content or meaning as a result of source data undergoing a
change in content or form.
1.2.3. Data Quality Remediation Plan: The plan for correcting any identified data quality issues impacting or
resulting from the data migration. (RE: 3.4.4.4.3 Data Migration Plan)
1.2.4. Integration / Reconciliation: The plan for integrating and reconciling all the different source data
systems into one set of satisfactory target data.
1.2.5. Data Migration Test Plan: The plan for validating the successful movement of source data to the new
target data store.
1.2.6. Data Archival Strategy: The strategy for archiving historical data that is no longer needed.
1.3. Change Management Plan: Documentation showing the process by which version control of project
documentation and configuration management will be performed such that all results meet the highest
reasonable expectations of quality.
1.4. QA Plan: Documentation showing the process by which the project team shall ensure that all activities
are performed such that all results meet the highest reasonable expectations of quality.
1.5. Communications Documentation showing how information shall be
3. Implementation
3.1. Fully developed and tested Data Migration Procedures: Documentation and code of the data migration
procedures and related test results.
3.2. Fully developed and tested Data Validation Procedures: Documentation and code of the data validation
procedures and related test results.
3.3. Fully developed and tested Data Cleansing Procedures: Documentation and code of the data cleansing
procedures and related test results.
3.4. Fully developed and tested Data Conversion/Transformation Procedures: Documentation and code of the
data conversion/transformation procedures and related test results.
3.5. Data Cleansing Report: Documentation of data cleansing findings.
3.6. Data Conversion Report: Documentation of the executed data conversion.
3.7. Trial Migration Results: Documentation of the executed trial data migration(s).
3.8. Acceptance / Approval Documentation: Documentation demonstrating the stakeholder approval of the data
migration procedures readiness for deployment to production.
3.9. Full Migration Results: Documentation of the results of the full data migration.
3.10. Parallel Operations Report: Documentation outlining the results of the parallel operation of the old and
new data store. This report enables stakeholders to decide whether the parallel operation can be ended.
4. Close-out
4.1. Data Migration Results: Documentation describing
• Statistics of the data migration such as actual data quality and volume measurements, downtime, data
loss, etc.;
• Unresolved issues;
• [Others]
The actual content of this artifact shall be determined by the overall scope of the data migration. Complex
migration efforts would naturally require more extensive reporting, while more basic migrations may require
less content.
Please note: Based on the disposition instruction within the NARA General Records Schedule, data validation
artifacts (Test/Acceptance Plan) should be kept for 5 years AFTER a system is superseded by a new iteration,
or is terminated, defunded, or no longer needed for agency/IT administrative purposes.
Legend: P – Planned
C – Completed
A – Accepted
N/A – Not applicable