AWS Database Migration Service Best Practices
August 2016
© 2016, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Notices
This document is provided for informational purposes only. It represents AWS's current product
offerings and practices as of the date of issue of this document, which are subject to change
without notice. Customers are responsible for making their own independent assessment of the
information in this document and any use of AWS's products or services, each of which is
provided "as is" without warranty of any kind, whether express or implied. This document does
not create any warranties, representations, contractual commitments, conditions or assurances
from AWS, its affiliates, suppliers or licensors. The responsibilities and liabilities of AWS to its
customers are controlled by AWS agreements, and this document is not part of, nor does it
modify, any agreement between AWS and its customers.
Contents
Abstract
Introduction
Instance Class
Storage
Multi-AZ
Source Endpoint
Target Endpoint
Task
Migration Type
LOB Controls
Enable Logging
Monitoring Your Tasks
Host Metrics
Table Metrics
Performance Expectations
Increasing Performance
How Much Load Will the Migration Process Add to My Source Database?
When Should I Use a Native Replication Mechanism Instead of the DMS and the AWS Schema Conversion Tool?
What Is the Maximum Size of Database That DMS Can Handle?
Conclusion
Contributors
Abstract
Today, as many companies move database workloads to Amazon Web Services (AWS), they are
often also interested in changing their primary database engine. Most current methods for
migrating databases to the cloud or switching engines require an extended outage. The AWS
Database Migration Service helps organizations to migrate database workloads to AWS or
change database engines while minimizing any associated downtime. This paper outlines best
practices for using AWS DMS.
Introduction
AWS Database Migration Service allows you to migrate data from a source database to a target
database. During a migration, the service tracks changes being made on the source database so
that they can be applied to the target database to eventually keep the two databases in sync.
Although the source and target databases can be of the same engine type, they don't need to
be. The possible types of migrations are:
1. Homogenous migrations (migrations between the same engine types)
2. Heterogeneous migrations (migrations between different engine types)
At a high level, when using AWS DMS a user provisions a replication server, defines source and
target endpoints, and creates a task to migrate data between the source and target databases. A
typical task consists of three major phases: the full load, the application of cached changes, and
ongoing replication.
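For readers who prefer to script these steps, the following is a minimal sketch of provisioning a replication instance and defining the two endpoints with boto3, the AWS SDK for Python. All identifiers, hostnames, and credentials below are placeholders rather than values from this paper, and the parameters should be adapted to your own environment.

```python
# Minimal sketch of the high-level AWS DMS workflow, using boto3.
# Identifiers, hostnames, and credentials are placeholders.
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# 1. Provision a replication instance.
instance = dms.create_replication_instance(
    ReplicationInstanceIdentifier="dms-best-practices-demo",
    ReplicationInstanceClass="dms.c4.large",
    AllocatedStorage=100,   # GB of storage for log files and cached changes
    MultiAZ=False,
)

# 2. Define the source and target endpoints.
source = dms.create_endpoint(
    EndpointIdentifier="source-oracle",
    EndpointType="source",
    EngineName="oracle",
    ServerName="source.example.com",
    Port=1521,
    Username="dms_user",
    Password="example-password",
    DatabaseName="ORCL",
)
target = dms.create_endpoint(
    EndpointIdentifier="target-mysql",
    EndpointType="target",
    EngineName="mysql",
    ServerName="target.example.com",
    Port=3306,
    Username="dms_user",
    Password="example-password",
)
# 3. A task is then created to migrate data between the two endpoints;
#    see the task-creation sketch later in this paper.
```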
During the full load, data is loaded from tables on the source database to tables on the target
database, eight tables at a time (the default). While the full load is in progress, changes made to
the tables that are being loaded are cached on the replication server; these are the cached
changes. It's important to know that the capture of changes for a given table doesn't begin
until the full load for that table starts; in other words, the start of change capture for each
individual table will be different. After the full load for a given table is complete, you can begin
to apply the cached changes for that table immediately. When all tables are loaded, AWS DMS
begins to collect changes as transactions for the ongoing replication phase. After all cached
changes are applied, your tables are transactionally consistent, and you move to the ongoing
replication phase, applying changes as transactions.
Upon initial entry into the ongoing replication phase, there will be a backlog of transactions
causing some lag between the source and target databases. After working through this backlog,
the system will eventually reach a steady state. At this point, when you're ready, you can shut
down your applications, let the remaining changes flow through to the target, and restart your
applications pointing at the target.
AWS DMS will create the target schema objects that are needed to perform the migration.
However, AWS DMS takes a minimalist approach and creates only those objects required to
efficiently migrate the data. In other words, AWS DMS will create tables, primary keys, and in
some cases, unique indexes. It will not create secondary indexes, non-primary key constraints,
data defaults, or other objects that are not required to efficiently migrate the data from the
source system. In most cases, when performing a migration, you will also want to migrate most
or all of the source schema. If you are performing a homogeneous migration, you can
accomplish this by using your engine's native tools to perform a no-data export/import of the
schema. If your migration is heterogeneous, you can use the AWS Schema Conversion Tool
(AWS SCT) to generate a complete target schema for you.
Instance Class
Some of the smaller instance classes are sufficient for testing the service or for small migrations.
If your migration involves a large number of tables, or if you intend to run multiple concurrent
replication tasks, you should consider using one of the larger instances because the service
consumes a fair amount of memory and CPU.
Storage
Depending on the instance class, your replication server will come with either 50 GB or 100 GB
of data storage. This storage is used for log files and any cached changes that are collected
during the load. If your source system is busy or takes large transactions, or if you're running
multiple tasks on the replication server, you might need to increase this amount of storage.
However, the default amount is usually sufficient.
Note All storage volumes in AWS DMS are GP2, or General Purpose SSD, volumes. GP2
volumes come with a base performance of three I/O operations per second
(IOPS) per GB, with the ability to burst up to 3,000 IOPS on a credit basis. As a
rule of thumb, check the ReadIOPS and WriteIOPS metrics for the replication
instance and be sure the sum of these values does not cross the base
performance for that volume.
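One way to check this rule of thumb is to pull the ReadIOPS and WriteIOPS metrics with boto3 and compare their sum to the volume's baseline. The sketch below assumes these replication instance metrics are published in the AWS/DMS CloudWatch namespace under the ReplicationInstanceIdentifier dimension; the instance identifier and the 100 GB volume size are placeholders to replace with your own values.

```python
# Sketch: compare average ReadIOPS + WriteIOPS against the GP2 baseline.
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

def average_metric(metric_name, instance_id):
    """Average value of a DMS replication instance metric over the last hour."""
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/DMS",
        MetricName=metric_name,
        Dimensions=[{"Name": "ReplicationInstanceIdentifier", "Value": instance_id}],
        StartTime=datetime.utcnow() - timedelta(hours=1),
        EndTime=datetime.utcnow(),
        Period=300,
        Statistics=["Average"],
    )
    points = stats["Datapoints"]
    return sum(p["Average"] for p in points) / len(points) if points else 0.0

instance_id = "dms-best-practices-demo"
total_iops = (average_metric("ReadIOPS", instance_id)
              + average_metric("WriteIOPS", instance_id))
base_iops = 3 * 100  # GP2 baseline: 3 IOPS per GB, assuming a 100 GB volume
print(f"Average IOPS {total_iops:.0f} vs. volume baseline {base_iops}")
```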
Multi-AZ
Selecting a Multi-AZ instance can protect your migration from storage failures. Most migrations
are transient and not intended to run for long periods of time. If you're using AWS DMS for
ongoing replication purposes, selecting a Multi-AZ instance can improve your availability should
a storage issue occur.
Source Endpoint
The change capture process, used when replicating ongoing changes, collects changes from the
database logs by using the database engine's native API; no client-side installation is required.
Each engine has specific configuration requirements for exposing this change stream to a given
user account (for details, see the AWS Database Migration Service documentation). Most engines
require some additional configuration to make the change data consumable in a meaningful way
without data loss for the capture process. (For example, Oracle requires the addition of
supplemental logging, and MySQL requires row-level binary logging.)
Target Endpoint
Whenever possible, AWS DMS attempts to create the target schema for you, including
underlying tables and primary keys. However, sometimes this isn't possible. For example, when
the target is Oracle, AWS DMS doesn't create the target schema for security reasons. In MySQL,
you have the option through extra connection parameters to have AWS DMS migrate objects to
the specified database or to have AWS DMS create each database for you as it finds the
database on the source.
Note For the purposes of this paper, in Oracle a user and schema are
synonymous. In MySQL, schema is synonymous with database. Both SQL Server
and PostgreSQL have the concept of both a database and a schema. In this
paper, we're referring to the schema.
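As an illustration of the MySQL option above, extra connection parameters are passed as a string on the target endpoint. The attribute shown (targetDbType, with values such as MULTIPLE_DATABASES or SPECIFIC_DATABASE) reflects my reading of the DMS documentation for MySQL targets and should be verified there; the remaining values are placeholders.

```python
# Sketch: a MySQL target endpoint that migrates each source database to a
# database of the same name on the target. Verify the extra connection
# attribute against the current AWS DMS documentation for MySQL targets.
import boto3

dms = boto3.client("dms", region_name="us-east-1")

target = dms.create_endpoint(
    EndpointIdentifier="target-mysql",
    EndpointType="target",
    EngineName="mysql",
    ServerName="target.example.com",
    Port=3306,
    Username="dms_user",
    Password="example-password",
    # Assumed attribute name and value; see the DMS MySQL target documentation.
    ExtraConnectionAttributes="targetDbType=MULTIPLE_DATABASES",
)
```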
Task
The following section highlights common and important options to consider when creating a
task.
Migration Type
Migrate existing data. If you can afford an outage that's long enough to copy your
existing data, this is a good option to choose. This option simply migrates the data from
your source system to your target, creating tables as needed.
Migrate existing data and replicate ongoing changes. This option performs a full data
load while capturing changes on the source. After the full load is complete, captured
changes are applied to the target. Eventually, the application of changes will reach a
steady state. At that point, you can shut down your applications, let the remaining
changes flow through to the target, and restart your applications to point at the target.
Replicate data changes only. In some situations it may be more efficient to copy the
existing data by using a method outside of AWS DMS. For example, in a homogeneous
migration, using native export/import tools can be more efficient at loading the bulk
data. When this is the case, you can use AWS DMS to replicate changes as of the point in
time at which you started your bulk load to bring and keep your source and target
systems in sync. When replicating data changes only, you need to specify a time from
which AWS DMS will begin to read changes from the database change logs. It's important
to keep these logs available on the server for a period of time to ensure AWS DMS has
access to these changes. This is typically achieved by keeping the logs available for 24
hours (or longer) during the migration process. A sketch of creating a task for each of
these migration types follows.
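Continuing the earlier boto3 sketch, the snippet below shows how the choice of migration type might be expressed when creating a task. The ARNs and table mappings are placeholders; MigrationType accepts 'full-load', 'full-load-and-cdc', or 'cdc', and the commented CdcStartTime parameter illustrates how a changes-only task could be told where in the change logs to begin reading.

```python
# Sketch: create a replication task for one of the three migration types.
# ARNs and the table-mapping rules are placeholders.
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

table_mappings = json.dumps({
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-all",
        "object-locator": {"schema-name": "%", "table-name": "%"},
        "rule-action": "include",
    }]
})

task = dms.create_replication_task(
    ReplicationTaskIdentifier="migrate-and-replicate",
    SourceEndpointArn="arn:aws:dms:...:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",
    MigrationType="full-load-and-cdc",   # or "full-load", or "cdc"
    TableMappings=table_mappings,
    # For a "cdc" (replicate data changes only) task, specify the point in the
    # change logs from which DMS should begin reading, for example the time
    # your bulk export was taken:
    # CdcStartTime=datetime(2016, 8, 1),   # requires: from datetime import datetime
)
```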
When the target table preparation mode is set to do nothing, any data that exists in the target tables is left as is. This can be
useful when consolidating data from multiple systems into a single table using multiple tasks.
AWS DMS performs these steps when it creates a target table:
1. The source database column data type is converted into an intermediate AWS DMS data type.
2. The AWS DMS data type is converted into the target data type.
This data type conversion is performed for both heterogeneous and homogeneous migrations.
In a homogeneous migration, this data type conversion may lead to target data types not
matching source data types exactly. For example, in some situations it's necessary to triple the
size of varchar columns to account for multi-byte characters. We recommend going through the
AWS DMS documentation on source and target data types to see if all the data types you use
are supported. If the resultant data types aren't to your liking when you're using AWS DMS to
create your objects, you can pre-create those objects on the target database. If you do pre-create
some or all of your target objects, be sure to choose the truncate or do nothing options
for target table preparation mode.
LOB Controls
Due to their unknown and sometimes large size, large objects (LOBs) require more processing
and resources than standard objects. To help with tuning migrations of systems that contain
LOBs, AWS DMS offers the following options:
Don't include LOB columns. When this option is selected, tables that include LOB
columns are migrated in full; however, any columns containing LOBs will be omitted.
Full LOB mode. When you select full LOB mode, AWS DMS has no information in advance
about the size of the LOB data. LOBs are migrated in full, in successive pieces whose
size is determined by the LOB chunk size. Changing the LOB chunk size affects the
memory consumption of AWS DMS; a large LOB chunk size requires more memory and
processing. Memory is consumed per LOB, per row. If you have a table containing three
LOBs and are moving data 1,000 rows at a time, a LOB chunk size of 32 KB will require
3 × 32 × 1,000 = 96,000 KB of memory for processing. Ideally, the LOB chunk size should be
set to allow AWS DMS to retrieve the majority of LOBs in as few chunks as possible. For
example, if 90 percent of your LOBs are less than 32 KB, then setting the LOB chunk size
to 32 KB would be reasonable, assuming you have the memory to accommodate the
setting.
Limited LOB mode. When limited LOB mode is selected, any LOBs that are larger than
the max LOB size are truncated to the max LOB size, and a warning is written to the log file.
Using limited LOB mode is almost always more efficient and faster than full LOB mode. You
can usually query your data dictionary to determine the size of the largest LOB in a
table and set the max LOB size to something slightly larger than this (don't forget to account
for multi-byte characters). If you have a table in which most LOBs are small, with a few
large outliers, it may be a good idea to move the large LOBs into their own table and use
two tasks to consolidate the tables on the target. A sketch of the task settings behind these
LOB options appears at the end of this section.
LOB columns are transferred only if the source table has a primary key or a unique index on
the table. Transfer of data containing LOBs is a two-step process:
1. The containing row on the target is created without the LOB data.
2. The table is updated with the LOB data.
The process was designed this way to accommodate the methods source database engines
use to manage LOBs and changes to LOB data.
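As an illustration, the LOB options above map onto fields in the TargetMetadata section of the task settings. The snippet below is a sketch based on my reading of the AWS DMS task-settings format (sizes in KB); verify the exact field names and defaults against the current documentation before relying on them.

```python
# Sketch: task settings for the two LOB modes discussed above (sizes in KB).
import json

# Limited LOB mode: truncate anything larger than 32 KB.
limited_lob_settings = json.dumps({
    "TargetMetadata": {
        "SupportLobs": True,
        "FullLobMode": False,
        "LimitedSizeLobMode": True,
        "LobMaxSize": 32,      # set slightly above your largest LOB
    }
})

# Full LOB mode: migrate LOBs completely, one 32 KB chunk at a time.
full_lob_settings = json.dumps({
    "TargetMetadata": {
        "SupportLobs": True,
        "FullLobMode": True,
        "LimitedSizeLobMode": False,
        "LobChunkSize": 32,    # memory is consumed per LOB, per row
    }
})

# Either string can be passed as ReplicationTaskSettings when creating a task.
```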
Enable Logging
It's always a good idea to enable logging because many informational and warning messages are
written to the logs. However, be advised that you'll incur a small charge, as the logs are made
accessible by using Amazon CloudWatch.
Find appropriate entries in the logs by looking for lines flagged as errors or warnings.
You can use grep (on UNIX-based systems) or search (in Windows-based text editors) to find
exactly what you're looking for in a large task log.
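If your task logs are delivered to Amazon CloudWatch Logs, you can also search them programmatically. The sketch below assumes a log group named dms-tasks-<replication instance identifier> and uses a hypothetical filter pattern; both should be adjusted to match your actual setup.

```python
# Sketch: search a DMS task log in CloudWatch Logs instead of grepping a download.
import boto3

logs = boto3.client("logs", region_name="us-east-1")

response = logs.filter_log_events(
    logGroupName="dms-tasks-dms-best-practices-demo",  # assumed log group name
    filterPattern='"W:"',                               # hypothetical warning pattern
)
for event in response["events"]:
    print(event["message"])
```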
Host Metrics
You can find host metrics on your replication instance's Monitoring tab. Here, you can monitor
whether your replication instance is sized appropriately.
Table Metrics
Individual table metrics can be found under the table statistics tab for each individual task.
These metrics include: the number of rows loaded during the full load; the number of inserts,
updates, and deletes since the task started; and the number of DDL operations since the task
started.
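The same counters are available programmatically through the DMS API. The following is a small boto3 sketch using describe_table_statistics; the task ARN is a placeholder.

```python
# Sketch: pull per-table statistics for a replication task.
import boto3

dms = boto3.client("dms", region_name="us-east-1")

stats = dms.describe_table_statistics(
    ReplicationTaskArn="arn:aws:dms:...:task:EXAMPLE"
)
for table in stats["TableStatistics"]:
    print(
        table["SchemaName"], table["TableName"],
        "full load rows:", table["FullLoadRows"],
        "inserts:", table["Inserts"],
        "updates:", table["Updates"],
        "deletes:", table["Deletes"],
        "ddls:", table["Ddls"],
    )
```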
Performance Expectations
There are a number of factors that will affect the performance of your migration: resource
availability on the source, available network throughput, resource capacity of the replication
server, ability of the target to ingest changes, type and distribution of source data, number of
objects to be migrated, and so on. In our tests, we have been able to migrate a terabyte of data
in approximately 12–13 hours (under ideal conditions). Our tests were performed using source
databases running on Amazon EC2 and in Amazon RDS, with target databases in Amazon RDS. Our source
databases contained a representative amount of relatively evenly distributed data with a few
large tables containing up to 250 GB of data.
Increasing Performance
The performance of your migration will be limited by one or more bottlenecks you encounter
along the way. The following are a few things you can do to increase performance.
For example, if your replication server has capacity to spare, you can increase the number of
tables loaded in parallel (the default is eight at a time); the host statistics of your replication
server can help you determine whether this might be a good option.
Important In LOB processing, LOBs are migrated using a two-step process: first,
the containing row is created without the LOB, and then the row is updated
with the LOB data. Therefore, even if the LOB column is NOT NULLABLE on the
source, it must be nullable on the target during the migration.
Note Using batch optimized apply will almost certainly violate referential
integrity constraints. Therefore, you should disable them during the migration
process and enable them as part of the cutover process.
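As a sketch, batch optimized apply is controlled through the task settings; the snippet below assumes the BatchApplyEnabled flag lives in the TargetMetadata section, which you should confirm against the current AWS DMS task-settings documentation before use. The task ARN is a placeholder.

```python
# Sketch: enable batch optimized apply on an existing task via task settings.
# Remember to disable referential integrity constraints on the target while
# batch apply is active, and re-enable them during cutover.
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

settings = json.dumps({"TargetMetadata": {"BatchApplyEnabled": True}})

dms.modify_replication_task(
    ReplicationTaskArn="arn:aws:dms:...:task:EXAMPLE",
    ReplicationTaskSettings=settings,
)
```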
Perform an Assessment
In an assessment, you determine the basic framework of your migration and discover things in
your environment that you'll need to change to make a migration successful. The following are
some questions to ask:
AWS DMS is intended to be used with one of these methods to perform a complete migration of
your database.
Modernization. The customer wants to use a modern framework or platform for their
application portfolio, and these platforms are available only on more modern SQL or
NoSQL database engines.
License fees. The customer wants to migrate to an open source engine to reduce license
fees.
Generic EC2-Classic to VPC Migration Guide: Migrating from a Linux Instance in EC2-Classic to a Linux Instance in a VPC
Specific Procedures for RDS: Moving a DB Instance Not in a VPC into a VPC
Conclusion
This paper outlined best practices for using AWS DMS to migrate data from a source database to
a target database, and offered answers to several frequently asked questions about migrations.
As companies move database workloads to AWS, they are often also interested in changing their
primary database engine. Most current methods for migrating databases to the cloud or
switching engines require an extended outage. AWS DMS helps you migrate database
workloads to AWS or change database engines while minimizing any associated downtime.
Contributors
The following individuals and organizations contributed to this document: