Module 10-Data Protection - Participant Guide
Table of Contents
Data Deduplication
  Data Deduplication Overview
  Key Benefits of Data Deduplication
  Data Deduplication Method: Source-Based
  Data Deduplication Method: Target-Based
  Data Deduplication: Additional Information
Concepts in Practice
Module Objectives
Data Replication
[Figure: Data replication from servers in Data Center A to Data Center B]
• Replicas are used to restore and restart operations when there is data loss.
• Data can be replicated to one or more locations.
  − For example, production data is copied from the source (primary storage) to a target. The target can be other storage in the same data center, storage in a different data center, or the cloud.
Use of Replicas
[Figure: Uses of replicas, including data replication and data migration from a source]
Types of Replication
• Local replication: Data is replicated within a storage system (storage-based replication), or within a data center from one compute system to another (compute-based replication).
• Remote replication: Data is replicated from the data center to a remote data center or to the cloud.
[Figures: Local replication and remote replication]
Snapshots can establish recovery points in a fraction of the time and can reduce the Recovery Point Objective (RPO) by supporting more frequent recovery points. If a file is lost or corrupted, it can typically be restored from the latest snapshot in a few seconds.
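As a back-of-the-envelope illustration (a sketch, not from the guide), the following Python snippet relates the snapshot interval to the worst-case RPO: the data written since the most recent snapshot is what stands to be lost.

```python
# Minimal sketch: with periodic snapshots, the worst-case data loss
# window (RPO) is bounded by the interval between recovery points.
SNAPSHOT_INTERVAL_MIN = 15   # assumed snapshot schedule, for illustration

recovery_points_per_day = 24 * 60 // SNAPSHOT_INTERVAL_MIN
print(f"worst-case RPO: {SNAPSHOT_INTERVAL_MIN} minutes")
print(f"recovery points per day: {recovery_points_per_day}")   # 96
```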
VM Snapshot
Synchronous Replication
• A write is committed to both the source and the remote replica before it is acknowledged to the compute system.
• Synchronous replication enables restarting business operations at a remote site with zero data loss, providing an RPO of zero.
1. A write I/O is received from the compute system into the cache of the source storage and placed in a queue.
2. The write I/O is transmitted to the cache of the target storage.
3. The target storage provides a receipt acknowledgement back to the source.
4. The source storage sends an acknowledgement back to the compute system.
[Figure: Synchronous replication write flow between the compute system, source storage, and target storage]
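A minimal Python sketch of the four-step flow above; the SourceArray and TargetArray classes are hypothetical stand-ins for the storage systems, not a vendor API.

```python
# Minimal sketch of synchronous replication: the source acknowledges the
# compute system only after the target confirms the remote copy.
class Array:
    def __init__(self, name):
        self.name = name
        self.cache = {}

class TargetArray(Array):
    def replicate(self, block, data):
        self.cache[block] = data       # remote replica is updated
        return True                    # step 3: receipt acknowledgement

class SourceArray(Array):
    def __init__(self, name, target):
        super().__init__(name)
        self.target = target

    def write(self, block, data):
        self.cache[block] = data                  # step 1: write lands in source cache
        ack = self.target.replicate(block, data)  # step 2: transmit to target
        assert ack                                # step 3: wait for the target's receipt
        return "ack-to-compute"                   # step 4: acknowledge the compute system

target = TargetArray("target")
source = SourceArray("source", target)
print(source.write(0, b"data"))   # acknowledged only after both copies exist
```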
Continuous Data Protection
• Continuous Data Protection provides the capability to restore data and VMs to any previous point-in-time (PIT).
2: Continuous Data Protection solutions can replicate data across heterogeneous storage systems.
3: Continuous Data Protection supports both local and remote replication of data and VMs to meet operational and disaster recovery requirements, respectively.
5: Continuous Data Protection supports multisite replication, where data can be replicated to more than two sites using synchronous and asynchronous replication.
Knowledge Check
1. Which provides the ability to create fully populated point-in-time copies of LUNs
within a storage system or create a copy of an existing VM?
a. Clone
b. Snapshot
c. Pointer-based virtual replica
d. Full volume virtual replica
Data Backup
Backup Overview
A backup is an additional copy of production data that is created and retained for the sole purpose of recovering lost or corrupted data.
Backup Architecture
Backup Operation
Recovery Operation
After the data is backed up, it can be restored when required. A recovery operation
restores data to its original state at a specific Point in Time (PIT). Typically, backup
applications support restoring one or more individual files, directories, or VMs.
Backup Granularities
Full Backup
• Full backup copies all data on the production volume to a backup device.
Full Backup-Restore
For example, a full backup is created every Sunday. When there is a production data loss on Monday, the most recent full backup, created the previous Sunday, is used to restore the data to the production environment.
Incremental Backup
• Incremental backup copies the data that has changed since the last backup, whether full or incremental.
• The main advantage of incremental backups is that fewer files are backed up daily, allowing for shorter backup windows.
Cumulative Backup
Cumulative (differential) backup copies the data that has changed since the last full
backup.
For example, the administrator creates a full backup on Sunday and differential backups for the rest of the week. The backup that is created on Monday contains all the data that has changed since Sunday; at this point it is identical to an incremental backup. On Tuesday, however, the differential backup again copies any data that has changed since Sunday's full backup. The advantage that differential backups have over incremental backups is shorter restore times: restoring a differential backup never requires more than two copies. The tradeoff is that, as time progresses, a differential backup can grow to contain more data than an incremental backup. Suppose that an administrator wants to restore the backup from Tuesday. The administrator must first restore the full backup that was created on Sunday, and then restore the differential backup created on Tuesday.
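To make the restore chains concrete, here is a small Python sketch (an illustration, not from the guide) that selects which backup copies are needed to restore the latest point in time under each granularity.

```python
# Backups taken so far, oldest first (hypothetical schedule).
backups = [
    {"day": "Sun", "type": "full"},
    {"day": "Mon", "type": "partial"},
    {"day": "Tue", "type": "partial"},
]

def restore_chain(backups, strategy):
    """Return the backup copies needed to restore the latest point in time."""
    last_full = max(i for i, b in enumerate(backups) if b["type"] == "full")
    if strategy == "incremental":
        # The full backup plus every incremental taken after it.
        return backups[last_full:]
    # Cumulative (differential): the full backup plus only the latest copy.
    return [backups[last_full], backups[-1]]

print([b["day"] for b in restore_chain(backups, "incremental")])  # ['Sun', 'Mon', 'Tue']
print([b["day"] for b in restore_chain(backups, "cumulative")])   # ['Sun', 'Tue']
```

This mirrors the Tuesday example above: a differential restore needs at most two copies (Sunday plus Tuesday), while an incremental restore needs the whole chain since the last full backup.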
Agent-Based Backup
[Figure: Agent-based backup with application servers, a backup server, and a backup device]
Image-Based Backup
Image-based backup makes a copy of the virtual machine disk and configuration
that is associated with a particular VM. The backup is saved as a single entity
called a VM image.
[Figure: Image-based backup. The VM management server creates a VM snapshot; a proxy server reads the snapshot from the VM file system volume and sends the backup data to the backup server and backup device]
Cloud-Based Data Protection
[Figure: Cloud-based data protection between a data center and cloud resources]
Organizations must regularly protect their data to avoid losses, stay compliant, and preserve data integrity. They may face challenges with IT budgets and IT management. These challenges can be addressed by the emergence of cloud-based data protection, which:
Knowledge Check
1. Which backup component manages the backup operations and maintains the
backup catalog?
a. Backup client
b. Backup target
c. Backup server
d. Backup device
Data Deduplication
Data Deduplication Overview
Data Deduplication is the process of detecting and identifying the unique data
segments within a given set of data to eliminate redundancy.
5: The deduplication ratio is the ratio of the amount of data before deduplication to the amount of data after deduplication. This ratio is typically depicted as "ratio:1" or "ratio X" (10:1 or 10X). For example, if 200 GB of data consumes 20 GB of storage capacity after data deduplication, the space-reduction ratio is 10:1.
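The following Python sketch shows one common way deduplication can work, fixed-size chunking with SHA-256 fingerprints; this is an illustrative assumption, not the mechanism of any specific product.

```python
import hashlib

# Minimal sketch of fixed-size chunk deduplication: each chunk is
# identified by its fingerprint, and only unseen chunks consume storage.
CHUNK_SIZE = 4096

def deduplicate(data: bytes):
    store = {}    # fingerprint -> chunk (unique segments only)
    recipe = []   # ordered fingerprints needed to rebuild the data
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)   # redundant chunks are not stored again
        recipe.append(fp)
    return store, recipe

data = b"A" * 16384 + b"B" * 4096     # highly redundant sample data
store, recipe = deduplicate(data)
stored = sum(len(c) for c in store.values())
print(f"dedup ratio {len(data) / stored:.1f}:1")   # 20480 / 8192 -> 2.5:1
```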
Key Benefits of Data Deduplication
2: Because data deduplication reduces the amount of content in the daily backup, users can extend their retention policies. This can be a significant benefit for users who require longer retention.
4: By deduplicating data at the client, redundant data is removed before the data is transferred over the network. This approach reduces the network bandwidth that is required for sending backup data to a remote site for DR purposes.
Deduplication at Target
[Figure: Target-based deduplication, with VMs on a hypervisor server backed up to a deduplication appliance]
Knowledge Check
Data Archiving
• The data archiving operation involves an archiving agent, an archive server (policy engine), and archive storage.
• The archiving agent scans primary storage to find files that meet the archiving policy.
  − The archive server indexes the files.
• Once the files have been indexed, they are moved to archive storage and small stub6 files are left on the primary storage, as sketched below.
6: The stub file contains the address of the archived file. Because the stub file is small, it saves space on primary storage.
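A minimal Python sketch of this flow, assuming a simple age-based policy and local directories standing in for primary and archive storage; real archiving agents and policy engines are far more elaborate.

```python
import os
import shutil
import time

# Minimal sketch: scan primary storage, move files that meet the policy
# to archive storage, and leave a small stub holding the archived address.
ARCHIVE_AGE_DAYS = 365   # assumed archiving policy, for illustration

def archive(primary_dir: str, archive_dir: str) -> None:
    cutoff = time.time() - ARCHIVE_AGE_DAYS * 86400
    for name in os.listdir(primary_dir):
        path = os.path.join(primary_dir, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            target = os.path.join(archive_dir, name)
            shutil.move(path, target)              # move the file to archive storage
            with open(path + ".stub", "w") as stub:
                stub.write(target)                 # stub records the archived address
```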
• Email archiving is the process of archiving email messages from the mail server
to an archive storage.
− After the email is archived, it is retained for years, based on the retention
policy.
• Email archiving supports legal disputes and government compliance requirements.
  − For example, an organization may be required to produce all email messages from all individuals involved in stock sales or transfers. Failure to comply with these requirements can cause an organization to incur penalties.
• Email archiving provides more mailbox space by moving old email messages to
archive storage.
Knowledge Check
1. Which archiving component scans primary storage to find files that meet the
archiving policy?
a. Archiving agent
b. Archiving storage
c. Archiving client
d. Archiving policy engine
Data Migration
Hypervisor-Based Migration
VM Migrations
In this type of migration, virtual machines (VMs) are moved from one physical compute system to another without any downtime. The VM migration method enables:
VM Storage Migration
In a VM storage migration, VM files are moved from one storage system to another
system without any downtime or service disruption.
SAN-based Migration
[Figure: SAN-based migration between a control device and a remote device, using push or pull operations]
NAS-based Migration
• NAS-based migration moves file-level data between NAS systems over LAN or
WAN.
In this example, the new NAS system initiates the migration operation and pulls the data directly from the old NAS system over the LAN. The key advantage of NAS-to-NAS direct data migration is that no external component (host or appliance) is needed to perform or initiate the migration process.
• While the files are being moved, clients can access their files non-disruptively.
− Clients can also read their files from the old location and write them back to
the new location without realizing that the physical location has changed.
• A virtualization appliance creates a virtualization layer that eliminates the
dependencies between the data that is accessed at the file level and the
location where the files are physically stored.
Knowledge Check
1. Which migration moves file-level data between file servers over LAN or WAN?
a. NAS-based
b. Byte-based
c. SAN-based
d. Block-based
Concepts in Practice
Dell NetWorker
Dell Data Protection Advisor
Data Protection Advisor (DPA) is a reporting and analytics platform that provides full visibility into the effectiveness of your data protection strategy. It can automate and centralize the collection and analysis of all data.
Dell PowerProtect
• Features include:
  − Systems can scale to petabytes of usable capacity.
  − Cloud long-term retention and cloud DR readiness.
  − VMware integration.
• PowerProtect Appliance supports native Cloud DR with end-to-end
orchestration.
Dell TimeFinder SnapVX
TimeFinder SnapVX is a local replication solution for PowerMax and VMAX All Flash storage systems, with scalable snapshots and clones to protect data. The SnapVX solution:
Dell SRDF
Dell RecoverPoint
− Enables Continuous Data Protection for any point in time (PIT) recovery to
optimize RPO and RTO.
− Provides synchronous (sync) or asynchronous (async) replication policies.
• VMware vSphere High Availability (HA) leverages multiple ESXi hosts that are
configured as a cluster to provide rapid recovery from outages.
− Provides high availability for applications running in virtual machines.
− Protects against a server failure by restarting the virtual machines on other
hosts within the cluster.
− Protects against application failure by continuously monitoring a virtual
machine and resetting it if a failure is detected.
• VMware vSphere Fault Tolerance (FT) provides a higher level of availability.
− Enables users to protect any virtual machine from a host failure with no loss
of data, transactions, or connections.
− Provides continuous availability by ensuring that the states of the Primary
and Secondary VMs are identical at any point in time.
− If either the host running the Primary VM or the host running the Secondary
VM fails, an immediate and transparent failover occurs.
Dell Cloud Tier provides a solution for long-term retention. Its advanced deduplication technology reduces the storage footprint: only unique data is sent to the cloud, and the data lands on cloud object storage already deduplicated. Cloud tiering:
Dell Avamar
• Enables fast, efficient backup and recovery through its integrated variable-
length deduplication technology.
• Optimized for fast, daily full backups of physical and virtual environments, NAS
servers, enterprise applications, remote offices and desktops/laptops.
• Proven backup and recovery software that delivers secure data protection for
cloud, remote offices, desktops, laptops, and data centers.
Scenario
Challenges
Requirements
Deliverables
Solutions
Alternative source for backup: Under normal backup operations, data is read
from the production LUNs and written to the backup device. This approach places
an additional burden on the production infrastructure because production LUNs are
simultaneously involved in production operations and servicing data for backup
operations. To avoid this situation, a replica can be created from a production LUN
and it can be used as a source to perform backup operations. This approach
alleviates the backup I/O workload on the production LUNs.
Fast recovery and restart: For critical applications, replicas can be taken at short,
regular intervals. This method allows easy and fast recovery from data loss. If a
complete failure of the source (production) LUN occurs, the replication solution
enables one to restart the production operation on the replica to reduce the RTO.
Testing platform: Replicas are also used for testing new applications or upgrades.
For example, an organization may use the replica to test the production application
upgrade; if the test is successful, the upgrade may be implemented on the
production environment.
Data migration: Another use for a replica is data migration. Data migrations are performed for various reasons, such as migrating from a smaller-capacity LUN to a larger-capacity LUN for newer versions of an application.
Multiple snapshots can be created from the same source LUN for various business requirements. Some snapshot software can automatically terminate a snapshot when it reaches its expiration date. The unavailability of the source device invalidates the data on the target. Storage system-based snapshots use a Redirect on Write (RoW) mechanism.
RoW redirects new writes destined for the source LUN to an alternate location in the same storage pool. In RoW, a new write from the compute system is written (redirected) to a new location inside the pool. The original data remains where it is; it is read from its original location on the source LUN and is untouched by the RoW process.
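The following Python sketch (an illustration, not a specific product's implementation) shows the RoW idea: the snapshot keeps pointers to the original blocks, while new writes are redirected to fresh locations in the pool.

```python
# Minimal sketch of Redirect on Write (RoW).
pool = {}                          # pool address -> data
source_map = {0: "a0", 1: "a1"}    # source LUN: logical block -> pool address
pool["a0"], pool["a1"] = b"old-0", b"old-1"

snapshot_map = dict(source_map)    # snapshot shares pointers at creation time

def write(block: int, data: bytes) -> None:
    new_addr = f"n{block}"         # redirect: allocate a new pool location
    pool[new_addr] = data
    source_map[block] = new_addr   # only the source's pointer moves

write(0, b"new-0")
print(pool[source_map[0]])     # b'new-0'  (source sees the new write)
print(pool[snapshot_map[0]])   # b'old-0'  (snapshot still reads original data)
```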
In synchronous replication, writes must be committed to the source and the remote
target prior to acknowledging “write complete” to the production compute system.
Another write on the source cannot occur until each preceding write has been
completed and acknowledged.
This approach ensures that data is identical on the source and the target. Further,
writes are transmitted to the remote site exactly in the order in which they are
received at the source. Write ordering is maintained and it ensures transactional
consistency when the applications are restarted at the remote location. As a result,
the remote images are always restartable copies.
In asynchronous replication, RPO depends on the size of the buffer, the available
network bandwidth, and the write workload to the source. This replication can take
advantage of locality of reference (repeated writes to the same location). If the
same location is written multiple times in the buffer prior to transmission to the
remote site, only the final version of the data is transmitted. This feature conserves
link bandwidth.
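A minimal Python sketch of this write coalescing, assuming a simple location-keyed buffer: repeated writes to the same location overwrite each other, so only the final version crosses the link.

```python
# Minimal sketch of an asynchronous replication buffer that exploits
# locality of reference: only the latest data per location is transmitted.
buffer = {}   # location -> latest pending data

def write(location: int, data: bytes) -> None:
    buffer[location] = data    # overwrites any earlier pending write here

write(42, b"v1")
write(42, b"v2")
write(42, b"v3")               # three writes to the same block...

def transmit():
    pending = dict(buffer)     # snapshot of what crosses the link
    buffer.clear()
    return pending

pending = transmit()
print(len(pending))    # 1 -> only the final version is sent
print(pending[42])     # b'v3'
```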
Typically, the replica is synchronized with the source, and then the replication
process starts. After the replication starts, all the writes from the compute system to
the source (production volume) are split into two copies. One copy is sent to the
local Continuous Data Protection appliance at the source site, and the other copy is
sent to the production volume. Then the local appliance writes the data to the
journal at the source site and the data in turn is written to the local replica. If a file is
accidentally deleted, or the file is corrupted, the local journal enables organizations
to recover the application data to any PIT.
In remote replication, the local appliance at the source site sends the received write
I/O to the appliance at the remote (DR) site. Then, the write is applied to the journal
volume at the remote site. As a next step, data from the journal volume is sent to
the remote replica at predefined intervals. Continuous Data Protection operates in
either synchronous or asynchronous mode.
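A minimal Python sketch of the journaling idea, using a logical sequence number in place of wall-clock timestamps; the journal records every split write, so any previous point in time can be rebuilt by replay.

```python
# Minimal sketch of CDP-style journaling (an illustration, not a product).
journal = []   # ordered (sequence, block, data) entries
volume = {}    # current state of the replica
clock = 0      # logical timestamp standing in for wall-clock time

def split_write(block, data):
    """Split each write: one copy to the journal, one to the volume."""
    global clock
    clock += 1
    journal.append((clock, block, data))
    volume[block] = data

def recover(pit):
    """Rebuild the volume as it existed at logical time `pit` by replay."""
    state = {}
    for seq, block, data in journal:
        if seq > pit:
            break
        state[block] = data
    return state

split_write(0, b"v1")
split_write(0, b"v2")
print(recover(1))   # {0: b'v1'} -> the volume as of the first write
print(volume)       # {0: b'v2'} -> current state
```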
The role of a backup client is to gather the data that must be backed up and send
it to the storage node. The backup client can be installed on application servers,
mobile clients, and desktops. It also sends the tracking information to the backup
server.
The backup server manages the backup operations and maintains the backup catalog, which contains information about the backup configuration and backup metadata. The backup configuration includes information such as when to run backups and which client data to back up. The backup metadata contains information about the backed-up data. The storage node is responsible for organizing the client's data and writing the data to a backup device. A storage node controls one or more backup devices.
In most implementations, the storage node and the backup server run on the same
system. Backup devices may be attached directly or through a network to the
storage node. The storage node sends the tracking information about the data that
is written to the backup device to the backup server. Typically this information is
used for recoveries. Backup targets include tape, disk, virtual disk library, and the
cloud.
This backup is used for restoring an entire VM if there is any hardware failure or
human error. It is also possible to restore individual files and folders within a virtual
machine.
In an image-level backup, the backup software can back up VMs without installing backup agents inside the VMs or at the hypervisor level. A proxy server performs the backup operations and acts as the backup client, offloading the backup processing from the VMs.
The proxy server then performs the backup by using the snapshot. Performing an image-level backup of a virtual machine disk enables running a bare-metal restore of a VM.
Some vendors support changed block tracking. This feature identifies and tags any blocks that have changed since the last VM snapshot, enabling the backup application to back up only the blocks that have changed rather than every block.
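A minimal Python sketch of changed block tracking, assuming a toy block device; blocks written since the last backup are tagged so that only they are copied.

```python
# Minimal sketch of changed block tracking (CBT).
disk = {i: b"\x00" for i in range(8)}   # a tiny 8-block virtual disk
changed = set()                          # blocks dirtied since the last backup

def write(block: int, data: bytes) -> None:
    disk[block] = data
    changed.add(block)                   # tag the block as changed

def incremental_backup() -> dict:
    delta = {b: disk[b] for b in changed}
    changed.clear()                      # start tracking for the next cycle
    return delta

write(3, b"x")
write(5, b"y")
print(sorted(incremental_backup()))      # [3, 5] -> only changed blocks copied
```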
Many files are common across multiple systems in a data center environment.
Many users across an environment store identical files such as Word documents,
Microsoft PowerPoint presentations, and Excel spreadsheets. Backups of these
systems contain many identical files. Also, many users keep multiple versions of
files that they are working on. Many of these files differ only slightly from other
versions, but are seen by backup applications as new data that must be protected.
Due to this redundant data, organizations face many challenges. Backing up redundant data increases the amount of storage that is required to protect the data, which in turn increases the storage infrastructure cost. It is important for organizations to protect their data within budget constraints. Organizations are running out of backup window time and facing difficulties meeting recovery objectives. Backing up large amounts of duplicate data at a remote site or in the cloud for DR purposes is also cumbersome and requires substantial network bandwidth.
Data in primary storage is actively accessed and changed. As data ages, it is less
likely to change and eventually becomes “fixed” but continues to be accessed by
applications and users. This data is called fixed content. Fixed content is growing at
over 90 percent annually. Keeping the fixed content in primary storage systems
poses several challenges.
Today, organizations adopt the cloud and move data to secondary storage in the cloud. Potential benefits of cloud archiving include lower cost and easier access, while potential drawbacks include less control and more complex compliance management.