Data Domain Student Guide
Administration
Student Resource Guide
March 2017
Copyright
Copyright © 2017 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC, and other trademarks are
trademarks of Dell Inc. or its subsidiaries. Other trademarks may be the property of their respective owners.
Published in the USA March 2017
Dell EMC believes the information in this document is accurate as of its publication date. The information is
subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” DELL EMC MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO
THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE.
Use, copying, and distribution of any DELL EMC software described in this publication requires an applicable software license. The trademarks, logos, and service marks
(collectively "Trademarks") appearing in this publication are the property of DELL EMC Corporation and other parties. Nothing contained in this publication should be construed
as granting any license or right to use any Trademark without the prior written permission of the party that owns the Trademark.
A Data Domain system can connect to your network via Ethernet or Fibre Channel connections.
Data Domain systems consist of three components: a controller, disk drives, and enclosures to hold the
disk drives.
Data Domain systems use Serial Advanced Technology Attachment (SATA) disk drives and Serial
Attached SCSI (SAS) drives.
By reducing storage requirements by 10 to 30x and archive storage requirements by up to 5x, Data Domain systems can significantly minimize the storage footprint of environments ranging from small enterprise/ROBO (Remote Office/Branch Office) sites all the way up to large enterprise environments.
Also available are the ES30 and DS60 expansion shelves that can be added to most Data Domain
systems for additional storage capacity.
The Data Domain system (controller and any additional expansion shelves) is connected to storage
applications by means of VTL via Fibre Channel, or CIFS or NFS via Ethernet.
In the exploded view diagram, the Data Domain controller sits at the center of the topology implemented
through additional connectivity and system configuration, including:
• Expansion shelves for additional storage, depending on the model and site requirements
• Media server Virtual Tape Library storage via Fibre Channel
• LAN environments for connectivity for Ethernet based data storage, for basic data interactions,
and for Ethernet-based system management
For both active and retention tiers, DD OS 5.2 and later releases support ES30 shelves. DD OS 5.7 and later releases support DS60 shelves.
The FS15 SSD shelf uses the same form factor as the earlier ES30 expansion shelves and offers different quantities of 800 GB SAS solid state drives, depending on the capacity of the active tier.
With a DD9800, the FS15 can be configured as required with either 8 or 15 disks.
When configured for high availability, the DD6800 requires 2 or 5 disks, and DD9300 models require 5 or 8 disks.
The FS15 SSD shelf always counts against the ES30 shelf maximum, but since it is only used for metadata, it does not affect capacity.
The SSD shelf for metadata is not supported for ER and Cloud Tier use cases.
Just like a traditional Data Domain appliance, DD VE is a data protection appliance, with one primary difference: it has no Data Domain hardware tied to it. DD VE is a software-only virtual deduplication appliance that provides data protection in an enterprise environment. It is intended to be a cost-effective solution for customer remote and branch offices.
DD VE 3.0 is supported on Microsoft Hyper-V and on VMware ESXi versions 5.1, 5.5, and 6.0.
• Data Domain Secure Multi-tenancy (SMT) is the simultaneous hosting, by an internal IT department or
an external provider, of an IT infrastructure for more than one consumer or workload (business unit,
department, or Tenant).
• SMT provides the ability to securely isolate many users and workloads in a shared infrastructure, so
that the activities of one Tenant are not apparent or visible to the other Tenants.
• SMT supports conformance with IT governance and regulatory compliance standards for archived data.
This is all hidden from users and applications. When the data is read, the original data is provided to the
application or user.
Deduplication performance is dependent on the amount of data, bandwidth, disk speed, CPU, and memory of the hosts and devices performing the deduplication.
When processing data, deduplication recognizes data that is identical to previously stored data. When it
encounters such data, deduplication creates a reference to the previously stored data, thus avoiding
storing duplicate data.
Hashing algorithms yield a unique value based on the content of the data being hashed. This value is
called the hash or fingerprint, and is much smaller in size than the original data.
Different data contents yield different hashes; each hash can be checked against previously stored
hashes.
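As a simple illustration of fingerprinting (a conceptual sketch, not the Data Domain implementation), the following Python fragment hashes each data segment with SHA-1 and checks the resulting fingerprint against the fingerprints of segments already stored:

    import hashlib

    stored_fingerprints = set()          # fingerprints of segments already on disk

    def is_duplicate(segment: bytes) -> bool:
        """Return True if an identical segment has already been stored."""
        fingerprint = hashlib.sha1(segment).hexdigest()   # small, content-derived value
        if fingerprint in stored_fingerprints:
            return True                  # only a reference to the existing segment is needed
        stored_fingerprints.add(fingerprint)
        return False                     # new, unique segment: the data itself must be stored

Because the fingerprint is derived entirely from the segment contents, identical segments always produce identical fingerprints, and the comparison can be made without re-reading the stored data.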
• Fixed-length and variable-length deduplication are the other two methods; both are segment-based.
File-based deduplication enables storage savings. It can be combined with compression (a way to transmit the same amount of data in fewer bits) for additional storage savings. It is popular in desktop backups. It can be more efficient for data restores because it does not need to re-assemble files from segments. It can be included in backup software, so an organization does not have to depend on a vendor's disk appliance.
File-based deduplication results are often not as great as with other types of deduplication (such as block- and segment-based deduplication). The most important disadvantage is that there is no deduplication against a previously backed up file once that file is modified.
File-based deduplication stores an original version of a file and creates a digital signature for it (such as
SHA1, a standard for digital signatures). Future exact copy iterations of the file are pointed to the digital
signature rather than being stored.
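A minimal sketch of that approach appears below. The store path and function name are illustrative only; they are not part of any product.

    import hashlib, os, shutil

    STORE = "/tmp/file_store"            # hypothetical content-addressed store
    os.makedirs(STORE, exist_ok=True)

    def backup_file(path: str) -> str:
        """Store a file only once per unique content; return its signature."""
        with open(path, "rb") as f:
            signature = hashlib.sha1(f.read()).hexdigest()
        target = os.path.join(STORE, signature)
        if not os.path.exists(target):   # first copy of this exact content
            shutil.copyfile(path, target)
        return signature                 # identical copies later just reuse this signature

Any later backup of a byte-for-byte identical file produces the same signature and therefore stores nothing new.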
Fixed-length segment deduplication reads data and divides it into fixed-size segments. These segments
are compared to other segments already processed and stored. If the segment is identical to a previous
segment, a pointer is used to point to that previous segment.
For data that is identical (does not change), fixed-length segment deduplication reduces storage
requirements.
When data is altered the segments shift, causing more segments to be stored. For example, when you
add a slide to a Microsoft PowerPoint deck, all subsequent blocks in the file are rewritten and are likely to
be considered as different from those in the original file, so the deduplication effect is less significant.
Smaller blocks deduplicate better than large ones, but they require more resources to deduplicate.
In backup applications, the backup stream consists of many files. The backup streams are rarely entirely
identical even when they are successive backups of the same file system. A single addition, deletion, or
change of any file changes the number of bytes in the new backup stream. Even if no file has changed,
adding a new file to the backup stream shifts the rest of the backup stream. Fixed-size segment deduplication therefore backs up a large number of segments because of the new boundaries between the segments.
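The boundary-shift effect can be demonstrated with a short sketch (illustrative only): after a single byte is inserted at the front of a stream, every fixed-size segment that follows it changes, so almost nothing deduplicates.

    import hashlib

    def fixed_segments(data: bytes, size: int = 8):
        """Split data into fixed-size segments and fingerprint each one."""
        return {hashlib.sha1(data[i:i + size]).hexdigest()
                for i in range(0, len(data), size)}

    original = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ" * 4
    modified = b"x" + original           # one byte added at the front of the stream

    shared = fixed_segments(original) & fixed_segments(modified)
    print(len(shared))                   # few or no segments match: the boundaries shifted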
Unlike fixed-length segment deduplication, variable-length segment deduplication uses the content of the
stream to divide the backup or data stream into segments based on the contents of the data stream.
When you apply variable-length segmentation to a data sequence, deduplication examines the sequence using variable-sized segments. In this example, byte A is added to the beginning of the data. Only one new segment needs to be stored, since the boundaries between the remaining data segments are not altered.
Eventually, variable-length segment deduplication finds the segments that have not changed and backs up fewer segments than fixed-size segment deduplication. Even for storing individual files, variable-length segments have an advantage. Many files are very similar to, but not identical to, other versions of the same file. Variable-length segments isolate the changes, find more identical segments, and store fewer segments than fixed-length deduplication.
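A simplified content-defined chunking sketch is shown below. It is a stand-in for the real variable-length algorithm: segment boundaries are cut wherever a rolling value of the most recent bytes meets a condition, so an inserted byte disturbs only the segment it lands in.

    import hashlib

    def variable_segments(data: bytes, mask: int = 0x3F):
        """Cut a segment whenever a rolling value of recent bytes hits the boundary condition."""
        segments, start, rolling = set(), 0, 0
        for i, byte in enumerate(data):
            rolling = ((rolling << 1) + byte) & 0xFFFFFFFF   # depends only on recent bytes
            if (rolling & mask) == 0 or i == len(data) - 1:  # content-defined boundary
                segments.add(hashlib.sha1(data[start:i + 1]).hexdigest())
                start = i + 1
        return segments

    # deterministic pseudo-random test data
    original = b"".join(hashlib.sha1(bytes([i])).digest() for i in range(200))
    modified = b"A" + original                               # one byte inserted at the front
    orig_segs = variable_segments(original)
    shared = orig_segs & variable_segments(modified)
    print(len(shared), "of", len(orig_segs), "segments unchanged")

Because the boundaries depend on the data itself rather than on byte offsets, most segments after the insertion point are cut at the same places and still match.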
Inline deduplication requires less disk space than post-process deduplication. With post-process
deduplication, files are written to disk first, then they are scanned and compressed.
There is less administration for an inline deduplication process, as the administrator does not need to
define and monitor the staging space.
Inline deduplication analyzes the data in RAM, and reduces disk seek times to determine if the new data
must be stored. Writes from RAM to disk are done in full-stripe batches to use the disk more efficiently,
reducing disk access.
Source-based deduplication
• Occurs where data is created.
• Uses a host-resident agent, or API, that reduces data at the server source and sends just changed
data over the network.
• Reduces the data stream prior to transmission, thereby reducing bandwidth usage.
• DD Boost is designed to offload part of the Data Domain deduplication process to a backup server
or application client, thus using source-based deduplication.
Target-based deduplication
• Occurs where the data is stored.
• Is controlled by a storage system, rather than a host.
• Provides an excellent fit for a virtual tape library (VTL) without substantial disruption to existing
backup software infrastructure and processes.
• Works best for high change-rate environments.
SISL is used to implement Dell EMC Data Domain inline deduplication. SISL uses fingerprints and RAM to
identify segments already on disk.
SISL architecture provides fast and efficient deduplication by avoiding excessive disk reads to check if a
segment is on disk:
• 99% of duplicate data segments are identified inline in RAM before they are stored to disk.
• Scales with Data Domain systems using newer and faster CPUs and RAM.
• Increases new-data processing throughput-rate.
Local compression compresses segments before writing them to disk. It uses common, industry-standard
algorithms (for example, lz, gz, and gzfast). The default compression algorithm used by Data Domain
systems is lz.
Local compression is similar to zipping a file to reduce the file size. Zip is a file format used for data
compression and archiving. A zip file contains one or more files that have been compressed, to reduce file
size, or stored as is. The zip file format permits a number of compression algorithms. Local compression
can be turned off.
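Conceptually, local compression is ordinary lossless compression applied to each segment before it is written. The sketch below uses Python's zlib module (a gzip-family algorithm) purely as an illustration; it is not the compressor used inside DD OS.

    import zlib

    segment = b"backup data " * 512                 # a unique segment about to be written

    compressed = zlib.compress(segment, level=6)    # gz-style local compression
    print(len(segment), "->", len(compressed), "bytes written to disk")

    assert zlib.decompress(compressed) == segment   # reads return the original bytes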
If the checksum read back does not match the checksum written to disk, the system attempts to reconstruct the data. If the data cannot be successfully reconstructed, the backup fails and an alert is issued.
Since every component of a storage system can introduce errors, an end-to-end test is the simplest way to ensure data integrity. End-to-end verification means reading data after it is written and comparing it to what was sent to disk, proving that it is reachable through the file system to disk, and proving that it is not corrupted.
This ensures that the data on the disks is readable and correct and that the file system metadata
structures used to find the data are also readable and correct. This confirms that the data is correct and
recoverable from every level of the system. If there are problems anywhere, for example if a bit flips on a
disk drive, it is caught. Mostly, a problem is corrected through self-healing. If a problem can’t be corrected,
it is reported immediately, and a backup is repeated while the data is still valid on the primary store.
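The read-after-write idea can be pictured with the following simplified sketch (a single temporary file stands in for the full file-system stack that a real system verifies):

    import hashlib, os, tempfile

    def write_and_verify(data: bytes) -> str:
        """Write data, read it back, and confirm the checksums match."""
        checksum = hashlib.sha256(data).hexdigest()      # checksum computed before the write
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())                         # make sure the data reaches disk
        with open(path, "rb") as f:
            readback = hashlib.sha256(f.read()).hexdigest()
        if readback != checksum:                         # mismatch: repair or raise an alert
            raise IOError("verification failed: data on disk does not match what was written")
        return path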
1. New data never overwrites existing data. (The system never puts existing data at risk.)
Traditional file systems often overwrite blocks when data changes, reusing the old block address.
The Data Domain file system writes only to new blocks. This isolates any incorrect overwrite (a software
bug problem) to only the newest backup data. Older versions remain safe.
As shown in this slide, the container log never overwrites or updates existing data. New data is written to
new containers. Old containers and references remain in place and safe even when software bugs or
hardware faults occur when new backups are stored.
2. In a traditional file system, there are many data structures (for example, free block bit maps and
reference counts) that support fast block updates. In a backup application, the workload is primarily
sequential writes of new data. Because a Data Domain system is simpler, it requires fewer data structures
to support it. New writes never overwrite old data. This design simplicity greatly reduces the chances of
software errors that could lead to data corruption.
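The no-overwrite design can be reduced to a toy append-only log (not the real container format): new data always lands in a new container, and containers that already exist are never modified.

    class ContainerLog:
        """Toy append-only container log: existing containers are never overwritten."""
        def __init__(self):
            self.containers = []                      # existing data stays immutable

        def append(self, segments):
            self.containers.append(tuple(segments))   # new data goes into a new container
            return len(self.containers) - 1           # container ID used by later references

        def read(self, container_id):
            return self.containers[container_id]      # older versions remain intact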
The system includes a non-volatile RAM (NVRAM) write buffer into which it puts all data not yet safely on
disk. The file system leverages the security of this write buffer to implement a fast, safe restart capability.
The file system includes many internal logic and data structure integrity checks. If a problem is found by
one of these checks, the file system restarts. The checks and restarts provide early detection and recovery
from the kinds of bugs that can corrupt data. As it restarts, the Data Domain file system verifies the
integrity of the data in the NVRAM buffer before applying it to the file system and thus ensures that no data
is lost due to a power outage.
For example, in a power outage, the old data could be lost and a recovery attempt could fail. For this
reason, Data Domain systems never update just one block in a stripe. Following the no-overwrite policy,
all new writes go to new RAID stripes, and those new RAID stripes are written in their entirety. The
verification-after-write ensures that the new stripe is consistent (there are no partial stripe writes). New
writes never put existing backups at risk.
RAID 6:
– Protects against two disk failures.
– Protects against disk read errors during reconstruction.
– Protects against the operator pulling the wrong disk.
– Guarantees RAID stripe consistency even during power failure without reliance on NVRAM or
an uninterruptable power supply (UPS).
– Verifies data integrity and stripe coherency after writes.
By comparison, after a single disk fails in other RAID architectures, any further simultaneous
disk errors cause data loss. A system whose focus is data protection must include the extra
level of protection that RAID 6 provides.
Continuous error detection works well for data being read, but it does not address issues with data that
may be unread for weeks or months before being needed for a recovery. For this reason, Data Domain
systems actively re-verify the integrity of all data every week in an ongoing background process. This
scrub process finds and repairs defects on the disk before they can become a problem.
If a Data Domain system does have a problem, DIA file system recovery ensures that the system is
brought back online quickly.
In a traditional file system, consistency is not checked. Data Domain systems check through initial
verification after each backup to ensure consistency for all new writes. The usable size of a traditional file
system is often limited by the time it takes to recover the file system in the event of some sort of
corruption.
Imagine running fsck on a traditional file system with more than 80 TB of data. The reason the checking
process can take so long is the file system needs to sort out the locations of the free blocks so new writes
do not accidentally overwrite existing data. Typically, this entails checking all references to rebuild free
block maps and reference counts. The more data in the system, the longer this takes.
In contrast, since the Data Domain file system never overwrites existing data and doesn’t have block
maps and reference counts to rebuild, it has to verify only the location of the head of the log (usually the
start of the last completed write) to safely bring the system back online and restore critical data.
The ddvar file structure keeps administrative files separate from storage files.
You cannot rename or delete /ddvar, nor can you access all of its sub-directories.
MTrees provide more granular space management and reporting. This allows for finer management of
replication, snapshots, and retention locking. These operations can be performed on a specific MTree
rather than on the entire file system. For example, you can configure directory export levels to separate
and organize backup files.
You can add subdirectories to MTree directories. You cannot add anything to the /data directory. /col1 cannot be changed; however, MTrees can be added under it. The backup MTree (/data/col1/backup) cannot be deleted or renamed. MTrees that you add can be renamed and deleted. You can replicate directories under /backup.
• Network File System (NFS) clients can have access to the system directories or MTrees on the Data
Domain system.
• Common Internet File System (CIFS) clients also have access to the system directories on the Data
Domain system.
• Dell EMC Data Domain Virtual Tape Library (VTL) is a disk-based backup system that emulates the
use of physical tapes. It enables backup applications to connect to and manage DD system storage
using functionality almost identical to a physical tape library. VTL (Virtual Tape Library) is a licensed
feature, and you must use NDMP (Network Data Management Protocol) over IP (Internet Protocol) or
VTL directly over FC (Fibre Channel).
• Data Domain Boost (DD Boost) software provides advanced integration with backup and enterprise
applications for increased performance and ease of use. DD Boost distributes parts of the deduplication
process to the backup server or application clients, enabling client-side deduplication for faster, more
efficient backup and recovery. DD Boost software is an optional product that requires a separate
license to operate on the Data Domain system.
Data Domain data paths include NFS, CIFS, DD Boost, NDMP, and VTL over Ethernet or Fibre Channel.
Data Domain systems integrate non-intrusively into typical backup environments and reduce the amount
of storage needed to back up large amounts of data by performing deduplication and compression on data
before writing it to disk. The data footprint is reduced, making it possible for tapes to be partially or
completely replaced.
An organization can replicate and vault duplicate copies of data when two Data Domain systems have the
Data Domain Replicator software option enabled.
An Ethernet data path supports the NFS, CIFS, NDMP, and DD Boost protocols that a Data Domain
system uses to move data.
In the data path over Ethernet, backup and archive servers send data from clients to Data Domain systems over the network via TCP/IP or UDP/IP.
You can also use a direct connection between a dedicated port on the backup or archive server and a
dedicated port on the Data Domain system. The connection between the backup (or archive) server and
the Data Domain system can be Ethernet or Fibre Channel, or both if needed. This slide shows the
Ethernet connection.
VTL requires a Fibre Channel data path. DD Boost uses either a Fibre Channel or an Ethernet data path.
To initially access the Data Domain system, the default administrator’s username and password will be
used. The default administrator name is sysadmin. The initial password for the sysadmin user is the
system serial number.
After the initial configuration, use the SSH or Telnet (if enabled) utilities to access the system remotely and
open the CLI.
The DD OS Command Reference Guide provides information for using the commands to accomplish
specific administration tasks. Each command also has an online help page that gives the complete
command syntax. Help pages are available at the CLI using the help command. Any Data Domain system
command that accepts a list (such as a list of IP addresses) accepts entries separated by commas, by
spaces, or both.
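For example, after the initial configuration you might reach the CLI as follows (the host name is illustrative):

    # connect to the system over SSH as the default administrator
    ssh sysadmin@dd-system.example.com

    # display the built-in help pages for a command family
    help
    help net show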
DD System Manager provides a single, consolidated management interface that allows for configuration
and monitoring of many system features and system settings. Note the Management options. As we
progress through the course we will use some of the Management options.
Also notice the information contained in the Footer: DDSM – OS – Model – User – Role.
Multiple DD Systems are now managed with Data Domain Management Center.
On Apple OS X:
Mozilla Firefox 30 and higher; Google Chrome
The Data Domain Management Center can monitor all Data Domain platforms. The Data Domain
Management Center can monitor systems running DD OS version 5.1 and later.
The Data Domain Management Center includes an embedded version of the System Manager that can be
launched, providing convenient access to a managed Data Domain system for further investigation of an
issue or to perform configuration.
Deduplication improves data storage efficiency and, on Data Domain systems, is performed inline. It looks for redundancy in large sequences of bytes. Sequences of bytes identical to those previously encountered and stored are replaced with references to the previously encountered data.
SISL gives Data Domain deduplication speed. 99% of duplicate data segments are identified inline in
RAM before they are stored to disk. This scales with Data Domain systems using newer and faster
CPUs and RAM.
There are three ways to interface with Data Domain administration: the command line interface (CLI), the System Manager GUI, and the Data Domain Management Center.
The System panel shows the Model Number, DD OS version, System Uptime and System
and Chassis serial numbers. On a new installation, this screen can be used to verify the
configuration that was ordered.
Failed/Foreign/Absent Disks (Excluding System Disks) displays the disks that are in a failed state; these cannot be added to the system Active or Retention tiers.
Note that if you have trouble determining which physical disk corresponds to a disk
displayed in the table, you can use the beacon feature to flash an LED on the physical disk.
Disk Fail functionality allows you to manually set a disk to a failed state to force reconstruction of the data stored on the disk. Disk Unfail functionality allows you to take a disk in a failed state and return it to operation.
On systems running DD OS 5.5.1 and later, the system serial number is also displayed. On
newer systems, such as DD4500 and DD7200, the system serial number is independent of
the chassis serial number and remains the same during many types of maintenance events,
including chassis replacements. On legacy systems, such as DD990 and earlier, the system
serial number is set to the chassis serial number.
Chassis view shows Top View, Back View, and Enclosures. Shown here is the Top View and
a mouse rollover on the components results in a pop-up with specific information on the
component. Shown is a rollover pop-up on the Power Supplies.
A sysadmin is the default admin user. An admin can configure and monitor the entire Data
Domain system. Most configuration features and commands are available only to admin role
users. The limited-admin role can configure and monitor the Data Domain system with
some limitations. Users who are assigned this role cannot perform data deletion operations,
edit the registry, or enter bash or SE mode.
The user role can monitor the system, change their own password, and view system status.
The user role cannot change the system configuration.
The Security role is for a security officer who can manage other security officers, authorize
procedures that require security officer approval, and perform all tasks supported for user-
role users. Only the sysadmin user can create the first security officer and that first account
cannot be deleted. After the first security officer is created, only security officers can create
or modify other security officers.
The Backup-operator role can perform all tasks permitted for user role users, create
snapshots for MTrees, import, export, and move tapes between elements in a virtual tape
library, and copy tapes across pools.
The role of None is used for DD Boost authentication and tenant-users. A None role can log
in to a Data Domain system and can change their password, but cannot monitor or
configure the primary system.
The Tenant Admin role can be appended to the other (non-tenant) roles when the Secure
Multi-Tenancy (SMT) feature is enabled. A tenant-admin user can configure and monitor a
specific tenant unit as well as schedule and run backup operations for the Tenant.
The Tenant User role can be appended to the other (non-tenant) roles when the SMT
feature is enabled. It enables a user to monitor a specific tenant unit and change the user
password.
Administration > Access > Local Users is used to create and manage users. The Local Users tab shows the current list of users and their assigned roles.
Managing users enables you to name the user, grant privileges, make the account active, disabled, or locked, and find out if and when it was disabled. You can also find out the user's last login location and time.
Note: To comply with security policies it is also important to know that the Data Domain
usernames/roles can be tied into Active Directory or an LDAP service.
Management Role is the role assigned to the user, which can be admin, user, security,
backup-operator, or none.
Note: Only the sysadmin user (the default user created during the DD OS installation) can
create the first security-role user. After the first security-role user is created, only security-
role users can create other security-role users.
Force Password Change - Select this checkbox to require that the user change the
password during the first login when logging in to DD System Manager or to the CLI with
SSH or Telnet.
The Disable Date options allow for the creation of temporary user accounts often used for
contractors who need temporary access.
Maximum Days Between Change -The maximum number of days between password
changes that you allow a user. Default is 90.
Warn Days Before Expire - The number of days to warn the users before their password
expires. Default is 7.
Disable Days After Expire - The number of days after a password expires to disable the user
account. Default is Never.
The Administrator Access tab displays the configuration status for the IP protocols that can
be used to access the system. FTP and FTPS are the only protocols that are restricted to
administrators.
• FTP/FTPS provides access to a Data Domain system through an FTP or FTPS
connection.
• HTTP/HTTPS provides access to a Data Domain system through an HTTP connection, an HTTPS connection, or both.
• SSH provides access to a Data Domain system through an SSH connection.
• SCP provides access to securely copy files to and from a Data Domain system.
• Telnet provides access to a Data Domain system through a Telnet connection.
The Data Domain system logs system status messages hourly. Log files can be
bundled and sent to Data Domain Support to provide the detailed system information that
aids in troubleshooting any system issues that may arise.
The Data Domain system log file entries contain messages from the alerts feature,
autosupport reports, and general system messages. The log directory is /ddvar/log.
Only a sample of the log files or folders are listed on this slide. The /ddvar folder contains
other log files that you cannot view.
Every Sunday morning, the Data Domain system automatically opens new messages and
audit log files and renames the previous files with an appended number of 1 through 9,
such as messages.1. Each numbered file is rolled to the next number each week. For
example, at the second week, the file messages.1 is rolled to messages.2. If a file
“messages.2” already existed, it rolls to messages.3. An existing messages.9 is deleted
when messages.8 rolls to messages.9.
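The weekly roll can be pictured with a short sketch (illustrative only; the system performs this rotation automatically):

    import os

    def roll_logs(log_dir: str, base: str = "messages", keep: int = 9):
        """Shift base.1 ... base.8 up one number, drop base.9, then start a new base.1."""
        oldest = os.path.join(log_dir, "%s.%d" % (base, keep))
        if os.path.exists(oldest):
            os.remove(oldest)                                     # messages.9 is deleted
        for n in range(keep - 1, 0, -1):                          # 8 down to 1
            src = os.path.join(log_dir, "%s.%d" % (base, n))
            if os.path.exists(src):
                os.rename(src, os.path.join(log_dir, "%s.%d" % (base, n + 1)))
        current = os.path.join(log_dir, base)
        if os.path.exists(current):
            os.rename(current, os.path.join(log_dir, "%s.1" % base))  # a new messages file is then opened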
This report is designed to aid EMC Data Domain Support in debugging system problems. An
ASUP is generated every time the file system is started.
You can configure email addresses to receive the daily ASUP reports. The default time for
sending the daily ASUP is 06:00 a.m., and it is configurable. When sending ASUPs to EMC,
you have the option to select the legacy unsecure method or the ConnectEMC method,
which encrypts the information before transmission.
The ASUP displays System Alert messages. When a System Alert message is generated it
is automatically sent to EMC and any specific recipients that have been configured.
Autosupport requires SMTP service to be active (on TCP port 25) on the Data Domain
system and pointing to a valid email server.
The Scheduled autosupport option allows you to disable the sending of the ASUP.
The Subscribers option allows you to add or delete alert email recipients.
The Channel option allows you to send standard (unencrypted) ASUP and alert emails or encrypted emails to the recipients.
The ConnectEMC method sends messages in a secure format using FTP or HTTPS.
ConnectEMC is configured through the CLI.
When the ConnectEMC method is used with an EMC Secure Remote Support (ESRS)
gateway, one benefit is that one gateway can forward messages from multiple systems, and
this allows you to configure network security for only the ESRS gateway instead of for
multiple systems. Also, a usage intelligence report is generated.
ESRS Virtual Edition (VE) Gateway, which is installed on an ESX Server, provides automated
connect home and remote support activities through an IP-based solution enhanced by a
comprehensive security system.
Note: DD OS 6.0 uses EMC Secure Remote Support version 3 (ESRSv3). Upgrading a
system running DD OS 5.X to DD OS 6.0 removes the existing ConnectEMC configuration
from the system. After the upgrade is complete, reconfigure ConnectEMC manually.
Event reports are sent immediately and provide detailed information on a system event.
The distribution lists for event alerts are called notification groups. Notification groups can
be configured to include one or more email addresses as well as the types and severity level
of the event reports sent to those addresses.
For example, you might configure one notification group for those who need to know about
critical events and another group for those who monitor less critical events. Another option
is to configure groups for different technologies. For example, one group can receive
emails about all network events and another group can receive messages related to storage issues.
Summary reports are sent daily and provide a summary of the events that occurred during
the last 24 hours. Summary reports do not include all the information provided in event
reports.
You can also use the command line interface (CLI) to configure alerts:
• alerts notify-list create <group-name>
Creates a notification list and subscribes to events belonging to the specified list of
classes and severity levels.
• alerts notify-list add <group-name>
Adds to a notification list and subscribes to events belonging to the specified list of
classes and severity levels.
• alerts notify-list del <group-name>
Deletes members from a notification list, a list of classes, a list of email addresses.
• alerts notify-list destroy <group-name>
Destroys a notification list
• alerts notify-list reset
Resets all notification lists to factory default
• alerts notify-list show
Shows notification lists’ configuration
• alerts notify-list test
Sends a test notification to alerts notify-list
• Select Generate Support Bundle. It will take a few minutes for the bundle to be
created.
Note: If the bundle is too large to be emailed, use the EMC/Data Domain support site to
upload the bundle.
You can also generate support bundles from the command line:
An SNMP manager is required. Usually this is a third-party application that needs an SNMP agent to monitor and to respond to queries. The Data Domain system acts as the SNMP agent.
From an SNMP perspective, a Data Domain system is a read-only device with one
exception: a remote machine can set the SNMP location, contact, and system name on a
Data Domain system.
To configure SNMP using the System Manager, go to Administration > Settings > SNMP
and make sure Enable is selected.
SNMP Properties – the SNMP system location is a description of where the Data Domain system is located, and the SNMP system contact identifies the person responsible for the system.
Regarding SNMP v3 and v2c configurations, the Data Domain system SNMP agent accepts queries for Data Domain-specific information from management systems using SNMP v1,
v2c, and v3. SNMP V3 provides a greater degree of security than v2c and v1 by replacing
clear text community strings (used for authentication) with user-based authentication using
either MD5 or SHA1. Also, SNMP v3 user authentication packets can be encrypted and their
integrity verified with either DES or AES.
Starting with this release, licensing is being moved to EMC's ELMS (Electronic Licensing Management System), which provides a standardized method to license all EMC products electronically. By using ELMS, Data Domain uses a single file to license the system. The file contains all licenses and is tied to the system for which it is entitled.
Both CAPACITY-SSD and Cloud Tier capacity are available exclusively through ELMS. All
other licenses can be added to the system using either the DD Licensing system or using
ELMS.
Controller COD enables an on-demand capacity increase for 4 TB DD2200 systems to 7.5
TB or 13.18 TB. An increase to 13.18 TB also requires the EXPANDED-STORAGE license.
CloudTier-Capacity enables a Data Domain system to move data from the active tier to low-
cost, high-capacity object storage in the public, private, or hybrid cloud for long-term
retention.
DD Boost enables the use of a Data Domain system with the following applications: EMC
Avamar, EMC NetWorker, Oracle RMAN, Dell vRanger, Veritas, and Backup Exec. The
managed file replication (MFR) feature of DD Boost also requires the DD Replicator license.
Encryption allows data on system drives or external storage to be encrypted while being
saved and locked when moving the system to another location.
Expanded Storage allows Data Domain system storage to be expanded beyond the level
provided in the base system.
Extended Retention licenses the Extended Retention storage feature, formerly known as DD Archiver.
I/OS: an I/OS license is required when VTL is used to back up systems in the IBM i operating environment. Apply this license before adding virtual tape drives to libraries.
Retention Lock Governance protects selected files from modification and deletion before a
specified retention period expires.
Retention Lock Compliance allows you to meet the strictest data retention requirements
from regulatory standards such as SEC17a-4.
Capacity Active enables a Data Domain system to expand the active tier storage capacity to
an additional enclosure or a disk pack within an enclosure.
Capacity Archive enables a Data Domain system to expand the archive tier storage capacity
to an additional enclosure or a disk pack within an enclosure.
Storage Migration for DD Systems enables migration of data from one enclosure to another
to support replacement of older, lower capacity enclosures.
VTL (Virtual Tape Library) enables the use of a Data Domain system as a virtual tape library
over a Fibre Channel network. This license also enables the NDMP Tape Server feature,
which previously required a separate license.
To complete the ELMS license, the Locking ID (or serial number) must be provided, since the license is generated only for that system. Once all the required fields are filled out, the output is the ELMS license file, which can be added to the Data Domain system using either the CLI or the GUI.
All Data Domain systems shipped with DD OS 6.0 require ELMS licensing. For upgrades, the administrator has the option to convert to ELMS using the conversion tool or to continue with the key-based license.
Note that elicense reset wipes the licenses off the Data Domain system. Be sure to save the license information in case it is needed.
Data Domain recommends that you track Data Domain OS releases deployed in your
backup environment. It is important that the backup environment run the most current,
supported releases. Minimize the number of different deployed release versions in the same
environment. As a general rule, you should upgrade to the latest GA release of a particular
release family. This ensures you are running the latest version that has achieved our
highest reliability status.
Any upgrade packages, regardless of where they are in the release cycle, that are available
for your organization can be downloaded from the EMC/Data Domain support site.
There is no downgrade path to a previous version of the Data Domain operating system (DD OS). The only method to revert to a previous DD OS version is to destroy the file system, and all the data contained therein, and start with a fresh installation of your preferred DD OS.
Make sure you allocate appropriate system downtime to perform the upgrade. Set aside
enough time to shut down processes prior to the upgrade and for spot-checking the
upgraded system after completing the upgrade. The actual upgrade should take no longer than 45 minutes. Adding the time to shut down processes and to check the upgraded system, the complete upgrade might take 90 minutes or more.
Double this time if you are upgrading more than two release families.
For replication users: Do not disable replication on either side of the replication pair. After it
is back online, replication automatically resumes service.
You should upgrade the destination (replica) before you upgrade the source Data Domain
system.
If you are upgrading to a new version of DD OS that requires you to perform multiple
separate upgrades, such as from 5.4 to 5.6 and then from 5.6 to 6.0, you must ensure that the first upgrade is complete before starting the second upgrade. Be sure to follow all of the upgrade instructions for each upgrade process, and verify that the process is complete
before initiating a subsequent upgrade. Be aware that certain versions of DD OS disallow
upgrades if those versions themselves are not completely upgradable on the given
platforms.
For example, you can manage the configuration of the Ethernet components. This includes Network
Interface Cards (NICs), Link Failover, Link Aggregation, Virtual LANs (VLANs), and Virtual Network
Interfaces.
The Domain Name Service (DNS) configuration is also accessible through the user interface. The Host
name, Domain Name, Local Host File, Search Domains, and dynamic DNS configuration are all
configurable.
From here you can select the interfaces, settings, or routes tab as appropriate.
You can filter the number of interfaces displayed in the interface table by name or by interface type.
The Interface column shows the name of each interface associated with the selected Data Domain
system. Physical interface names start with eth. Virtual interface names start with veth.
The Enabled column indicates whether or not the interface is enabled. Select Yes to enable the interface
and connect it to the network. Select No to disable the interface and disconnect it from the network.
The DHCP column indicates if the interface is configured to use DHCP. This column displays a value of
Yes, No, or not applicable.
The IP Address column shows the IP address associated with the interface. If the interface is configured
through DHCP, an asterisk appears after this value.
The Netmask column shows the netmask associated with the interface. The display uses the standard IP
network mask format. If the interface is configured through DHCP, an asterisk appears after this value.
The Link column indicates whether or not the interface currently has a live Ethernet connection.
The Additional Info column lists additional settings for the interface, such as the bonding mode.
You can view the details of an interface by selecting its associated row in the Interface table.
The Intelligent Platform Management Interface (IPMI) section of the screen indicates if IPMI health and
management monitoring is configured for the interface. You can view more information about IPMI
interfaces by selecting the View IPMI Interfaces hot link. This hot link takes you to the Maintenance >
IPMI configuration tab.
Displayed are example CLI commands that provide most of the relevant information associated with
network interfaces. Use the help net show CLI command to obtain more information on these
commands.
The net show settings CLI command displays the interface's network settings.
The net show hardware CLI command displays the interface's hardware configuration.
The net show config CLI command displays the active network configuration.
The net show domainname CLI command displays the domain name associated with this device.
The net show searchdomain CLI command lists the domains that will be searched when only the host
name is provided for a configuration or command.
The net show dns CLI command lists the domain name servers used by this device.
The net show stats CLI command provides a number of different networking statistics. Use the help
net show command for more information.
The net show all CLI command combines the output of several other net show CLI commands. The
output from this command is quite long and will likely scroll off the screen.
Refer to the documentation or the help net config CLI command to obtain more information.
From here, you can manage the host name, domain name, domain search list, host mappings (local host
file), and the DNS server list.
The domain name is shown beneath the host name. The domain name is appended to the host name to
produce the system's fully-qualified domain name.
The Search Domain List section displays the search domains used by the Data Domain system when a
host name (not a fully qualified domain name) is entered into the system as a configuration parameter or
as an argument to a command.
When a host name is used in this way, the system attempts to determine the correct domain name to
associate with the provided host name by appending each of the listed search domains to the host name.
The system uses the fully qualified domain name if it is discovered. If none of the domain names yield the
correct fully qualified domain, the system returns an error.
The Host Mappings section shows local name to IP address mappings. Unlike the mappings provided by
the DNS server, these name mappings only apply to this system.
The DNS List displays the IP addresses of the DNS servers used by this system. An asterisk (*) indicates
the DNS server addresses were assigned through DHCP.
Refer to the documentation or the help net show and help net hosts CLI commands to obtain more
information on these commands.
2. If you wish for the host name and domain name to be configured by the DHCP server, choose the
Obtain Settings using DHCP option.
If you wish to configure a static host name and domain name, choose the Manually configure host
option and enter the host name and domain name.
2. Click the green plus icon to display the Add Search Domain input panel.
4. Select OK to add the name to the search domain list. You may add more search domains by
selecting the green plus icon again.
5. To remove a domain name from the list, select the name from the search domain list.
6. Next, select the red x icon. This removes the domain name from the search domain list.
7. Once the search domain list is complete, select OK to save the list to the system.
• This feature allows the users of the system to specify locally configured names (aliases) in place of IP
addresses for CLI commands and other system parameters.
• Host name mapping is typically used when a target system does not have a DNS entry and the IP
address is difficult to remember.
• When using this feature, you create a list of names that are mapped to a single IP address.
1. To create a new host mapping list, navigate to the Hardware > Ethernet > Settings tab and select Add
in the Host Mapping section. This causes the Add Hosts input panel to appear.
2. In the IP address field, add the address of the station to which you wish to map names.
3. Select the green plus icon to display the Add Host input panel.
5. Select OK to add the name to the Host Name list. You can associate more host names with the IP
address by selecting the green plus icon again.
6. If an entry you just added to the Host Name list is incorrect, you can quickly delete it by first selecting the host name from the Host Name list.
7. Then select the red x icon. This removes the name from the Host Name list.
8. Once the Host Name list is complete, select OK to save the list to the system.
2. Click Edit. This causes the Add Hosts input panel to appear.
3. You cannot edit the IP address field, but you can add more host names to the list by selecting the
green plus icon to display the Add Host input panel.
6. To quickly delete an entry, select the host name from the Host Name list.
7. Click the red x icon. This removes the name from the Host Name list.
Refer to the documentation or the help net set, help net hosts, and help net reset CLI commands to
obtain more information.
Data Domain systems use source-based routing, which allows the sender of the packet to specify the route or interface that a packet must use in order to reach the destination.
Navigate to Hardware > Ethernet > Routes to view or configure the IP routes on the Data Domain system.
2. To configure the default gateway, click the Edit button associated with the Default IPv4 Gateway. The
Configure Default IPv4 Gateway dialog box appears.
3. If the system is to receive the default gateway from the IPv4 DHCP server, select the Use DHCP
value option.
4. If the system is to be configured with a static IPv4 address, select the Manually Configure option and
enter the gateway address when the Gateway input box becomes available.
5. Click OK. The system processes the information and returns you to the Routes tab.
1. Go to the top of the Hardware > Ethernet > Routes tab to review the address of the IPv6 Default
Gateway.
2. To configure the default gateway, click the Edit button associated with the Default IPv6 Gateway. The
Configure Default IPv6 Gateway dialog box appears.
3. If the system is to receive the default gateway from the IPv6 DHCP server, select the Use DHCP
value option.
4. If the system is to be configured with a static IPv6 address, select the Manually Configure option and
enter the gateway address when the Gateway input box becomes available.
5. Click OK. The system processes the information and returns you to the Routes tab.
1. After navigating to the Hardware > Ethernet > Routes tab, you can configure a static route by clicking
the Create button in the Static Routes area.
2. In the Create Routes dialog, select the interface you want to host the static route.
4. Specify the destination. To specify a destination network, select Network and enter the network
address and netmask or prefix for IPv6 addresses. To specify a destination host, select Host and
enter the hostname or IP address of the destination host.
Note: This is not the IP of any interface. The interface is selected in the initial dialog, and it is used for
routing traffic.
5. As an option, specify the gateway to use to connect to the destination network or host.
6. Review the configuration and click Next. The create routes Summary page appears.
7. Click Finish. After the process is completed, click OK. The new route specification is listed in the
Route Spec table.
2. IP Address Configuration
In this section of the training, components are defined as parts of the system that must be configured or
managed.
Bonding modes define the methods and protocols used to control the physical links between systems.
Bonding is a term used by the Linux community to describe the grouping of interfaces together to act as one interface to the outside world. Other analogous terms include link bundling, EtherChannel (from Cisco), trunking, port trunking, port aggregation, NIC bonding, and load balancing. Link aggregation and link failover are the two types of bonding supported by Data Domain systems.
The bonding hash defines the methods used to balance transmissions over the physical links. Balancing
is typically done to obtain better physical link utilization.
The system software sends and receives data to and from the virtual interface in the same way it would if the virtual interface were a physical network interface.
The virtual network interface provides the system software with a way to access the underlying
aggregated link connection, link failover connection, or VLAN. It appears to the system as a normal
physical network interface. A virtual interface can also be viewed as a container to hold physical
interfaces.
The virtual interface operation is the component that performs the functions defined by the virtual interface
type (bonding mode). This component processes data according to rules associated with the interface
type.
Finally, there are physical network interfaces. These components are responsible for actually transmitting
and receiving data over the network. Of course, there are physical interfaces on the connected devices as
well.
If configuring link failover, the interfaces on the connected device do not require any special configuration
other than normal Ethernet network configuration.
If configuring link aggregation, the interfaces on the connected device must be setup with a compatible
bonding type, mode, and hash.
Link control does not extend beyond the directly connected device. If the media or application server is
not directly connected to the Data Domain system, the operation of its physical links are not managed by
the failover or aggregation functions. Of course, a loss of connectivity would still be detected by higher
level protocols.
The physical Ethernet connections must follow existing guidelines which typically means all interfaces
have the same speed and duplex settings. Some configurations allow the links in the bundle to have
different media types.
The direct connect topology may be used for any type of bonding mode, but is most often used with round
robin because it provides the most fair traffic distribution between the two links. Even though round robin
is more susceptible to out-of-order packet transmission, this problem is minimized by the fact that traffic
destined for other devices is not going to be contending for the resources provided by these links.
In this topology, the Data Domain system is directly connected to a layer 2 switch. The physical Ethernet
links between the Data Domain system and the layer 2 switch must have the same speed and duplex
settings.
The bonding configuration must also be compatible between the Data Domain system and the layer 2
switch. This includes the bonding type, mode, and hash.
Also, the Data Domain system and the server are on the same subnet. This means that there is no router
between the Data Domain system and the server.
The server is also connected to a layer 2 switch, but that doesn't mean it is connected to the same switch
as the Data Domain system.
Because link aggregation and link failover are point-to-point protocols and not end-to-end, the physical
network link configuration of the server is unrelated to the configuration of the Data Domain system in this
topology. It is required that the server and switch have compatible physical network and bonding
configurations, but not required for the server and Data Domain system to also have the same level of
compatibility. In fact, as shown on the screen, the configuration of the Data Domain system's physical
links can be completely different from the server's.
The failover-enabled virtual interface represents a primary physical network interface and a group of
secondary physical network interfaces.
The system makes the primary interface the active interface whenever the primary interface is operational.
A configurable Down Delay failover option allows you to configure a failover delay in 900 millisecond
intervals. The failover down and up delays guard against multiple failovers when a network is unstable. By
default, a link must be up or down continuously for 29700 milliseconds (29.7 seconds) before the system
activates a standby link or restores the primary link.
If the carrier signal is lost, the active interface is changed to another standby interface. An address
resolution protocol (ARP) is sent to indicate that the data must flow to the new interface. The interface can
be on the same switch, on a different switch or directly connected.
When you create the virtual network interface, you identify how the bonded links are to be used. In this
case, the virtual interface is used to identify primary and secondary failover links and to make them appear
to the operating system as a single network connection.
You can create as many virtual interfaces as there are physical interfaces. You can even create a link
failover connection with only one physical link.
2. Disable the physical Ethernet interfaces you want to add to the failover link by selecting the interfaces
and choosing No from the Enabled menu.
• A physical network interface that is part of a virtual interface is seen as disabled for other network
configuration options.
• Each physical interface can belong to one virtual interface.
• The number and type of cards installed on the system determines the number of physical Ethernet
interfaces available.
3. If an error is displayed warning about the dangers of disabling the interface, verify the interface is not
in use and click OK.
4. From the Create menu, select the Virtual Interface option. The Create Virtual Interface dialog box
appears.
7. Select the interfaces that will be part of the failover configuration by clicking the checkbox
corresponding to the interface.
• Physical network interfaces or virtual link aggregation interfaces can be added to a link failover
virtual interface.
• Virtual interfaces must be created from identical physical interfaces. For example, all copper, all
optical, all 1 Gb, or all 10 Gb. However, 1 Gb interfaces support bonding a mix of copper and
optical interfaces. This applies to virtual interfaces across different cards with identical physical
interfaces, except for Chelsio cards. For Chelsio cards, only failover is supported, and that is only
across interfaces on the same card.
• Bonded physical interfaces can be connected to the same or different switches.
• All interfaces in a virtual interface must be on the same physical network.
• Network switches used by a virtual interface must be on the same physical network.
• 10 Gb CX4 Ethernet cards are restricted to one primary interface and one failover interface from the same card.
• There is no special failover configuration required on the switch. Since the Data Domain system is
the device that manages the failover, a normal Ethernet configuration of the switch should work.
• Only one interface in a group can be active at a time.
• On the DD4200, DD4500, and DD7200 systems, the ethMa interface does not support failover or
link aggregation.
9. Click Next and the Create Virtual Interface dialog box appears.
11. Specify the speed and duplex options that will be applied to all physical interfaces that are associated
with the virtual interface.
12. If necessary, configure the MTU. Verify the MTU settings with the network administrator before
modifying the configuration.
13. Click Next. A panel with the summary of the configuration should now appear.
17. Observe as the virtual interface is created.
18. Click OK after the virtual interface creation process is completed. If there are errors, address them and reconfigure the interface.
2. Create the virtual interface and configure it for link failover bonding mode.
a. Use the net create virtual CLI command to create the virtual interface.
b. Use the net modify CLI command with the bonding failover arguments to configure a virtual
interface for link failover bonding mode.
c. Use the net config CLI command to provide the virtual interface with an IP address and netmask.
3. Add the physical network interfaces to the virtual interface and select a primary link.
a. Use the net failover add CLI command to add the physical interfaces to the virtual interface.
b. Use the net failover modify CLI command to select the primary link.
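For example, a minimal sketch of that CLI sequence, assuming a virtual interface named veth1, physical interfaces eth1a and eth1b, and an illustrative IP address (verify the exact argument syntax for your DD OS release in the Command Reference Guide):
net create virtual veth1
net modify veth1 bonding failover
net config veth1 192.168.10.5 netmask 255.255.255.0
net failover add veth1 interfaces eth1a eth1b
net failover modify veth1 primary eth1a
Each line corresponds to steps 2a through 3b above; the primary selection in the last command makes eth1a the active link whenever it is operational.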
The Data Domain link aggregation feature is between the local system and the connected network device.
The device connected to the Data Domain system can be a switch, router, or server.
Link aggregation also provides link failover. If one of the physical network links in the bundle should fail,
the other links continue to service the Data Domain system's network connection.
A virtual network interface must be created in order for link aggregation to work. The system uses this
virtual interface as an access point to the link aggregation bundle.
When you create the virtual network interface, you identify how the bonded (bundled) links are to be used.
In this case, the virtual interface is used to aggregate multiple physical links and make them appear as a
single network connection.
You can create as many virtual interfaces as there are physical interfaces.
1. After verifying that the device connected to the Data Domain system supports compatible link aggregation bonding methods, navigate to the Hardware > Ethernet > Interfaces tab.
2. Disable the physical Ethernet interfaces you want to add to the aggregation link by selecting the
interfaces and choosing No from the Enabled menu.
• A physical network interface that is part of a virtual interface is seen as disabled for other network
configuration options.
• Each physical interface can belong to one virtual interface.
• The number and type of cards installed on the system determines the number of physical Ethernet
interfaces available.
• Changes to disabled Ethernet interfaces flush the routing table. Schedule interface changes during downtimes, and reconfigure routing rules and gateways afterwards.
3. If a warning about the dangers of disabling the interface is displayed, verify that the interface is not in use and click OK.
4. From the Create menu, select the Virtual Interface option. The Create Virtual Interface dialog box
appears.
7. Specify the bonding mode. The bonding mode must be compatible with the link aggregation method
supported by the system directly connected to the physical interfaces that are part of the bundle. The
available bonding modes are round robin, Balanced, and Link Aggregation Control protocol (LACP).
• Round robin bonding mode is typically used by Linux systems. It transmits packets in sequential
order from the first available link through the last link in the bundle. This provides the best
distribution across the bonded interfaces. Normally this would be the best bonding mode to use, but
throughput can suffer because of packet ordering.
• LACP bonding mode is similar to Balanced, except for the control protocol that communicates with
the other end and coordinates which links in the bond are available. It provides heartbeat failover.
LACP was originally defined in IEEE 802.3ad. 802.3ad was subsequently incorporated into the
IEEE 802.1AX-2008 specification which was in turn superseded by IEEE 802.1AX-2014.
• Balanced bonding mode sends data over the interfaces as determined by the selected hash
method. All associated interfaces on the switch must be grouped into an EtherChannel (trunk).
EtherChannel is the bonding method defined by Cisco systems.
9. Select an interface to add to the aggregate configuration by clicking the checkbox corresponding to the interface. Link aggregation is not supported on 10 Gb single-port optical NICs, the DD2500 ethMe and ethMf interfaces, or the DD4200, DD4500, and DD7200 ethMa interfaces.
10. Click Next and the Create Virtual Interface dialog box appears.
12. Specify the speed and duplex options that will be applied to all physical interfaces associated with the
virtual interface.
13. If necessary, configure the MTU. Verify the MTU settings with the network administrator before
modifying the configuration.
14. Click Next. A panel with the summary of the configuration should now appear.
18. Click OK after the virtual interface creation process is completed. If there are errors, address them
and reconfigure the interface.
2. Configure the Ethernet parameters on each physical NIC port using the net config <ifname> CLI
command. Ensure member ports, on the Data Domain system and the connected device, are
configured the same.
• If you are including NIC ports with different media types in the virtual interface, check the
documentation to verify this is allowed with your hardware.
• Verify the device connected to the Data Domain system supports compatible link aggregation
bonding mode and hash settings.
3. Create a virtual interface, using the net create virtual CLI command.
4. Configure link aggregation bonding using the net modify CLI command with the bonding aggregate
argument.
5. Add a physical NIC port to the virtual interface using the net aggregate add CLI command. The
bonding mode and hash must be configured when adding the first physical interface. They cannot be
configured later.
6. Assign an IP address and netmask to the virtual interface using the net config CLI command.
7. Enable the virtual interface using the net config <virtual-ifname> up CLI command.
8. Verify the configuration of the virtual interface using the net aggregate show CLI command. The net
aggregate show CLI command does not provide any output unless the virtual interface is up and
enabled.
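As a minimal sketch of that sequence, assume a virtual interface named veth2, physical interfaces eth4a and eth4b, LACP with an xor-L3L4 hash, and an illustrative IP address; the mode and hash keywords and their ordering vary by DD OS release, so confirm them in the Command Reference Guide:
net create virtual veth2
net modify veth2 bonding aggregate
net aggregate add veth2 mode lacp hash xor-L3L4 interfaces eth4a eth4b
net config veth2 192.168.10.20 netmask 255.255.255.0
net config veth2 up
net aggregate show
Remember that the connected switch ports must be configured with a matching aggregation method before the bundle will pass traffic.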
The speed of the network switch or network link impacts performance when the amount of data has
exceeded the switch's capacity. Normally, a network switch can handle the speed of each connected link,
but it may lose some packets if all of the packets are coming from several ports that are concentrated on
one uplink running at maximum speed. In most cases, this means you can use only one switch for port
aggregation coming out of a Data Domain system. Some network topologies allow for link aggregation
across multiple switches.
Out-of-order packets can impact performance due to the processing time needed to reorder the packets.
Round robin link aggregation mode could result in packets arriving at the destination out-of-order. The
receiving device must reorder the data stream. This adds overhead that may impact the throughput speed
enough that the link aggregation mode causing the out-of-order packets should not be used.
The number of clients can also impact performance. In most cases, either the physical or OS resources
cannot drive data at multiple Gbps. Also, due to hashing limits, you need multiple clients to push data at
multiple Gbps.
The number of streams (connections) per client can significantly impact link utilization depending on the
hashing used.
• VLANs provide the segmentation services normally provided by routers in LAN configurations.
• Routers in VLAN topologies provide broadcast filtering, security, address summarization, and traffic-
flow management.
• Switches may not bridge IP traffic between VLANs as doing so would violate the integrity of the VLAN
broadcast domain.
By using VLANs, one can control traffic patterns and react quickly to relocations. VLANs provide the
flexibility to adapt to changes in network requirements and allow for simplified administration.
Partitioning a local network into several distinct segments on a common infrastructure shared across VLAN trunks can provide a very high level of security with great flexibility at a comparatively low cost.
Quality of Service schemes can optimize traffic on trunk links.
VLANs could be used in an environment to provide easier access to local networks, to allow for easy
administration, and to prevent disruption on the network.
IP aliasing is associating more than one IP address to a network interface. With this, one node on a
network can have multiple connections to a network, each serving a different purpose.
No IP address is required on the underlying physical or virtual interface when you create a VLAN interface. Unlike VLAN interfaces, physical and virtual interfaces require untagged ports. Make sure to configure the connected switch to support both tagged and untagged packets and all VLAN IDs configured on the Data Domain system's physical interface.
2. In the interfaces table, select the interface to which you want to add the VLAN.
4. From the Create menu, select the VLAN... option. The Create VLAN Interface dialog box appears.
12. Observe the user interface as the system configures the VLAN.
13. After successful completion of the VLAN interface configuration, click OK.
Up to 100 IP aliases are supported. However, the recommended total number of IP aliases, VLAN,
physical, and virtual interfaces that can exist on the system is 80. Although up to 100 interfaces are
supported, as the maximum number is approached, you might notice slowness in the display.
The format of an IP alias interface name is the base interface name, followed by a colon character (:),
which is then followed by the IP alias ID.
Using this format as a reference, we know that the ifname eth5a:35 refers to an IP alias assigned to the physical interface eth5a, and the IP alias's ID is 35.
The interface name veth4:26 refers to an IP alias assigned to virtual interface 4, and its ID is 26.
The IP alias interface name eth5a.82:162 is an IP alias assigned to VLAN 82, which in turn is assigned to physical interface eth5a, and the IP alias's ID is 162.
The acceptable IP alias ID values differ depending upon the user interface or CLI command used to create
the IP alias. If you use the Data Domain System Manager or the net create interface CLI command to
create the IP alias, IP Alias ID values from 1 to 4094 are supported. If you use the net config CLI
command, the IP Alias ID values from 1 to 9999 are supported.
2. In the interfaces table, select the interface to which you wish to add the IP alias. You may choose an
existing physical, VLAN, or virtual interface.
4. From the Create menu, select the IP Alias... option. The Create IP Alias dialog box appears.
The net config CLI command allows alias-id values from 1 to 9999. The alias-ID cannot be in use by
another alias.
On the screen are example commands that show how the net config CLI command can be used to assign an IP alias to physical, VLAN, and virtual interfaces.
To destroy or delete an IP alias using the net config CLI command, assign it an IP address of 0.
Shown on screen are examples that demonstrate removing an IP alias from physical, VLAN, and virtual interfaces by assigning the IP alias an IP address of 0 using the net config CLI command.
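For instance, a sketch of these operations using the alias names from the earlier examples and illustrative addresses:
net config eth5a:35 192.168.10.35 netmask 255.255.255.0
net config eth5a.82:162 192.168.82.162 netmask 255.255.255.0
net config veth4:26 192.168.4.26 netmask 255.255.255.0
net config eth5a:35 0
The first three commands assign IP aliases to a physical interface, a VLAN interface, and a virtual interface; the last removes the alias on eth5a:35 by assigning it an IP address of 0.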
1. First, is the FC switch properly zoned and communicating with the FC server and Data Domain
system?
2. Next, what is the server's WWPN? If needed, what is the server's IP address?
3. What name or alias do you wish to apply to the server? This name will be mapped to the WWPN on
the Data Domain system.
4. What is the Data Domain system's WWPN, IP address, and FC slot and port?
1. Navigate to the Administration > Licenses page of the Data Domain system manager.
4. However, services that require Fibre Channel support - such as VTL, DD Boost, and I/OS - all require licenses.
The fibre channel status can only be changed through the CLI. Use the scsitarget enable CLI command
or the scsitarget disable CLI command to change the status.
• In non-NPIV mode, ports use the same properties as the endpoint, that is, the WWPN for the base port
and the endpoint are the same.
• In NPIV mode, the base port properties are derived from default values, that is, a new WWPN is
generated for the base port and is preserved to allow consistent switching between NPIV modes. Also,
NPIV mode provides the ability to support multiple endpoints per port.
• Ports must be enabled before they can be used. When you enable an FC port, any endpoints currently
using that port are also enabled. If the failback-endpoints feature is used, any fail-over endpoints that
use this port for their primary system address may be failed-back to the primary port from the
secondary port.
• Disabling one or more SCSI target ports also disables any endpoints currently using that port. If
specified, the failover configured endpoints that use the target port(s) as their primary system address
will be failed-over if the secondary port is available. Endpoints that are already disabled by
administrative operation prior to a port being disabled are remembered as manually disabled. This
state will be restored when that port is later enabled.
The summary information includes the System Address, WWPN, WWNN and enabled status. Also
included are the NPIV status, the Fibre Channel Link status, and the operation status as well as the
number of endpoints configured on the system.
The detailed information section shows the Fibre Channel HBA Model, installed firmware version number,
port id, link speed, topology, and connection type.
• WWPN: Unique worldwide port name, which is a 64-bit identifier (a 60-bit value preceded by a 4-bit Network Address Authority identifier) of the Fibre Channel (FC) port.
• WWNN: Unique worldwide node name, which is a 64-bit identifier (a 60-bit value preceded by a 4-bit Network Address Authority identifier) of the FC node.
• Link Status: Either Online or Offline; that is, whether or not the port is up and capable of handling traffic.
1. After you navigate to the Hardware > Fibre Channel page, select More Tasks > Ports > Enable to select the target ports. If all ports are already enabled, a message to that effect is displayed; otherwise, the Enable Ports dialogue box is displayed.
2. Select one or more ports from the list, and select Next.
3. After the confirmation, select Next to complete the port selection process.
4. Select the Failback endpoints option if you wish for endpoints that have been failed over to the
secondary port to be returned to this port if it is their primary port.
5. Select Next to continue. The Enable Ports Status dialogue box appears.
6. Select Close if you do not wish to wait for the enable process to complete. A message is displayed
indicating the enable process will complete in the background.
8. If you choose to wait for the port enable process to complete, the dialogue box eventually displays a completion message; select Close to dismiss it.
1. After you navigate to the Hardware > Fibre Channel > Resources tab, select More Tasks > Ports > Disable... to select the target ports. If all ports are already disabled, a message to that effect is displayed; otherwise, the Disable Ports dialogue box is displayed.
3. Select Next.
4. Select the Failover endpoints option if you wish for endpoints with this port configured as primary to
fail over to the secondary port.
5. Select Next to continue. The Disable Ports Status dialogue box appears.
6. Wait for the disable process to complete and select Close to dismiss the Disable Ports Status dialogue
box.
3. In the Configure Port dialog, select whether to automatically enable or disable NPIV for this port. This
option can only be modified if NPIV is globally enabled.
4. For Topology, select Default, Loop Only, Point to Point, or Loop Preferred.
6. Select OK.
The scsitarget port modify CLI command can also be used to configure a port by modifying options for SCSI target ports.
3. In the Enable NPIV dialog, you will be warned that all Fibre Channel ports must be disabled before
NPIV can be enabled. Also review any messages about correcting configuration errors and take
appropriate action. If you are sure that you want to enable NPIV, select Yes.
Disabling NPIV
Before you can disable NPIV, you must not have any ports with multiple endpoints.
3. In the Disable NPIV dialog, review any messages about correcting the configuration, and when ready,
select Yes.
2. If necessary, click the plus sign (+) to expand the endpoint configuration summary table.
The summary information includes the endpoint name, WWPN, WWNN, system address currently in use
and if the address is primary or secondary, enabled status, and link status.
The detailed information section shows the primary system address, secondary system address, and if
FCP2 Retry is enabled.
1. After navigating to the Hardware > Fibre Channel page, select More Tasks > Endpoints > Enable. If all
endpoints are already enabled, a message to that effect is displayed.
2. In the Enable Endpoints dialog, select one or more endpoints from the list.
3. Select Next.
4. Confirm the endpoints are correct and select Next. The Enable Endpoint Status dialogue box
appears.
If in non-NPIV mode, disabling an endpoint also disables the underlying port if it is currently enabled. In
NPIV mode, only the endpoint is disabled.
1. After navigating to the Hardware > Fibre Channel page, select More Tasks > Endpoints > Disable... If
all endpoints are already disabled, a message to that effect is displayed.
2. In the Disable Endpoints dialog, select one or more endpoints from the list.
3. Select Next.
4. Confirm the endpoints are correct. If the endpoint is associated with an active service, a warning is
displayed. Select Disable and the Disable Endpoint Status dialogue box appears.
2. Click the green plus icon to open the Add endpoint dialogue box.
3. In the Add Endpoint dialog, enter a Name for the endpoint. The endpoint name can be from 1 to 128 characters in length. The field cannot be empty or be the word "all," and cannot contain the characters asterisk (*), question mark (?), forward or back slashes (/, \), or right or left parentheses [(,)].
5. If NPIV is enabled, select a Primary system address from the drop-down list. The primary system
address must be different from any secondary system address.
6. If NPIV is enabled you can select the secondary address to use for fail over operations. If the
endpoint cannot be created, an error is displayed. Correct the error and retry. If there are no errors,
the system proceeds with the Endpoint creation process.
7. Monitor the system as the endpoint is created. The system notifies you when the endpoint creation
process has completed.
8. Select Close.
1. After navigating to the Hardware > Fibre Channel > Resources tab, begin the Endpoint delete process
by selecting the plus sign (+) to expand the endpoint configuration summary table if necessary.
3. Select the delete icon represented by a red X. This icon is not active unless an endpoint is selected.
The Delete Endpoint dialogue box is displayed. If an endpoint is in use, you are warned that deleting it
might disrupt the system.
4. Verify the endpoints listed in the Delete Endpoint dialogue box are correct.
5. Click Delete.
An initiator name is an alias that maps to an initiator's WWPN. The Data Domain system uses the initiator name to interface with the initiator for VTL activity.
Initiator aliases are useful because it is easier to reference a name than an eight-pair WWPN number
when configuring the system, including access groups.
For instance, you might have a host server with the name HP-1, and you want it to belong to a group HP-
1. You can name the initiator coming from that host server as HP-1. You can then create an access group
also named HP-1 and ensure that the associated initiator has the same name.
An initiator can be configured to support DD Boost over FC or VTL, but not both. A maximum of 1024
initiators can be configured for a Data Domain system.
2. Click the plus sign (+) at the top of the initiator section to expand the initiator configuration summary table.
CLI Equivalent
1. After navigating to the Hardware > Fibre Channel > Resources tab, begin the Initiator Add process by
selecting the plus sign (+) to expand the endpoint configuration summary table if necessary.
3. In the Add Initiator dialog, enter the WWPN for the device to be added to the system. Use the format
shown in the field.
4. Enter a Name for the initiator. This name is also called an Alias.
6. Select OK.
CLI Equivalent
2. Verify the target initiator is offline and not a part of any access group. Otherwise, you will get an error message, and the initiator will not be deleted.
• You must delete all initiators in an access group before you can delete the access group.
• If an initiator remains visible, it may be automatically rediscovered.
3. Select the target initiator from the initiator configuration summary table.
5. A warning is provided in the Initiator Delete dialog box. Read the warning and click OK if you wish to proceed; otherwise, click Cancel.
This lesson describes how to modify these settings and how to manage data access using
the Data Domain System Manager (DDSM) and the CLI.
Users with administrative privileges can perform major CIFS operations such as enabling
and disabling CIFS, setting authentication, managing shares, and viewing configuration and
share information. CIFS clients write data to a share.
The CLI command cifs status will show whether CIFS is enabled or disabled. To disable
CIFS, use the command cifs disable. To enable CIFS use cifs enable.
Clients, such as backup servers that perform backup and restore operations with a Data
Domain System, need access to the /data/col1/backup directory. Clients that have
administrative access need to be able to access the /ddvar directory to retrieve core and
log files.
The share name does not have to be the same as the directory name. Here, the share named backup happens to match the directory named backup, but it does not need to if you have a different preference. For example, you may create a path /data/col1/backup2 but prefer to name the share that points to backup2 as HR for easier identification of the specific share assignment.
Note
• The command accepts /backup as an alias for the default path /data/col1/backup
• All other paths must be entered as /data/col1/[folder name]
In this example, our share name is HR. The directory name is /data/col1/backup2.
Client access needs to be assigned. To make a share available to all clients, use the wildcard *. To make the share available to only specific clients, use the client name or IP address. It is not required to use both the name and the IP address.
Do not mix an * with client names or IP addresses. When an * is present in the list, any other client entries are not used.
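A hedged CLI sketch of creating the HR share described above and restricting it to two clients (the client IP address and host name are illustrative; confirm the option names in the Command Reference Guide):
cifs share create HR path /data/col1/backup2 clients "10.25.6.31,backupsrv01"
cifs share show HR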
In the Max Connections field, the default value is Unlimited. A value of zero entered in the adjacent option has the same effect as Unlimited. Remember that there is actually a limit of up to 600 simultaneous connections, but it depends on the specific Data Domain system's memory. Check the specifics of the Data Domain system being configured.
Log Level – Options are 1 through 5. One, the default system level, sends the least-detailed level of CIFS-related log messages; five results in the most detail. Log messages are stored in the file /ddvar/log/debug/cifs/cifs.log.
The higher the log level, the more likely it is to degrade system performance. Clicking Default in the Log Level area sets the level back to 1.
Server Signing – the options are: Enabled – Disabled – Required. The default is
Disabled. This feature is disabled by default because it degrades performance. When
enabled, it can cause a 29 percent (reads) to 50 percent (writes) throughput performance
drop, although individual system performance will vary.
Server Signing is a security mechanism in the CIFS protocol (a.k.a SMB Signing – Server
Message Block was the original name of the CIFS protocol) and is also known as security
signatures. Server Signing is designed to help improve the security of the CIFS protocol by
having the communication digitally signed at the packet level. This enables the recipient of
the packets to confirm their point of origin and authenticity. This security mechanism in the
CIFS protocol helps avoid issues like tampering of packets. If the packet is changed from
the original packet that was sent by a CIFS client, it will be flagged as invalid by the Data
Domain server.
2. Select a drive letter – type in the path to the shared folder – enable Reconnect at
login – and click on Connect using a different user name and Finish.
3. In the Connect As… dialog, enter appropriate user credentials for the Data Domain
system, and click OK.
4. The new drive window will appear and can now accept backup files.
Selecting Connection Details will display specific information about Sessions and Open Files.
• The computer IP address or computer name connected with DDR for the session.
• The User indicates the user operating the computer and connected with the DDR.
• The Open Files column refers to the number of open files for each session
• The User column which shows the name of the computer and the user on that computer.
• The Mode column displays file permissions. The values and their corresponding permissions are:
0 – No permission, 1 – Execute, 2 – Write, 3 – Execute and Write, 4 – Read, 5 – Read and Execute, 6 – Read and Write, 7 – All permissions
The CLI command cifs show stats will display basic statistics on CIFS activity and
performance.
Network File System (NFS) clients can have access to the system directories or MTrees on
the Data Domain system:
• The /ddvar directory contains Data Domain system, core, and log files.
• The /data/col1 path is the top-level destination when using MTrees for compressed
backup server data.
Clients, such as backup servers that perform backup and restore operations with a Data
Domain system, need to mount an MTree under /data/col1. Clients that have
administrative access need to mount the /ddvar directory to retrieve core and log files.
In the CLI, the command nfs status indicates whether NFS is enabled or disabled. If it is not active, nfs enable starts the NFS server.
/backup
/data/col1/backup
/ddvar
A Data Domain system supports a maximum number of 128 NFS exports and allows 900
simultaneous connections.
You have to assign client access to each export separately and remove access from each export separately. For example, a client removed from /ddvar can still have access to /data/col1/backup:
• A single asterisk (*) as a wild card indicates that all backup servers are used as
clients.
• Clients given access to the /data/col1/backup directory have access to the entire
directory.
• Clients given access to a subdirectory under the /data/col1/backup have access
only to that subdirectory.
Root squash is an NFS feature that reduces the access rights of the remote superuser (root) when using authentication. The no_root_squash option therefore means that the administrator has complete access to the path (the export).
• no_all_squash - Turn off the mapping of all user requests to the anonymous uid/gid
(default value).
• secure - Require that requests originate on an Internet port that is less than
1024. Kerberos uses port 88.
• nolog - The system will not log NFS requests. If enabled, this option may impact
performance.
In the CLI, the command nfs add path client-list [(option-list)] adds NFS clients to an export, as shown in the example below.
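For example, a minimal sketch assuming an illustrative client subnet and a typical option list (confirm the supported options for your release):
nfs add /data/col1/backup2 192.168.1.0/24 (rw,no_root_squash,no_all_squash,secure)
nfs show clients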
Kerberos authentication can be configured in DDSM from the NFS screen. Clicking Configure opens Administration > Access > Authentication.
If Disabled, NFS clients will not use Kerberos authentication and CIFS clients will default to
Workgroup authentication.
If Windows / Active Directory is enabled then both NFS and CIFS clients will use
Kerberos authentication.
Selecting UNIX will mean that only NFS clients will use Kerberos authentication. CIFS
clients will default to Workgroup authentication.
You can use the CLI to monitor NFS client status and statistics with the following
commands:
• nfs show active will list clients active in the past 15 minutes and the mount path for each.
• nfs show clients will list NFS clients allowed to access the Data Domain system and
the mount path and NFS options for each.
• nfs show detailed-stats will display NFS cache entries and status to facilitate
troubleshooting.
Subdirectories can be created within user-created MTrees. The Data Domain system recognizes and reports on the cumulative data contained within the entire MTree.
The term, snapshot, is a common industry term denoting the ability to record the state of a
storage device or a portion of the data being stored on the device, at any given moment,
and to preserve that snapshot as a guide for restoring the storage device, or portion
thereof. Snapshots are used extensively as a part of the Data Domain data restoration
process. With MTrees, snapshots can be managed at a more granular level.
Retention Lock is an optional feature used by Data Domain systems to securely retain saved data for a given length of time and protect it from accidental or malicious deletion. The Retention Lock feature can now be applied at the MTree level.
Another major benefit is to limit the logical, pre-comp, space used by the specific MTree
through quotas.
For example, a DD 9800 running DD OS 6.0 supports 256 MTrees and 256 concurrently active MTrees, while a DD 9500 running DD OS 5.6 supports 100 MTrees and 64 concurrently active MTrees. Refer to the Data Domain Operating System Administration Guide for specific limits for various Data Domain systems and versions of DD OS.
Be aware that system performance might degrade if more than the recommended number of MTrees are concurrently engaged in read or write streams. The degree of degradation
depends on overall I/O intensity and other file system loads. For optimum performance,
constrain the number of simultaneously active MTrees. When possible, aggregate
operations on the same MTree into a single operation.
You can set a soft limit, a hard limit, or both soft and hard limits. If you set both limits, the
soft limit must be less than the hard limit. The smallest quota that can be set is 1 MiB.
An administrator can set the storage space restriction for an MTree to prevent it from
consuming excess space.
Quota Settings are disabled by default. They can be set at the same time that an MTree is
created, or they can be set after creating the MTree. Quotas can be set and managed using
the System Manager or the CLI. The advantage of MTree operations is that quotas can be
applied to a specific MTree as opposed to the entire file system.
As data fills the MTree, Data Domain System Manager will display graphically and by
percentage the quota hard limit. You can view this display at Data Management > MTree.
The MTree display presents the list of MTrees, quota hard limits, daily and weekly pre-comp
and post-comp amounts and ratios.
You can also set quotas from the CLI with the command:
• quota set {all | mtrees <mtree-list> | storage-units <storage-unit-list>}
{soft-limit <n> {MiB|GiB|TiB|PiB} | hard-limit <n> {MiB|GiB|TiB|PiB} |
soft-limit <n> {MiB|GiB|TiB|PiB} hard-limit <n> {MiB|GiB|TiB|PiB}}
quota enable
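For example, a sketch that enables quotas and sets illustrative soft and hard limits on an MTree named /data/col1/HR (the quota show form is assumed from the same command family; confirm it in the Command Reference Guide):
quota enable
quota set mtrees /data/col1/HR soft-limit 80 GiB hard-limit 100 GiB
quota show mtrees /data/col1/HR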
Note: The information on this summary page may be delayed by up to 10-15 minutes. For
immediate data select Update.
For real-time monitoring of MTrees and quotas, the following commands can be used from
the command prompt:
In the Space Usage view, clicking a specific point on the graph displays the pre-comp written for that date and time. This is the total amount of data sent to the MTree by backup servers. Pre-compressed data on an MTree is what a backup server sees as the total uncompressed data held by an MTree-as-storage-unit.
The Daily Written display shows the flow of data over the last 24 hours. Data amounts are
shown over time for pre and post-compression.
Pre-Comp Written is the total amount of data written to the MTree by backup servers.
Post-Comp Written is the total amount of data written to the MTree after compression has
been performed, as shown in GiBs.
These alerts are also reported in the Home Dashboard > Alerts pane.
A snapshot copy is made instantly and is available for use by other applications for data
protection, data analysis and reporting, and data replication. The original copy of the data
continues to be available to the applications without interruption, while the snapshot copy is
used to perform other functions on the data.
Snapshots enable better application availability, faster recovery, and easier back up
management of large volumes of data.
Snapshots continue to place a hold on the original data they reference even when the
backups have expired.
Snapshots are useful for saving a copy of MTrees at specific points in time – for instance, before a Data Domain OS upgrade – which can later be used as a restore point if files need to be restored from that specific point in time.
You can schedule multiple snapshots at the same time or create them individually as you
choose.
The maximum number of snapshots allowed to be stored on a Data Domain system is 750
per MTree. You receive a warning when the number of snapshots reaches 90% of the
allowed number (675-749) in a given MTree. An alert is generated when you reach the
maximum snapshot count.
When changed production data is backed up, additional blocks are written, and pointers are
changed to access the changed data. The snapshot maintains pointers to the original, point-
in-time data. All data remains on the system as long as pointers reference the data.
Snapshots are a point-in-time view of a file system. They can be used to recover previous
versions of files, and also to recover from an accidental deletion of files.
Use the snapshot feature to take an image of an MTree, to manage MTree snapshots and
schedules, and to display information about the status of existing snapshots.
The Snapshots pane in the MTree summary page allows you to see at-a-glance, the total
number of snapshots collected, expired, and unexpired, as well as the oldest, newest, and
next scheduled snapshot.
You can associate configured snapshot schedules with a selected MTree name. Click Assign
Snapshot Schedules, select a schedule from the list of snapshot schedules and assign it.
You can create additional snapshot schedules if needed.
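A hedged sketch of creating and listing a snapshot from the CLI, using an illustrative snapshot name, MTree path, and retention period (verify the retention syntax for your release):
snapshot create pre-upgrade-oct31 mtree /data/col1/HR retention 30days
snapshot list mtree /data/col1/HR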
Sometimes, access to production backup data is restricted. Fast Copy makes all fast-copied data readable and writable, making this operation handy for data recovery from backups.
The difference between snapshots and fast copied data is that the Fast Copy duplicate is not
a point-in-time duplicate. Any changes that are made during the data copy, in either the
source or the target directories, will not be duplicated in the Fast Copy.
Note that a Fast Copy is a read/write copy of the source as it existed at the time the copy was made, whereas a snapshot is read-only.
Fast Copy makes a copy of the pointers to data segments and structure of a source to a
target directory on the same Data Domain system.
You can use the Fast Copy operation to retrieve data stored in snapshots. In this example,
the /HR MTree contains two snapshots in the /.snapshot directory. One of these snapshots,
10-31-2016, is fast copied to /backup/Recovery. Only pointers to the actual data are
copied, adding a 1% to 2% increase in actual used data space. All of the referenced data is
readable and writable. If the /HR MTree or any of its contents is deleted, no data referenced
in the Fast Copy is deleted from the system.
Specifying a non-existent directory creates that directory. Be aware that the destination
directory must be empty or the Fast Copy operation will fail. You can choose to overwrite
the contents of the destination by checking that option in the Fast Copy dialog window.
You can also perform a Fast Copy from the command line. The following command copies a
file or directory tree from a Data Domain system source directory to a destination on the
Data Domain system: filesys fastcopy source <src> destination <dest>
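Applied to the earlier example, the command might look like the following; the snapshot directory name is illustrative and depends on how the snapshot was named:
filesys fastcopy source /data/col1/HR/.snapshot/10-31-2016 destination /backup/Recovery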
Fast Copy makes a destination equal to the source, but not at a particular point in time. The
source and destination may not be equal if either is changed during the copy operation.
When backup data expires, a Fast Copy directory prevents the Data Domain system from recovering the space held by the expired data because the Fast Copy directory flags it as in-use. This data must be manually identified and deleted to free up space. Then, space reclamation (file system cleaning) must be run to regain the data space held by the Fast Copy.
Depending on the amount of space the file system must clean, file system cleaning can take
from several hours to several days to complete.
The example in this figure refers to dead and valid segments. Dead segments are segments
in containers no longer needed by the system, for example, claimed by a file that has been
deleted and was the only/or final claim to that segment, or any other segment/container
space deemed not needed by the file system internally. Valid segments contain unexpired
data used to store backup-related files. When files in a backup are expired, pointers to the
related file segments are removed. Dead segments are not allowed to be overwritten with
new data since this could put valid data at risk of corruption. Instead, valid segments are
copied forward into free containers to group the remaining valid segments together. When
the data is safe and reorganized, the original containers are appended back onto the
available disk space.
Since the Data Domain system uses a log structured file system, space that was deleted
must be reclaimed. The reclamation process runs automatically as a part of file system
cleaning.
During the cleaning process, a Data Domain system is available for all normal operations, to
include accepting data from backup systems.
Cleaning does require a significant amount of system processing resources and might take
several hours, or under extreme circumstances days, to complete even when undisturbed.
Cleaning applies a set processing throttle of 50% when other operations are running,
sharing the system resources with other operations. The throttling percentage can be
manually adjusted up or down by the system administrator.
Access the Clean Schedule section by selecting Settings > Cleaning. This displays the
current cleaning schedule, and throttle setting. In this example, we can see the default
schedule - every Tuesday @ 6 a.m. and 50% throttle. The schedule can be edited.
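The same settings can be reviewed and changed from the CLI; a hedged sketch that keeps the default Tuesday 6 a.m. schedule and 50% throttle (confirm the exact schedule and throttle syntax in the Command Reference Guide):
filesys clean set schedule Tue 0600
filesys clean set throttle 50
filesys clean status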
When the cleaning operation finishes, a message is sent to the system log giving the
percentage of storage space that was reclaimed.
The factors affecting how fast data on a disk grows on a Data Domain system include:
• The size and number of data sets being backed up. An increase in the number of
backups or an increase in the amount of data being backed-up and retained causes
space usage to increase.
• The compressibility of data being backed up. Pre-compressed data formats do not
compress or deduplicate as well as non-compressed files and thus increase the
amount of space used on the system.
• The retention period specified in the backup software. The longer the retention period,
the larger the amount of space required.
If any of these factors increase above the original sizing plan, your backup system could
easily overrun its capacity.
There are several ways to monitor the space usage on a Data Domain system to help
prevent system full conditions.
The first pane shows the amount of disk space available and used by file system
components, based on the last cleaning.
Size: The amount of total physical disk space available for data.
Used: The actual physical space used for compressed data. Warning messages go to the system log, and an email alert is generated when usage reaches 90%, 95%, and 100%. At 100%, the Data Domain system accepts no more data from backup hosts.
Available: The total amount of space available for data storage. This figure can change
because an internal index may expand as the Data Domain system fills with data. The index
expansion takes space from the Available amount.
Cleanable: The estimated amount of space that could be reclaimed if a cleaning operation
were run.
The bottom pane displays compression information:
• Pre-Compression: Data written before compression
• Post-Compression: Storage used after compression
• Global-Comp Factor: Pre-Compression / (Size after global compression)
• Local-Comp Factor: (Size after global compression) / Post- Compression
• Total-Comp Factor: Pre-Compression / Post-Compression
• Reduction %: [(Pre-Compression - Post-Compression) / Pre-Compression]
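As an illustrative worked example of these formulas, suppose 100 GiB of pre-compression data deduplicates (global compression) to 10 GiB and local compression then reduces it to 5 GiB post-compression:
• Global-Comp Factor: 100 / 10 = 10x
• Local-Comp Factor: 10 / 5 = 2x
• Total-Comp Factor: 100 / 5 = 20x
• Reduction %: (100 - 5) / 100 = 95%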
The Space Usage view contains a graph that displays a visual representation of data usage
for the system. The time frame choices are one week, one month, three months, one year,
and All. Custom date ranges can also be entered. The above graph is set for 7 days.
It displays Post-Comp in red, Comp Factor in green, Cleaning in yellow, and Data Movement
in purple.
Data Movement refers to the amount of disk space moved to the archiving storage area.
The Archive license is required for this.
With the Capacity option greyed out, as shown on the slide, the scale is adjusted in order to present a clear view of space used. In this example, ~9 GiB Post-Comp has been stored on a 62.78 TiB capacity, with a Comp Factor of 79x.
This view is useful to note trends in space availability on the Data Domain system, such as
changes in space availability and compression in relation to cleaning processes.
Clicking the Capacity checkbox toggles this line on and off. The scale now displays Space
Used relative to the total capacity of the system, with a blue Capacity line indicating the
storage limit.
This view also displays cleaning start and stop data points. This graph is set for one week
and displays one cleaning event. The cleaning schedule on this Data Domain system is at
the default of one day per week.
For more information about using PCM from the command line, see the EMC Data Domain Operating
System Command Reference Guide.
At a system level, shared data is calculated only once and is reported to each namespace that shares the data subset, along with its unique data.
Physical Capacity Measurement can answer questions like, how much physical space is each subset
using? How much total compression is each subset reporting? How does physical space utilization for
a subset grow and shrink over time? How can one tell whether a subset has reached its physical
capacity quota? And what proportion of the data is unique and what proportion is shared with other
subsets?
With IT as a service (ITAAS), Physical Capacity Measurement can be used to calculate chargeback
details for internal customers or billing details for third-party customers sharing space on a Data
Domain system.
Using physical capacity measurement, it is now possible to enforce data capacity quotas for physical
space use where previously only logical capacity could be calculated. These types of measurements
are essential for customer chargeback and billing.
Through physical capacity measurement, IT management can view trends in customer’s physical
storage, plan capacity needs, identify poor datasets, and identify accounts that might benefit by
migrating to a different storage space for growth purposes.
The Data Domain System Manager can configure and run physical capacity measurement operations
at the MTree level only.
The Data Domain Management Center version 1.4 and later is enhanced to perform all of the physical
capacity measurement operations except defining pathsets.
4. Click the plus, pencil, or X button to add, edit, or delete a schedule, respectively.
When a measurement job completes, the results are graphed and are viewed under the
selected MTree in the Space Usage tab.
It is useful to see data ingestion and compression factor results over a selected duration.
You should be able to notice trends in compression factor and ingestion rates.
Local-Comp Factor refers to the compression of the files as they are written to disk. The default local compression algorithm is lz, which gives the best throughput; Data Domain recommends the lz option.
Gzfast is a zip-style compression that uses less space for compressed data, but more CPU
cycles (twice as much as lz). Gzfast is the recommended alternative for sites that want
more compression at the cost of lower performance.
gz is a zip-style compression that uses the least amount of space for data storage (10% to
20% less than lz on average; however, some datasets get much higher compression). This
also uses the most CPU cycles (up to five times as much as lz). The gz compression type is
commonly used for nearline storage applications in which performance requirements are
low.
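If a different local compression type is required, it is typically selected from the CLI; a hedged sketch using the filesys option command family (the change generally applies only to data written after the change, and the exact option name should be confirmed in the documentation):
filesys option set local-compression-type gzfast
filesys option show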
For more detailed information on these compression types refer to the Data Domain
Operating System Administration Guide.
After new enclosures are installed, you can migrate the data from the older enclosures to
the new enclosures while the system continues to support other processes such as data
access, expansion, cleaning, and replication. The storage migration requires system
resources, but you can control this with throttle settings that give migration a relatively
higher or lower priority. You can also suspend a migration to make more resources available
to other processes, then resume the migration when resource demand is lower.
You can check the storage migration status in the Hardware > Storage tabs.
In our example, we select an ES20 enclosure as the source for our migration.
This screen also displays the storage license status and an Add Licenses button.
The Available Enclosures list displays the enclosures that are eligible destinations for
storage migration. We have selected an ES30 enclosure as our destination.
The license status bar represents all of the storage licenses installed on the system. The green portion represents licenses that are in use, and the clear portion represents the licensed storage capacity available for destination enclosures. If you need to install additional licenses to support the selected destination controllers, click Add Licenses.
For example, an Expanded-Storage license is required to expand the active tier storage capacity beyond the entry capacity. Be aware that the capacity supported varies by Data Domain model.
Remember that a Storage Migration can take hours, days, or weeks depending on the
amount of data being migrated.
The Storage Migration that was used to create these screens was taken from a Data
Domain system in a lab test environment. It does not represent the amount of data that
would be found in a real-life environment. This is why the times are fairly short for the
duration of a Storage Migration.
It is not necessary to run the Estimate first, but if the preconditions are not met, the
migration will be halted. It is recommended to run Estimate first so we will know that the
migration will run. We will also know the estimated duration of the migration and plan
accordingly.
During the first stage, the progress is shown on the progress bar and no controls are
available.
You can click Pause to suspend the migration and later click Resume to continue the
migration.
The Low, Medium, and High buttons define throttle settings for storage migration resource demands. A low throttle setting gives storage migration a lower resource priority, which results in a slower migration and requires fewer system resources. Conversely, a high throttle setting gives storage migration a higher resource priority, which results in a faster migration and requires more system resources. The medium setting selects an intermediate priority.
You do not have to leave this dialog open for the duration of the migration. To check the
status of the migration after closing this dialog, select Hardware > Storage and view the
migration status. To return to this dialog from the Hardware/Storage page, click Manage
Migration. The migration progress can also be viewed by selecting Health > Jobs.
It is a good practice to start this stage during a maintenance window or a period of low
system activity.
Even when a Data Domain system already has the maximum recommended number of shelves attached, the storage migration feature can accommodate additional destination shelves, but only for the purpose of data migration.
When migrating storage to new expansion shelves, all of the attached destination storage
becomes a part of the file system. The destination storage must be of a capacity within the
limits of the maximum capacity supported on the source Data Domain system. If there is
not enough capacity to support the amount of data found at the source or if the destination
file system size exceeds the maximum amount of storage allowed by the source system,
the storage migration feature reports the conflict during the pre-check phase and does not
allow the data transfer to the destination storage.
Storage migration supports extended retention, always within the same tier, either active or
archive.
You can set limits on the amount of logical, pre-comp, space used by individual MTrees
using MTree hard and soft quotas.
Snapshots enable you to save a read-only copy of an MTree at a specific point in time.
Fast copy gives read/write access to all data fast copied, making this operation handy for
data recovery from snapshots.
The default time scheduled for File system cleaning is every Tuesday at 6 a.m. EMC
recommends running cleaning once per week at a time of low network activity.
Frequent cleaning, more than once per week, is not recommended. It can cause poor
deduplication and increased file fragmentation.
Total compression factor is the pre-compression amount divided by the post-compression amount.
Storage migration supports the replacement of existing storage enclosures with new
enclosures that may offer higher performance, higher capacity, and a smaller footprint.
In a replication scenario, a local Data Domain system can be used to store backup data onsite for a short period,
such as 30, 60, or 90 days. Lost or corrupted files can be recovered easily from the local Data Domain system.
The replication process allows you to quickly copy data to another system (typically offsite) for a second level of
disaster recovery when the data on the local system is unavailable.
Replication occurs in real time and does not require that you suspend backup operations. Data is replicated after it
has been deduplicated and compressed on the source system.
The replication process only copies information that does not exist on the destination system. This technique reduces
network demands during replication because only unique data segments are sent over the network.
The replication process is designed to deal with network interruptions common in the WAN and to recover gracefully
with very high data integrity and resilience. This ensures that the data on the replica is in a state usable by
applications – a critical component for optimizing the utility of the replica for data recovery and archive access.
If the local data becomes unavailable, the offsite replica may be used to ensure operations continue.
The data on the replica can be restored to the local site using a few simple recovery configuration and initiation
commands. The replication process allows you to quickly move data offsite (with no delays in copying and moving
tapes).
Replication is a software feature that requires an additional license. You need a replicator license for both the source
and destination Data Domain systems.
A Data Domain system can simultaneously be the source of some replication contexts and the destination
for other contexts.
The count of replication streams per system depends upon the processing power of the Data Domain
system on which they are configured. Smaller, less powerful systems can be limited to only 15 source and
20 destination streams, while the most powerful Data Domain system can handle over 200 streams.
Directory replication: A subdirectory under /backup and all files and directories below it on a source
system replicates to a destination directory on a different Data Domain system. This transfers only the
deduplicated changes of any file or subdirectory within the selected Data Domain file system directory.
Directory replication can also be used to replicate a media pool if the pool is using backward-compatibility
mode.
MTree replication: This is used to replicate MTrees between Data Domain systems. Media pools can
also be replicated. By default (as of DD OS 5.3), MTrees (that can be replicated) are used when a media
pool is created.
It uses the same WAN deduplication mechanism as used by directory replication to avoid sending
redundant data across the network. The use of snapshots ensures that the data on the destination is
always a point-in-time copy of the source with file consistency, while reducing replication churn, thus
making WAN use more efficient. Replicating individual directories under an MTree is not permitted with
this type.
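A hedged sketch of configuring an MTree replication pair from the CLI, using illustrative host names and an MTree named HR; the context is added on both systems and then initialized from the source, and each pair is identified by its destination:
replication add source mtree://dd-src.example.com/data/col1/HR destination mtree://dd-dst.example.com/data/col1/HR
replication initialize mtree://dd-dst.example.com/data/col1/HR
replication status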
Managed File Replication: A fourth type, managed replication, belongs to Data Domain Boost
operations and is discussed later in this course.
One-to-one replication, the simplest type of replication, is from a Data Domain source system to a Data Domain destination system. This replication topology can be configured with directory, MTree, or collection replication types.
With bi-directional replication, data from a directory or MTree on System A is replicated to System B, and data from another directory or MTree on System B is replicated to System A.
With one-to-many replication, data flows from a source directory or MTree on System A to several destination systems. You could use this type of replication to create more than two copies for increased data protection, or to distribute data for multi-site usage.
With many-to-one replication, directory or MTree data flows from several source systems to a single destination system. This type of replication can be used to provide data recovery protection for several branch offices at the corporate headquarters IT systems.
Cascaded replication: In a cascaded replication topology, a source directory or MTree is chained among
three Data Domain systems. The last hop in the chain can be configured as collection, MTree, or directory
replication, depending on whether the source is directory or MTree.
For cascaded configurations, the maximum number of hops is two, that is, three DD systems.
For example, the first DD system replicates one or more MTrees to a second DD system, which then
replicates those MTrees to a final DD system. The MTrees on the second DD system are both a
destination (from the first DD system) and a source (to the final DD system). Data recovery can be
performed from the non-degraded replication pair context.
After replication is initialized, ownership and permissions of the destination are always identical to those of
the source.
You can usually replicate only between machines that are within two releases of each other, for example,
from 5.6 to 6.0. However, there may be exceptions to this (as a result of atypical release numbering), so
review the user documentation.
The Data Domain file system must be enabled or, based on the replication type, will be enabled as part of
the replication initialization.
In the replication command options, a specific replication pair is always identified by the destination.
Both systems must have an active, visible route through the IP network so that each system can resolve
its partner's host name.
During replication, a Data Domain system can perform normal backup and restore operations.
Collection replication is the fastest and lightest type of replication offered by the DD OS. There is no on-
going negotiation between the systems regarding what to send. Collection replication is mostly unaware of
the boundaries between files. Replication operates on segment locality containers that are sent after they
are closed.
With collection replication, all user accounts and passwords are replicated from the source to the
destination. However, as of DD OS 5.5.1.0, other elements of configuration and user settings of the DD
system are not replicated to the destination; you must explicitly reconfigure them after recovery.
If the Data Domain system is a source for collection replication, snapshots are also replicated.
Because there is only one collection per Data Domain system, this is specifically an approach to system
mirroring. Collection replication is the only form of replication used for true disaster recovery. The
destination system cannot be shared for other roles. It is read-only and shows data only from one source.
After the data is on the destination, it is immediately visible for recovery.
The destination system is a read-only system. It can only accept data from the replication process. No
data, including snapshots and files, can be written to the destination system except through the replication
process. If you must write data to the destination, you must first disable replication by breaking the
replication context. Unfortunately, if the context has been broken, a resync cannot be performed.
Collection replication supports Retention Lock Compliance, which must be licensed on both systems.
Data Domain Replicator software can be used with the optional Encryption of Data at Rest feature,
enabling encrypted data to be replicated using collection replication. Collection replication requires the
source and target to have the exact same encryption configuration because the target is expected to be an
exact replica of the source data. In particular, the encryption feature must be turned on or off at both
source and target and if the feature is turned on, then the encryption algorithm and the system
passphrases must also match.
Encryption parameters are checked during the replication association phase. During collection replication,
the source system transmits the encrypted user data along with the encrypted system encryption key. The
data can be recovered at the target, because the target machine has the same passphrase and the same
system encryption key.
Directory replication operates based upon filesystem activity. When activity occurs on the system, such
as a new directory, change of permissions, file rename, or file closed, the source system communicates
the update to the destination. In cases where file closures are infrequent, the Data Domain source system
forces the data transfer periodically.
If there is new user file data to be sent, the source first creates a list of file segment IDs in the file. The
source then sends this list to the destination system. The destination system examines the list of segment
IDs to determine which are missing. The destination then sends a list of the missing segments to the
source. The source now sends the missing segments to the destination. In this way, bandwidth between
the source and destination system is used more efficiently.
If the Data Domain system is a source for directory replication, snapshots within that directory are not
replicated. You must create and replicate snapshots separately.
Directory replication can receive backups from both CIFS and NFS clients as long as separate directories
are used for each. Do not mix CIFS and NFS data in the same directory.
The directory replication source cannot be the parent or the child of a directory that is already being
replicated.
In a directory replication pair, the destination is always read-only. The destination can receive data
only from the source system and directory. If you need to write to the destination directory outside of
replication, you must first break (delete) the replication context between the two systems. Breaking the
context is also referred to as deleting the link.
The destination directory can coexist on the same system with other replication destination directories,
replication source directories, and other local directories.
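As a minimal CLI sketch (the host and directory names are hypothetical, and the exact syntax should be
confirmed in the DD OS Command Reference Guide for your release), a directory replication pair is typically
added on both systems and then initialized from the source:
# replication add source dir://dd-source.example.com/backup/dir1 destination dir://dd-target.example.com/backup/dir1
# replication initialize dir://dd-target.example.com/backup/dir1
# replication status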
MTree replication copies the data segments associated with the entire MTree structure. This means that
all metadata, file data, and everything else related to the MTree is replicated.
The destination Data Domain system does not expose the replicated data until all of the data for that
snapshot has been received. This ensures the destination is always a point-in-time image of the source
Data Domain system. Because the directory tree structure is part of the data included in the snapshot,
files do not show out of order at the destination. This provides file-level consistency. Snapshots are also
replicated.
MTree replication uses the same WAN deduplication mechanism as directory and collection replication to
avoid sending redundant data across the network. It also supports the same topologies that directory
replication supports.
MTree replication works only at the MTree level. If you want to implement MTree replication, you must
move data from the existing directory structure within the /backup MTree to a new or existing MTree, and
create a replication pair using that MTree.
For example, suppose that a Data Domain system has shares mounted in locations under /backup as
shown in the directory-based layout.
If you want to use MTree replication for your production (prod) data, but are not interested in replicating
any of the development (dev) data, the data layout can be modified to create two MTrees: /prod and /dev,
with two directories within each of them. The old shares would then be deleted and new shares created for
each of the four new subdirectories under the two new MTrees. This would look like the structure shown in
the MTree-based layout.
The Data Domain system now has two new MTrees, and four shares as earlier. You can set up MTree
replication for the /prod MTree to replicate all of your production data and not set up replication for the /dev
MTree as you are not interested in replicating your development data.
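A hedged CLI sketch of this reorganization (MTree and host names are hypothetical; confirm the syntax in
the DD OS Command Reference Guide): create the new MTrees, then add and initialize a replication context
for the production MTree only.
# mtree create /data/col1/prod
# mtree create /data/col1/dev
# replication add source mtree://dd-source.example.com/data/col1/prod destination mtree://dd-target.example.com/data/col1/prod
# replication initialize mtree://dd-target.example.com/data/col1/prod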
Source
• Data can be logically segregated into multiple MTrees to promote greater replication performance.
• Replicating directories under an MTree is not permitted. Therefore, a directory below the root of an
MTree cannot be the replication source.
Snapshots
• Snapshots must be created on source contexts.
• Snapshots cannot be created on a replication destination.
• Snapshots are replicated with a default retention of one year; the retention is adjustable, and it must
be adjusted on the destination.
VTL
• Replicating VTL tape cartridges (or pools) simply means replicating MTrees or directories that
contain VTL tape cartridges. Media pools are replicated by MTree replication, as a default.
• A media pool can be created in backward-compatibility mode and can then be replicated via
directory-based replication. You cannot use the pool:// syntax to create replication contexts using
the command line. When specifying pool-based replication in DD System Manager, either directory
or MTree replication will be created, based on the media pool type.
After data is initially replicated using the high-speed network, you then move the system back to its
intended location.
After data is initially replicated, only new data is sent from that point onwards.
The CLI, system logs, and other facilities use a replication URL to identify the endpoints of a context on
the replication source and destination systems. On screen are some example replication URL contexts.
The replication context type is identified in the part of the URL known as the scheme. The scheme is also
referred to as the protocol or prefix portion of a URL.
A URL scheme of "dir" identifies a directory replication context. An "mtree" URL scheme identifies an
MTree replication context. A URL scheme of "col" identifies a collection replication context.
The host-name portion of the URL is the same as the output of the net show hostname CLI command. The
path is the logical path to the target directory or MTree. The path for a directory URL must start with
/backup and end with the name of the target directory. The path for an MTree URL starts with /data/col1
and ends with the name of the target MTree. The path is not part of a collection URL.
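For illustration, assuming a destination system with the hypothetical host name dd-target.example.com,
the three URL forms look like this:
dir://dd-target.example.com/backup/dir1
mtree://dd-target.example.com/data/col1/mtree1
col://dd-target.example.com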
Reference
2. Selecting a context causes the system to display detailed information about that context in the
Detailed Information section of the screen.
Since collection, MTree, and directory contexts have different requirements, the detailed information
shown changes depending on the context type.
1. When you add a partner system, first make sure the partner system being added is running a
compatible DD OS version.
3. Select Manage Systems. The Manage System Dialogue box appears listing the devices this Data
Domain system is currently configured to manage.
4. Select the add icon which is represented by the green plus sign (+). The Add System dialogue box
appears.
5. Enter the partner system's host name and the password assigned to the sysadmin user.
The source system transmits data to a destination system listen port. As a source system can have
replication configured for many destination systems (each of which can have a different listen port),
each context on the source can configure the connection port to the corresponding listen port of the
destination.
7. Select OK when the information for the partner system is complete. Select OK. The Verify Certificate
dialogue box appears.
9. If the system was successfully added, DDSM returns to the Manage Systems dialogue box and the
newly added partner system is listed.
IPv6 addresses are supported only when adding a DD OS 5.5 or later system to a management system
using DD OS 5.5 or later.
2. Next, select Create Pair. The Create Pair Dialogue box appears.
4. Select the replication direction for the context. If the device being configured is the source for the
context, select Outbound. If the device being configured is the destination in the context, select
Inbound.
3. If the destination system is not listed in the dropdown menu, add it at this time by selecting the Add
System hyperlink.
5. If the file system on the replication source is enabled, a warning is displayed. Select OK to continue
or Cancel to go back.
3. If the destination system is not listed in the dropdown menu, add it at this time by selecting the Add
System hyperlink.
5. Provide the name of the destination directory. The source and destination directories must be within
the /backup MTree (/data/col1/backup). The source and destination directories are not required to be
on the same directory level.
7. Monitor the system as it verifies the destination system is qualified as a destination for a directory
replication context.
3. If the destination system is not listed in the dropdown menu, add it at this time by selecting the Add
System hyperlink.
5. Provide the name of the destination MTree. The source and destination MTrees must be directly
under /data/col1/ in the filesystem. The source and destination MTrees are required to be at the same
directory level.
7. Monitor the system as it verifies the destination system is qualified as a destination for an MTree
replication context.
This means all replication source systems must be configured to connect to this particular port value.
On the right side of the screen are three replication source systems. All are supposed to connect to the
single replication destination on the left side of the screen.
Because the replication destination has a default listen port value of 2051, each replication source needs
to have a corresponding connection port value of 2051. The top two systems are configured correctly, but
the bottom right system has an incorrect connection port value that prohibits it from successfully
replicating to the destination system.
You can modify the listen port option if the default connection between the replication source and
destination is impacted by a firewall configuration or other network issues.
The connection port is the TCP port the source system uses to communicate to the replication destination.
The connection port is configured per context. It is not a global setting. The default value for the
connection port is 2051.
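As a hedged CLI sketch (the context URL and port value are hypothetical; confirm the option names in the
DD OS Command Reference Guide), the listen port is set as a system-wide option on the destination, and
the matching connection port is set per context on the source:
# replication option set listen-port 2052
# replication modify dir://dd-target.example.com/backup/dir1 connection-host dd-target.example.com port 2052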
3. Select Change Network Settings. The Network Settings dialogue box appears.
4. Enter the new Listen Port value, or select Default if you wish to change the Listen Port value back to
the default value.
1. If you are creating a context with a non-default value, navigate to the Replication > Automatic >
Summary tab on the source system.
LBO can reduce WAN bandwidth utilization. It is useful if file replication is being performed over a low-
bandwidth WAN link.
LBO reduces bandwidth utilization by providing additional compression during data transfer.
Only enable LBO for replication contexts that are configured over WAN links with less than 6 Mb per
second of available bandwidth.
LBO can be applied on a per-context basis to all file replication jobs on a system.
Additional tuning might be required to improve LBO functionality on your system. Use the bandwidth and
network-delay settings together to calculate the proper TCP buffer size, and set the replication bandwidth
for greater compatibility with LBO.
LBO is enabled on a per-context basis. LBO must be enabled on both the source and destination Data
Domain systems. If the source and destination have incompatible LBO settings, LBO will be inactive for
that context.
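As a sketch (the context URL is hypothetical; verify the keyword for your DD OS release), LBO is enabled
per context with the low-bw-optim option, and the same command must be run on both the source and the
destination:
# replication modify dir://dd-target.example.com/backup/dir1 low-bw-optim enabled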
Reductions through deduplication make it possible to replicate everything across a small WAN link. Only
new, unique segments need to be sent. This reduces WAN traffic down to a small percentage of what is
needed for replication without deduplication. These large factor reductions make it possible to replicate
over a less-expensive, slower WAN link or to replicate more than just the most critical data.
Delta compression is a global compression algorithm that is applied after identity filtering. The algorithm
looks for previous similar segments using a sketch-like technique that sends only the difference between
previous and new segments. In this example, segment S1 is similar to S16. The destination can ask the
source if it also has S1. If it does, then it needs to transfer only the delta (or difference) between S1 and
S16. If the destination doesn’t have S1, the source can send the full segment data for S16 and the full
missing segment data for S1.
Delta compression reduces the amount of data to be replicated over low-bandwidth WANs by eliminating
the transfer of redundant data found within replicated, deduplicated data. This feature is typically beneficial
to remote sites with lower-performance Data Domain models.
When using DDSM, you can enable LBO when you create the context, or the LBO setting can be modified
after the context is created.
1. If you wish to create a context with LBO enabled, navigate to the Replication > Automatic >
Summary tab on the source system.
1. If you wish to change the LBO setting on an existing context, navigate to the Replication >
Automatic > Summary tab on the source system.
It is important to note, when configuring encrypted file replication, that it must be enabled on both the
source and destination Data Domain systems. Encrypted replication uses the ADH-AES256-SHA cipher
suite and can be monitored through the Data Domain System Manager.
When you enable the encryption over wire option on a replication context, the system must first process
the data it reads from the disk. If you have the data at rest encryption feature enabled, the source system
must decrypt the data before it can be processed for replication. Otherwise, the data is simply read from
the source system.
Prior to transmitting the data to the destination system, the replication source encrypts the data using the
encryption over wire algorithm.
When the replication destination system receives the replication traffic, it must decrypt it using the
encryption method employed by the replication feature.
If the data at rest encryption feature is enabled on the destination Data Domain system, the data must be
encrypted by the destination using the method specified by the data at rest encryption feature.
If the data at rest encryption feature is not enabled, the destination system writes the data to the disk using
normal processes.
When using DDSM, you can enable the encryption over wire feature when you create the context. You
can also modify the encryption over wire setting after the context is created.
1. If you wish to create a context with Encryption over Wire enabled, navigate to the Replication >
Automatic > Summary tab on the source system.
1. If you wish to change the Encryption Over Wire setting on an existing context, navigate to the
Replication > Automatic > Summary tab on the source system.
The Throttle Settings area shows the current settings for any Temporary Overrides. If an override is
configured, this section shows the throttle rate, or 0, which means all replication traffic is stopped. The
Throttle Settings area also shows the currently configured Permanent Schedule. You should see the times
and days of the week on which scheduled throttling occurs.
1. To add throttle settings, navigate to the Replication > Automatic > Advanced Settings tabs.
2. Select the Add Throttle Setting button. The Add Throttle Setting dialog box appears.
3. Set the days of the week that throttling is active by clicking the checkboxes next to the days.
4. Set the time that throttling starts with the Start Time selectors for the hour, minute and A.M./P.M.
5. In the Throttle Rate area, click the Unlimited radio button to set no limits.
6. Alternatively, enter a number in the text entry box (for example, 20000) and select the rate from the
drop-down menu (bps, Bps, Kibps, or KiBps).
7. Or, select the 0 Bps (Disabled) option to disable all replication traffic.
8. Click OK to set the schedule.
9. To override the current throttle configuration, select Set Throttle Override. The Throttle Override
dialogue box appears.
10. If you select the Clear at next scheduled throttle event checkbox, the throttle schedule will return to
normal at that time. If you do not select this option, the override throttle stays in effect until you
manually clear it.
11. Select OK to invoke the Throttle Override setting. The override schedule is shown in the Throttle
Settings Permanent Schedule area.
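The same throttle controls are available from the CLI under the replication throttle command set. The
following is only a sketch; the schedule and rate formats shown are assumptions that should be checked
against the DD OS Command Reference Guide:
# replication throttle add mon 0400 20000KiBps
# replication throttle show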
The Encryption Over Wire feature can also be controlled from the command line. The encryption enable
or disable directive can be included in the command line when you add or modify a context. Since
encryption over wire is disabled by default, there is no need to use the disabled option when adding a
context.
# replication add … encryption enabled
# replication modify … encryption enabled
# replication modify … encryption disabled
There are two types of replication reports provided by the Data Domain system: the Replication Status
report and the Replication Summary report.
The Replication Status report displays three charts that provide the status of the current replication job
running on the system. This report is used to provide a snapshot of what is happening for all replication
contexts to help understand the overall replication status on a Data Domain System.
The Replication summary report provides performance information about a system's overall network in-
and-out usage for replication, as well as per context levels over a specified duration. You select the
contexts to be analyzed from a list.
1. Select Reports > Management. The information panel displays a new report area and a list of saved
reports.
4. Click Create. After the report is created, it appears in the Saved Reports section of the screen.
6. Select View to display the report. If the report does not display, verify that the option to block pop-up
windows is disabled in your browser.
If an error exists in a reported context, a section called “Replication Context Error Status” is added to the
report. It includes the ID, source/destination, the type, the status, and a description of the error.
1. Select Reports > Management. The information panel displays a new report area and a list of saved
reports.
4. Click Create. After the report is created, it appears in the Saved Reports section of the screen.
6. Select View to display the report. If the report does not display, verify that the option to block pop-up
windows is disabled in your browser.
Network In (MiB): The amount of data entering the system. Network In is indicated by a thin green line.
Network Out (MiB): The amount of data sent from the system. Network Out is indicated by a thick
orange line.
Onsite Data Domain systems are typically used to store backup data onsite for short periods such as 30,
60, or 90 days, depending on local practices and capacity. Lost or corrupted files are recovered easily
from the onsite Data Domain system since it is disk-based, and files are easy to locate and read at any
time.
In the case of a disaster destroying onsite data, the offsite replica is used to restore operations. Data on
the replica is immediately available for use by systems in the disaster recovery facility. When a Data
Domain system at the main site is repaired or replaced, the data can be recovered using a few simple
recovery configuration and initiation commands.
If something occurs that makes the source replication data inaccessible, the data can be recovered from
the offsite replica. During collection replication, the destination context must be fully initialized for the
recover process to be successful.
Note: If a recovery fails or must be terminated, the replication recovery can be aborted.
If source replication data becomes inaccessible, it can be recovered from the replication destination. The
source must be empty before recovery can proceed. Recovery can be performed for all replication
topologies, except for MTree replication.
2. Select More > Start Recover... to display the Start Recover dialog box.
4. Select the host name of the system to which data needs to be restored from the System to recover to
menu.
5. Select the host name of the system that will be the data source from the System to recover from
menu.
Note: If a recovery fails or must be terminated, the replication recovery can be aborted. Recovery on the
source should then be restarted as soon as possible.
1. Click the More menu and select Abort Recover. The Abort Recover dialog box appears, showing the
contexts that are currently performing recovery.
2. Click the checkbox of one or more contexts to abort from the list.
3. Click OK.
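From the CLI, the equivalent operation is the replication recover command, run on the source system to
which data is being recovered. This is only a sketch; the context URL is hypothetical, and (as noted above)
MTree contexts cannot be recovered this way:
# replication recover dir://dd-target.example.com/backup/dir1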
Resynchronization can be used to convert a collection replication to directory replication. This is useful
when the system is to be a source directory for cascaded replication. A conversion is started with a
replication resynchronization that filters all data from the source Data Domain system to the destination
Data Domain system. This implies that seeding can be accomplished by first performing a collection
replication, then breaking collection replication, then performing a directory replication resynchronization.
Resynchronization can also be used to re-create a context that was lost or deleted.
Also, use resynchronization when a replication destination runs out of space and the source system still
has data to replicate.
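A CLI sketch of the resynchronization workflow (URLs are hypothetical; confirm the syntax for your release):
break the old context, re-add the context, and then start the resync against the destination.
# replication break mtree://dd-target.example.com/data/col1/prod
# replication add source mtree://dd-source.example.com/data/col1/prod destination mtree://dd-target.example.com/data/col1/prod
# replication resync mtree://dd-target.example.com/data/col1/prod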
Also covered in this module were replication seeding and the resynchronization of recovered data.
A virtual tape library appears to the backup software as a SCSI robotic device or changer. Virtual tape
drives are accessible to backup software in the same way as physical tape drives. Once drives are
created in the VTL, they appear to the backup software as SCSI tape drives.
A Fibre Channel (FC) equipped host connecting to a Storage Area Network (SAN) can communicate with
a Fibre Channel equipped Data Domain system. When properly zoned, the host can send its backups
using the FC protocol directly to the VTL-enabled Data Domain system.
Data Domain systems support backups over the SAN via Fibre Channel. The backup application on the
backup host manages all data movement to and from Data Domain systems. An FC switch is not needed
when a direct connection from the backup host to the Data Domain system is used.
When disaster recovery is needed, tape pools can be replicated to a remote Data Domain system using
the Data Domain replication process.
To protect data on tapes from modification, tapes can be locked using Retention Lock Governance
software.
The VTL service provides a network interface to the Data Domain file system. The VTL service can be
active alongside the CIFS, NFS, and DD Boost services, which also provide network interfaces into the file
system.
VTL has been tested with, and is supported by, specific backup software and hardware configurations. For
more information, see the appropriate Backup Compatibility Guide on the EMC Online Support Site.
Data Domain systems simultaneously support data access methods through Data Domain Virtual Tape
Library over Fibre Channel, remote Network Data Management Protocol (NDMP) access over Ethernet for
network-attached storage (NAS), Network File System (NFS) and Common Internet File System (CIFS)
file service protocols over Ethernet, and EMC Data Domain Boost. This deployment flexibility and simple
administration means users can rapidly adjust to changing enterprise requirements.
A Data Domain VTL eliminates the use of tape and the accompanying tape-related issues (large physical
storage requirement, off-site transport, high time to recovery, and tape shelf life) for the majority of
restores. Compared to normal tape technology, a Data Domain VTL provides resilience in storage through
the benefits of Data Invulnerability Architecture (DIA) (end-to-end verification, fault avoidance and
containment, continuous fault detection and healing, and file system recoverability).
Data Domain systems configured for VTL reduce storage space requirements through the use of Data
Domain deduplication technology.
Disk-based network storage provides a shorter Recovery Time Objective (RTO) by eliminating the need
for handling, loading, and accessing tapes from a remote location.
A barcode is a unique ID for a virtual tape. Barcodes are assigned when the user creates the virtual tape
cartridge.
A tape is a cartridge holding magnetic tape used to store data long term. The backup software creates
virtual tapes, which act the same as physical tape media. Tapes are usually represented in a system as
grouped data files. Tapes - virtual and real - can be moved between a long-term retention vault and a
library. They can also move within a library across drives, slots, and CAPs.
A pool is a collection of tapes that maps to a directory on a file system, used to replicate tapes to a
destination. Note: Data Domain pools are not the same as backup software pools. Most backup software,
including EMC NetWorker, has its own pooling mechanism.
A tape drive is the device that records backed-up data to a tape cartridge. In the virtual tape world, this
drive still uses the same Linear Tape-Open (LTO) technology standards as physical drives.
There are additional generations of LTO, but only LTO-1, -2, -3, -4, and -5 are currently supported by
Data Domain systems. Depending on the multiplex setting of the backup application, each drive operates
as a device that can support one or more data streams.
A Changer (Tape Backup Medium Changer) is the device that handles the tape between a tape library and
the tape drive. In the virtual tape world, the system creates an emulation of a specific type of changer.
Although no tapes are physically moved within the Data Domain VTL system, the virtual tape backup
medium changer must emulate the messages your backup software expects to see when tapes are
moved to and from the drives. Selecting and using the incorrect changer model in your VTL configuration
causes the system to send incorrect messages to the backup software, which can cause the VTL system
to fail.
A cartridge access port (CAP) enables the user to deposit and withdraw tape cartridges (volumes) in an
autochanger without opening its door. In a VTL, a CAP is the emulated tape enter and eject point for
moving tapes to or from a library. The CAP is also called a mail slot.
A slot is a storage location within a library. For example, a tape library has one slot for each tape that the
library can hold.
A tape vault is a holding place for tapes not currently in any library. Tapes in the vault eventually have to
be moved into the tape library before they can be used.
An Access Group, or VTL Group, is a collection of initiators and the drives and changers they are allowed
to access. An access group may contain multiple initiators, but an initiator can exist in only one access
group.
Typically, any production Data Domain system running VTL has been assessed, planned, and configured
by a Data Domain implementation expert prior to implementation and production.
The information presented in this lesson provides the current capacities for the various features in a Data
Domain VTL configuration. Your backup host may not support these capacities. Refer to your backup host
software support for correct sizing and capacity to fit your software.
Understand that the Data Domain VTL is scalable and should accommodate most configurations.
Standard practices suggest creating only as many tape cartridges as needed to satisfy backup
requirements, and enough slots to hold the number of tapes you create. Creating additional slots is not a
problem. The key to good capacity planning is to avoid excess capacity beyond the system's needs and to
add capacity as needed.
For further information about the definitions and ranges of each parameter, consult the DD OS System
Administration Guide and the most current VTL Best Practices Guide. Both are available through the Data
Domain Support Portal.
• Data Domain systems support a maximum I/O block size of 1 MB.
• All systems are currently limited to a maximum of 64 library instances (64 concurrently active VTL
instances on each Data Domain system).
• The maximum number of slots in a library is 32,000. There can be a maximum of 64,000 slots in the
Data Domain system. You cannot have more tapes than you have slots.
• The Data Domain system supports 100 cartridge access ports (CAPs) per library and a maximum of
1000 CAPs in the system.
A Data Domain system with 59 or fewer CPU cores can support up to 540 drives.
A Data Domain system with 60 or more CPU cores can support up to 1080 drives.
Note: These are some of the maximum capacities for various features in a VTL configuration for the larger
Data Domain systems. Check the VTL Best Practices Guide for recommendations for your system and
configuration.
Also, verify the backup software can support one of the changers and drives supported by the Data
Domain system. As of this writing, Data Domain systems emulate the StorageTek L180,
RESTORER-L180, IBM TS3500, IBM I2000, and Quantum I6000 changers. The L180 is the default changer.
The Data Domain system emulates a number of Linear Tape-Open drives, including the IBM LTO-1, LTO-2,
LTO-3, LTO-4, and LTO-5 tape drives. It also emulates the HP LTO-3 and LTO-4 tape drives. The default
tape drive emulation is the IBM-LTO-5.
In a physical tape library setting, multiplexing – interleaving data from multiple clients onto a single tape
drive simultaneously – is a method to gain efficiency by keeping the tape drive busy.
Multiplexing was useful for clients with slow throughput since a single client could not send data fast
enough to keep the tape drive busy.
With Data Domain VTL, multiplexing causes existing data to land on a Data Domain system in a different
order each time a backup is performed. Multiplexing makes it nearly impossible for a system to recognize
repeated segments, thus ruining deduplication efficiency. Do not enable multiplexing on your backup host
software when writing to a Data Domain system.
To increase throughput efficiency and maintain deduplication-friendly data, establish multiple data streams
from your client system to the Data Domain system. Each stream will require writing to a separate virtual
drive.
Refer to the DD OS Backup Compatibility Guide to verify that the initiator's FC HBA hardware and driver
are supported.
Upgrade the initiator HBA to the latest supported version of firmware and software.
Dedicate the initiator's Fibre Channel port to Data Domain VTL devices.
Verify the speed of each FC port on the switch to confirm that the port is configured for the desired rate.
Consider spreading the backup load across multiple FC ports on the Data Domain system in order to
avoid bottlenecks on a single port.
The VTL service requires an installed FC interface card or VTL configured to use NDMP over Ethernet.
If the VTL communication between a backup server and a DD system is through an FC interface, the DD
system must have an FC interface card installed. Notice that whenever an FC interface card is removed
from (or changed within) a DD system, any VTL configuration associated with that card must be updated.
If the VTL communication between the backup server and the DD system is through NDMP, no FC
interface card is required. However, you must configure the TapeServer access group. Also, when using
NDMP, initiator and port functionality does not apply.
Only initiators that need to communicate with a particular set of VTL target ports on a Data Domain
system should be zoned with that Data Domain system.
User Access
Make sure to plan which users will have access to the VTL features and plan to give them the appropriate
access to the system. For basic tape operations and monitoring, only a user login is required. To enable
and configure VTL services and perform other configuration tasks, a sysadmin login is required.
Depending on the configuration and overall performance limits of your particular Data Domain system you
might need to adjust the overall number of drives assigned for VTL.
See the current Data Domain Operating System Administration Guide for details.
Slot counts are typically based on the number of tapes used over a retention policy cycle.
When choosing a tape size, you should also consider the backup application being used. For instance,
Hewlett Packard Data Protector supports only LTO-1 /200 GB capacity tapes.
Data Domain systems support LTO-1, LTO-2, LTO-3, LTO-4, and LTO-5 formats.
• LTO-1: 100 GB per tape
• LTO-2: 200 GB per tape
• LTO-3: 400 GB per tape
• LTO-4: 800 GB per tape
• LTO-5: 1.5 TiB per tape
If the data you are backing up is large, (over 200 GB, for example), you may want larger-sized tapes since
some backup applications are not able to span across multiple tapes.
The strategy of using smaller tapes across many drives gives your system greater throughput by using
more data streams between the backup host and Data Domain system.
Larger capacity tapes pose a risk of system-full conditions. It is more difficult to expire and reclaim the
space of data being held on a larger tape than on smaller tapes. A larger tape can hold more backups,
making it potentially harder to expire because it might contain a current backup.
If backups with different retention policies exist on a single piece of media, the youngest image will prevent
file system cleaning and reuse of the tape. You can avoid this condition by initially creating and using
smaller tape cartridges – in most cases, tapes in the 100GB to 200GB range.
Expired tapes are not deleted, and the space occupied by a tape is not reclaimed until it is relabeled,
overwritten, or deleted. Consider a situation in which 30% of your data is being held on a 1 TB tape. You
could delete half of that data (500 GB) and still not be able to reclaim any of the space because the tape is
still holding unexpired data.
Unless you are backing up larger files, backing up smaller files to larger-sized tapes will contribute to
this issue by taking longer to fill a cartridge with data. Using a larger number of smaller-sized tapes can
reduce the chances of a few young files preventing cleaning of older data on a larger tape.
When deciding how many tapes to create for your VTL configuration, remember that creating more tapes
than you actually need might cause the system to fill up prematurely and cause unexpected system-full
conditions. In most cases, backup software will use blank tapes before recycling tapes. It is a good idea to
start with a total tape capacity of less than twice the available space on the Data Domain system.
A good practice is to use either two or three of the first characters as the identifier of the group or pool in
which the tapes belong. If you use two characters as the identifier, you can then use four numbers in
sequence to number up to 10,000 tapes. If you use three characters, you are able to sequence only 1,000
tapes.
Note: If you specify the tape capacity when you create a tape through the Data Domain System Manager,
you will override the two-character tag capacity specification.
Data Domain systems support backups using NDMP over TCP/IP via standard Ethernet as an alternate
method. This offers a VTL solution for remote office/back office use.
Backup servers configured only with Ethernet can also back up to a Data Domain VTL when used with an
NDMP tape server on the Data Domain system. The backup host must also be running NDMP client
software to route the server data to the related tape server on the Data Domain system.
When a backup is initiated, the host tells the server to send its backup data to the Data Domain VTL tape
server. Data is sent via TCP/IP to the Data Domain system where it is captured to virtual tape and stored.
Access group configuration allows initiators (in general backup applications) to read and write data to
devices in the same access group. Access groups let clients access only selected LUNs (media changers
or virtual tape drives) on a system. A client set up for an access group can access only devices in its
access group.
An access group may contain multiple initiators, but an initiator can exist in only one access group.
A preconfigured VTL access group named TapeServer lets you add devices that will support NDMP
(Network Data Management Protocol)-based backup applications.
Avoid making access group changes on a Data Domain system during active backup or restore jobs. A
change may cause an active job to fail. The impact of changes during active jobs depends on a
combination of backup software and host configurations.
2. Displayed on the screen is a table containing summary information about the DD Boost Access
Groups and the VTL access groups. Note the information includes the name of the group, the type of
service the group supports, the endpoint associated with the group, the names of the initiators in the
group, and the number of devices (disks, changers, LUNs) in the group. Note the groups that contain
initiators and devices.
3. The total number of groups configured on the system is shown at the bottom of this section.
4. Select the View VTL Groups hyperlink to navigate to the Data Domain System Manager Protocols > VTL
page, where more information and configuration tools are available.
2. Select the Access Group menu item. Click the plus sign (+) to expand the list if necessary.
2. Select the Access Group menu item. Click the plus sign (+) to expand the list if necessary.
3. Select the top-level groups folder. If you do not select this folder, the More Tasks > Group > Create...
item will not be available.
4. Select the More Tasks > Group > Create... item. The Create Access Group dialogue box appears.
6. From the Initiator list, select the Initiators you wish to add to this VTL Access Group. You may add
your initiator later, as you are not required to add one at this time.
7. Select Next. The Access group devices dialogue box now appears.
1. Navigate to the Protocols > VTL page in DDSM to start the delete process.
2. Select the Access Group menu item. Click the plus sign (+) to expand the list if necessary.
3. Select the target access group from the Access Groups list.
6. Click Next.
7. Click the delete icon - the red x - to remove the selected devices.
8. When the Modify Access Group Dialogue box is redisplayed, verify all devices have been deleted from
the devices list.
9. Click Next. The Modify Access Group > Summary dialogue box is displayed.
13. Verify the Protocols > VTL > Access Groups tab is active.
15. Select the More Tasks > Delete... menu item. The Delete Group Dialogue box with a list of VTL
Access groups is displayed.
To open the System Manager Configuration Wizard, go to the System Manager, and select Maintenance
> More Tasks > Launch Configuration Wizard.
Navigate to the VTL configuration, and click No until you arrive at the VTL Protocol configuration section.
Select Yes to configure VTL.
The wizard steps you through library, tape, initiator, and access group configuration.
Manual configuration is also possible. Manually configuring the tape library and tapes, importing tapes,
configuring physical resources, setting initiators, and creating VTL access groups are covered in the
following slides.
2. Navigate to the Protocols > VTL page to manage the VTL service. Once you navigate to this page,
you will see that the page is subdivided into sections.
The options under the Virtual Tape Libraries section enable you to manage the VTLs and their associated
devices.
The options under the Access Group section enable you to define the devices an individual initiator can
access.
The Resources section allows you to view the configuration of endpoints and initiators. To configure these
devices, you must navigate to the Hardware > Fibre Channel menu.
The VTL service provides the environment for virtual devices to exist. You may think of it as a virtual data
center.
The VTL service requires installation of an EMC Data Domain Virtual Tape Library (VTL) license before it
can be enabled.
If the VTL is going to provide virtual IBM i devices, an EMC Data Domain I/OS (for IBM i operating
environments) license is also required.
2. Select the VTL Services item. The state of the VTL service and VTL licenses are displayed. You will
not see the state of the service unless the VTL Service item is selected.
3. Verify the VTL license has been installed. If the license has not been installed, select the Add License
hyperlink and install the VTL license at this time.
4. Verify an I/OS license has also been installed if the VTL is in an IBM environment. This license must
be installed before any VTLs or tape drives are created.
5. After all required licenses have been installed, select the enable button to Enable the VTL service.
The VTL status should show as Enabled: Running and the Enable button changes to Disable.
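These steps can also be performed from the CLI; as a minimal sketch (the license code is a placeholder):
# license add <license-code>
# vtl enable
# vtl status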
When you create the VTL, you can only have one changer and you must identify the changer's model.
You must provide the number of slots your VTL contains. You can specify a quantity between 1 and
32,000.
You must also assign cartridge access ports (CAPs) to the VTL. Values from 0 to 100 are acceptable.
Finally, you must also provide the quantity and model of the tape drives in the VTL.
Even though tapes are used by the VTL, they are not an integral part of the VTL itself. The same is true
for tape pools.
3. Next, select the Libraries menu item. The contents of the More Tasks menu are dependent upon the
item selected in the left side menu, so you must ensure the correct item is selected.
4. Select More Tasks > Library > Create... menu item. The Create Library dialogue box is displayed.
5. Enter the values appropriate for your application. If the VTL is properly planned, you should know the
values to enter.
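A hedged CLI equivalent (the library name, counts, and models are illustrative; verify the parameter names
with the vtl add and vtl drive add command help for your DD OS release):
# vtl add VTL1 model L180 slots 100 caps 5
# vtl drive add VTL1 count 4 model IBM-LTO-5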
Select the Virtual Tape Libraries > VTL Service > Libraries menu item to view summary information
relating to all VTLs.
Select the Virtual Tape Libraries > VTL Service > Libraries > {library-name} menu item to view summary
information on the selected VTL. The number and disposition of tapes in the VTL is also shown. If no
tapes are associated with the VTL, there is nothing in the Tapes section.
Selecting the VTL's Drives menu item provides detailed related information for all drives. This includes the
drive number, vendor, product ID, revision number, serial number, and status. If a tape is in the drive, the
tape's barcode is displayed along with the name of the tape pool to which the tape belongs.
The system also provides tools to manage tape pools. You can create, delete, or rename tape pools.
1. After navigating to the Data Management > VTL page with DDSM, expand the Virtual Tape
Libraries menu and select the VTL that will hold the tapes. By doing this, the tapes you create will be
added directly to the VTL. There will be no need to import them after they are created.
2. Now, select More Tasks > Tapes > Create... to open the Create Tapes dialogue box.
3. Provide the information about the tapes you are creating. Refer to your implementation planning, to
find the number, capacity, and starting barcode for your tape set. You may select the Default tape
pool or a pool that you have created to hold the tapes.
4. Select OK when you are ready to create the tapes. The create tape process starts.
5. Once the Create Tapes process completes, select OK. You can now verify if the tapes have been
successfully created.
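As a CLI sketch (the barcode, count, and pool are hypothetical; the barcode suffix selects the LTO
generation and therefore the default tape capacity):
# vtl tape add A00000L1 count 20 pool Default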
1. After navigating to the Data Management > VTL page with DDSM, expand the Pools menu on the
left side of the screen.
3. Now, select More Tasks > Pool > Create... to open the Create Pool dialogue box.
4. Provide a name for the Pool. Use a name that will identify the type of data that is on the tape. For
example, you could name the pool EngBkupPool to signify that it contains tapes relevant to
engineering backups.
5. Click the backwards compatibility checkbox to create the older-style tape pool under
/data/col1/backup/. If you do not check this box, the system creates a newer style tape pool that
leverages the MTree structure.
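The same pool can be created from the CLI; a minimal sketch using the example pool name above (verify
the command options for your release):
# vtl pool add EngBkupPool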
You can examine the list of MTrees on the system to view the MTrees associated with VTL.
When you enable VTL, the Default MTree-based tape pool is created.
To import tapes:
1. Select Data Management > VTL > VTL Service > Libraries.
2. Select a library and view the list of tapes, or click More Tasks.
3. Select Tapes > Import...
4. Enter the search criteria about the tapes you want to import and click Search.
5. Select the tapes to import from the search results.
6. Choose the target location for the tapes.
7. Select Next to begin the import process.
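A CLI sketch of the same import (all values are hypothetical, and the parameter names are assumptions to
verify against the vtl import command help):
# vtl import VTL1 barcode A00000L1 count 20 element slot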
The NDMP service must be enabled separately from the VTL service. The NDMP service is managed
through the CLI.
Without NDMP, the VTL can only be accessed through Fibre Channel.
NDMP on a Data Domain system does not require a Fibre Channel HBA.
In fact, NDMP does not use a Fibre Channel HBA if one is installed.
The NDMP-client computer must also log in to a user account on the Data Domain system.
Two types of user accounts allow you to access the Data Domain system's VTLs through NDMP: a
standard DDOS user account and an NDMP user account.
If a standard DDOS user account is employed, the password is sent over the network as plain text, which
is insecure.
The NDMP feature on the Data Domain system allows you to add a user specifically for NDMP access.
Password encryption can be added to the NDMP user for added security.
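A minimal CLI sketch, assuming an NDMP user named ndmpuser (verify the ndmpd command options in
the DD OS Command Reference Guide):
# ndmpd enable
# ndmpd user add ndmpuser
# ndmpd status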
DSP (distributed segment processing) distributes parts of the deduplication process to the NetWorker
storage node using the embedded DD Boost library (or, for other backup applications, using the DD Boost
plug-in), moving some of the processing normally handled by the Data Domain system to the application
host. The application host compares the data to be backed up with the library and looks for any unique
segments. Thus it sends only unique segments to the Data Domain system.
To improve data transfer performance and increase reliability, you can create a group interface using the
advanced load balancing and link failover feature. Configuring an interface group creates a private
network within the Data Domain system, comprised of the IP addresses designated as a group. Clients
are assigned to a single group by specifying client name (client.emc.com) or wild card name (*.emc).
Benefits include:
• Potentially simplified installation management.
• A system that remains operational through loss of individual interfaces.
• Potentially higher link utilization.
• In-flight jobs that fail over to healthy links, so jobs continue uninterrupted from the point of view of
the backup application.
DD Boost in DD OS 5.2, and higher, supports optimized synthetic backups when integrated with backup
software. Currently, EMC NetWorker and Symantec NetBackup are the only supported software
applications using this feature.
Optimized synthetic backups reduce processing overhead associated with traditional synthetic full
backups. Just like a traditional backup scenario, optimized synthetic backups start with an initial full
backup followed by incremental backups throughout the week. However, the subsequent full backup
requires no data movement between the application server and Data Domain system. The second full
backup is synthesized using pointers to existing segments on the Data Domain system. This optimization
reduces the frequency of full backups, thus improving recovery point objectives (RPO) and enabling single
step recovery to improve recovery time objectives (RTO). In addition, optimized synthetic backups further
reduce the load on the LAN and application host.
Benefits include:
• Reduces the frequency of full backups
• Improves RPO and RTO
• Reduces load on the LAN and application host
Both low bandwidth optimization and encryption of managed file replication data are replication optional
features and are both supported with DD Boost enabled.
Storage units can be monitored and controlled just as any data managed within an MTree. You can set
hard and soft quota limits and receive reports about MTree content.
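As a hedged sketch (the storage-unit name, user, and quota values are hypothetical; confirm the quota
keywords for your DD OS release), quotas can be applied when a storage unit is created:
# ddboost storage-unit create SU1 user ddboostuser quota-soft-limit 8 TiB quota-hard-limit 10 TiB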
In normal backup operations, the backup host has no part in the deduplication process. When backups
run, the backup host sends all backup data to allow the Data Domain system to perform the entire
deduplication process on all of the data.
DD Boost can operate with DSP either enabled or disabled. DSP must be enabled or disabled on a per-
system basis; individual backup clients cannot be configured differently than the Data Domain system.
With application hosts, use DSP if your application hosts are underutilized and can accommodate the
additional processing assignment.
The network bandwidth requirements are significantly reduced because only unique data is sent over the
LAN to the Data Domain systems.
Consider DSP only if your application hosts can accommodate the additional processing required by its
share of the DSP workflow.
The Data Domain system uses a wide area network (WAN)-efficient replication process for deduplicated
data. The process can be optimized for WANs, reducing the overall load on the WAN bandwidth required
for creating a duplicate copy.
In this environment, a backup server is sending backups to a local Data Domain system. A remote Data
Domain system is set up for replication and disaster recovery of the primary site.
1. The NetWorker storage node initiates the backup job and sends data to the Data Domain system.
Backup proceeds.
2. The Data Domain system signals that the backup is complete.
3. Information about the initial backup is updated in the NetWorker media database.
4. The NetWorker storage node initiates replication of the primary backup to the remote Data
Domain system through a clone request.
5. Replication between the local and remote Data Domain systems proceeds.
6. When replication completes, the NetWorker storage node receives confirmation of the completed
replication action.
7. Information about the clone copy of the data set is updated in the NetWorker media database.
Replicated data is now immediately accessible for data recovery using the NetWorker media database.
While it is acceptable for both standard MTree replication and managed file replication to operate on the
same system, be aware that managed file replication can be used only with MTrees established with DD
Boost storage units.
Be mindful not to exceed the maximum total number of MTrees on a system. The MTree limit is a count of
both standard MTrees and MTrees created as DD Boost storage units. Note that the limit is dependent on
the Data Domain system and the DD OS version.
Also, remember to remain below the maximum total number of replication pairs (contexts) recommended
for your particular Data Domain systems.
The Advanced Load Balancing and Link Failover feature allows for combining multiple Ethernet links into a
group. Only one of the interfaces on the Data Domain system is registered with the backup application.
DD Boost software negotiates with the Data Domain system on the interface registered with the backup
application to obtain an interface to send the data. The load balancing provides higher physical throughput
to the Data Domain system compared to configuring the interfaces into a virtual interface using Ethernet-
level aggregation.
The links connecting the backup hosts and the switch that connects to the Data Domain system are
placed in an aggregated failover mode. A network-layer aggregation of multiple 1 GbE or 10 GbE links is
registered with the backup application and is controlled on the backup server.
This configuration provides network failover functionality from end-to-end in the configuration. Any of the
available aggregation technologies can be used between the backup servers and the switch.
An interface group is configured on the Data Domain system as a private network used for data transfer.
The IP address must be configured on the Data Domain system and its interface enabled. If an interface
(or a NIC that has multiple interfaces) fails, all of the in-flight jobs to that interface transparently fail-over to
a healthy interface in the interface group (ifgroup). Any jobs started subsequent to the failure are routed to
the healthy interfaces. You can add public or private IP addresses for data transfer connections.
Note: Do not use 1GbE and 10 GbE connections in the same interface group.
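A sketch of a corresponding ifgroup configuration (the group name, IP addresses, and client pattern are
hypothetical, and the command forms vary between DD OS releases, so verify them in the Command
Reference Guide):
# ifgroup create group1
# ifgroup add group1 interface 192.168.1.10
# ifgroup add group1 interface 192.168.1.11
# ifgroup add group1 client *.example.com
# ifgroup enable group1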
During a traditional full backup, all files are copied from the client to a media server and the resulting
image set is sent to the Data Domain system. The files are copied even though those files may not have
changed since the last incremental or differential backup. During a synthetic full backup, the previous full
backup and the subsequent incremental backups on the Data Domain system are combined to form a
new, full backup. The new, full synthetic backup is an accurate representation of the client’s file system at
the time of the most recent full backup.
Because processing takes place on the Data Domain system under the direction of the storage node, or
media server, instead of the client, virtual synthetic backups help to reduce the network traffic and client
processing. Client files and backup image sets are transferred over the network only once. After the
backup images are combined into a synthetic backup, the previous incremental and/or differential images
can be expired.
The virtual synthetic full backup is a scalable solution for backing up remote offices with manageable data
volumes and low levels of daily change. If the clients experience a high rate of change daily, the
incremental or differential backups are too large. In this case, a virtual synthetic backup is no more helpful
than a traditional full backup. To ensure good restore performance, it is recommended that you create a
traditional full backup every two months, presuming a normal weekly full and daily incremental backup
policy.
The virtual synthetic full backup is the combination of the last full (synthetic or full) backup and all
subsequent incremental backups. It is time-stamped as occurring one second after the latest incremental.
It does NOT include any changes to the backup selection since the latest incremental.
Restore performance from a synthetic backup is typically worse than a standard full backup due to poor
data locality.
DD Boost over FC presents Logical Storage Units (LSUs) to the backup application and removes a
number of limitations inherent to tape and VTL:
• Enables concurrent read and write, which is not permitted on a virtual tape.
• The backup image, rather than a virtual tape cartridge, is the smallest unit of replication or expiration,
which results in more efficient space management.
• No access group limitations, simple configuration using very few access groups.
• Initiators can read and write to devices in their own access group, but not to devices in other DD Boost
access groups.
• Initiators assigned to DD Boost access groups cannot be assigned to VTL access groups on the same
Data Domain system.
2. Displayed on the screen is a table containing summary information about the DD Boost Access
Groups and the VTL access groups. Note the information includes the name of the group, the type of
service the group supports, the endpoint associated with the group, the names of the initiators in the
group, and the number of devices (disks, changers, LUNs) in the group. Note the groups that contain
initiators and devices.
3. The DD Boost and VTL access groups are distinguished from one another by the Service type.
4. The total number of groups configured on the system is shown at the bottom of this section.
5. Select the View DD Boost Groups hyperlink to navigate to the Data Domain System Manager Protocols
> DD Boost page, where more information and configuration tools are available.
4. Enter the group name in the Group Name field of the Create Access Group dialogue box. The group
name can be up to 128 characters in length. The name must be unique. Duplicate names are not
allowed.
5. From the Initiator list, select the Initiators you wish to add to this access group. You may add your
initiator later, as you are not required to add one at this time.
6. Select Next. The Create Access Group > Devices dialogue box now appears.
7. Enter the number of devices. The range is from 1 to 64 devices.
8. Select which endpoints to include.
9. Click Next. The Create Access Group > Summary dialogue box now appears.
11. Once you are satisfied, select Finish to create the DD Boost Access Group.
12. When the status indicates the DD Boost Access Group creation process has completed, click OK.
For Dell EMC NetWorker, Dell EMC Avamar, and Dell vRanger users, the Data Domain Boost library is
already included in recent versions of the software. Before enabling DD Boost on Veritas Backup Exec and
NetBackup, a special OST plug-in must be downloaded and installed on the backup host. The plug-in
contains the appropriate DD Boost library for use with compatible Symantec product versions. Consult
the most current DD Boost Compatibility Guide to verify compatibility with your specific software and Data
Domain operating system versions. Both the compatibility guide and versions of OpenStorage (OST) plug-
in software are available through the Dell EMC Data Domain support portal at: http://support.emc.com.
A second destination Data Domain system licensed with DD Boost is needed when implementing
centralized replication awareness and management.
Enable DD Boost by navigating in the Data Domain System Manager to Protocols > DD Boost >
Settings. If the DD Boost Status reads “Disabled,” click the Enable button to enable the feature.
You can also enable DD Boost from the command line interface using the ddboost enable command.
You can use the ddboost status command to verify whether DD Boost is enabled or disabled on your
system.
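For example, a minimal CLI session might look like the following (the output lines shown are
representative only and may differ slightly by DD OS release):
  # ddboost status
  DD Boost status: disabled
  # ddboost enable
  DD Boost enabled.
  # ddboost status
  DD Boost status: enabled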
In the Allowed Clients area, click the green plus button to allow access to a new client using the DD
Boost protocol on the system. Add the client name as a domain name since IP addresses are not allowed.
An asterisk (*) can be added to the Client field to allow access to all clients. You can also set the
Encryption Strength and Authentication Mode when setting up allowed clients.
To add a DD Boost user for the system, click the green plus button in the Users with DD Boost Access
section. In the Add User window, select from the list of existing users or add a new user.
You can also add users and clients using the command line:
• ddboost set user-name <user-name>
Set DD Boost user.
• ddboost access add clients <client-list>
Add clients to DD Boost access list.
Consult the Data Domain Operating System Command Reference Guide for more detailed information on
using the ddboost commands to administer DD Boost.
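As a sketch (the user name and client host names below are hypothetical), a DD Boost user and its
allowed clients might be configured as follows. Note that the DD Boost user must already exist as a
local user on the system:
  # user add ddboost-svc role user
  # ddboost set user-name ddboost-svc
  # ddboost access add clients backup01.example.com backup02.example.com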
Click the plus sign to open the Create Storage Unit dialog. Name the storage unit, select a DD Boost
user, and set any quota settings you wish.
Under the Storage Unit tab, you can view information about a storage unit such as the file count, full path,
status, quota information and physical capacity measurements.
The command line can also be used to create and manage storage units:
• ddboost storage-unit create <storage-unit-name>
Create storage-unit, setting quota limits.
• ddboost storage-unit delete <storage-unit-name>
Delete storage-unit.
• ddboost storage-unit show [compression] [<storage-unit-name>]
List the storage-units and images in a storage-unit.
Consult the Data Domain Operating System Command Reference Guide for more detailed information on
using the ddboost commands to administer DD Boost.
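For example (the storage-unit and user names are illustrative):
  # ddboost storage-unit create SU-networker user ddboost-svc
  # ddboost storage-unit show
  # ddboost storage-unit show compression SU-networker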
To rename or modify a storage unit, click the pencil icon. This will open the Modify Storage Unit dialog
allowing you to change the name, the DD Boost User and the quota settings.
You can delete one or more storage units by selecting them from the list and clicking the red X icon. Any
deleted storage units can be retrieved using the Undelete Storage Unit item under the More Tasks button.
Deleted storage units can only be retrieved if file system cleaning has not taken place between the time
the storage unit was deleted and when you would like to undelete the storage unit.
You can also rename, delete and undelete storage units from the command line:
• ddboost storage-unit create <storage-unit> user <user-name>
Create a storage unit, assign tenant, and set quota and stream limits.
• ddboost storage-unit delete <storage-unit>
Delete a specified storage unit, its contents, and any DD Boost associations.
• ddboost storage-unit rename <storage-unit> <new-storage-unit>
Rename a storage-unit.
• ddboost storage-unit undelete <storage-unit>
Recover a deleted storage unit.
Consult the Data Domain Operating System Command Reference Guide for more detailed information on
using the ddboost commands to administer DD Boost.
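A brief example (names are illustrative; remember that an undelete succeeds only if file system
cleaning has not run since the deletion):
  # ddboost storage-unit rename SU-networker SU-networker-prod
  # ddboost storage-unit delete SU-test
  # ddboost storage-unit undelete SU-test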
You can also set DD Boost options from the command line:
• ddboost option reset
Reset DD Boost options.
• ddboost option set distributed-segment-processing {enabled | disabled}
Enable or disable distributed-segment-processing for DD Boost.
• ddboost option set virtual-synthetics {enabled | disabled}
Enable or disable virtual-synthetics for DD Boost.
• ddboost option show
Show DD Boost options.
Consult the Data Domain Operating System Command Reference Guide for more detailed information on
using the ddboost commands to administer DD Boost.
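For example, to review and then adjust the DD Boost options listed above:
  # ddboost option show
  # ddboost option set distributed-segment-processing enabled
  # ddboost option set virtual-synthetics enabled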
You can also configure and manage DD Boost over Fibre Channel from the command line:
• ddboost option set fc {enabled | disabled}
Enable or disable fibre-channel for DD Boost.
• ddboost fc dfc-server-name set <server-name>
DDBoost Fibre-Channel set Server Name.
• ddboost fc dfc-server-name show
Show DDBoost Fibre-Channel Server Name.
• ddboost fc group add <group-name> initiator <initiator-spec>
• ddboost fc group add <group-name> device-set
Add initiators or DDBoost devices to a DDBoost FC group.
• ddboost fc group create <group-name>
Create a DDBoost FC group.
• ddboost fc group show list [<group-spec>] [initiator <initiator-spec>]
List configured DDBoost FC groups.
• ddboost fc status
DDBoost Fibre Channel Status.
Consult the Data Domain Operating System Command Reference Guide for more detailed information on
using the ddboost commands to administer DD Boost.
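The following is a sketch of a DD Boost-over-FC setup. The server, group, and initiator names are
hypothetical, and the device-set parameters shown are an assumption to verify against the Command
Reference Guide for your DD OS release:
  # ddboost option set fc enabled
  # ddboost fc dfc-server-name set dd-dfc-01
  # ddboost fc group create rman-grp
  # ddboost fc group add rman-grp initiator oracle-hba-1
  # ddboost fc group add rman-grp device-set count 4
  # ddboost fc status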
After you configure a Data Domain system for the DD Boost environment, you can configure NetWorker
resources for devices, media pools, volume labels, clients, and groups that will use the DD Boost devices.
Keep the following NetWorker considerations in mind:
• Each DD Boost device appears as a folder on the Data Domain system. A unique NetWorker
volume label identifies each device and associates the device with a pool.
• NetWorker uses the pools to direct the backups or clones of backups to specific local or remote
devices.
• NetWorker uses Data Protection policy resources to specify the backup and cloning schedules
for member clients. Dell EMC recommends that you create policies that are dedicated solely to
DD Boost backups.
Dell EMC recommends that you use the Device Configuration Wizard, which is part of the NetWorker
Administration GUI, to create and modify DD Boost devices. The wizard can also create and modify
volume labels and the storage pools for DD Boost devices.
After the wizard creates a DD Boost device, you can modify the device configuration by editing the device
resource that is created by the wizard.
Avamar clients use a multi-stream approach to send specific data types that are better suited to high-
speed inline deduplication to Data Domain systems. All other data types are still sent to the Avamar Data
Store. This enables users to deploy the optimal approach to deduplication for different data types and
manage the entire infrastructure from a single interface.
Once vRanger is installed, add the DD Boost instance to vRanger as a repository. Any backup written to
this repository will be deduplicated according to the Data Domain configuration.
DD Boost for NetBackup has two components. The DD Boost Library is embedded in the OpenStorage
plug-in that runs on the NetBackup Media servers. The DD Boost server is built into DD OS and runs on
the Data Domain system.
Veritas Backup Exec: The combination of a Data Domain system and DD Boost for Symantec Backup
Exec creates an optimized connection to provide a tightly integrated solution. DD Boost for Symantec
Backup Exec offers operational simplicity by enabling the media server to manage the connection
between the backup application and one or more Data Domain systems.
With Symantec Backup Exec, the OST plug-in software must be installed on media servers that need to
access the Data Domain system. Backup Exec is not supported with DD Boost over Fibre Channel.
Implementing DD Boost for RMAN requires installing the DD Boost plug-in on the Oracle server. The
plug-in interfaces between the Oracle Media Management Layer (MML) API (also known as the System
Backup to Tape, or SBT, API) and DD Boost. The Oracle MML API allows backup applications to
interface with Oracle RMAN.
• Data Sanitization
Unlike backup data, which is a secondary copy of data for shorter-term recovery purposes,
archive data is a primary copy of data and is often retained for several years. In many
environments, corporate governance and/or compliance regulatory standards can mandate
that some or all of this data be retained “as-is.” In other words, the integrity of the archive
data must be maintained for specific time periods before it can be deleted.
The Data Domain (DD) Retention Lock feature provides unchangeable file locking and
secure data retention capabilities to meet both governance and compliance standards.
Therefore, DD Retention Lock ensures that archive data is retained for the length of the
policy with data integrity and security.
This lesson presents an overview of Data Domain Retention Lock, its configuration and use.
After the retention period expires, files can be deleted, but cannot be modified. Files that
are written to a Data Domain system, but not committed to be retained, can be modified or
deleted at any time.
The storage system has to securely retain archive data per corporate governance standards
and must meet the following requirements:
• Allow archive files to be committed for a specific period of time during which the
contents of the secured file cannot be deleted or modified.
• Allow for deletion of the retained data after the retention period expires.
• Allow for ease of integration with existing archiving application infrastructure through
CIFS and NFS.
• Provide flexible policies such as allow extending the retention period of a secured file,
revert of locked state of the archived file, etc.
• Ability to replicate both the retained archive files and retention period attribute to a
destination site to meet the disaster recovery (DR) needs for archived data.
This security privilege is in addition to the user and admin privileges. A user assigned the
security privilege is called a security officer. The security officer can enable, via the CLI, a
feature called runtime authorization policy.
Updating or extending retention periods, and renaming MTrees, requires the use of the
runtime authorization policy. When enabled, runtime authorization policy is invoked on the
system for the length of time the security officer is logged in to the current session.
Runtime authorization policy, when enabled, authorizes the security officer to provide
credentials, as part of a dual authorization with the admin role, to set-up and modify both
retention lock compliance features, and data encryption features as you will learn later in
this module.
Note: The security officer is the only user that is permitted to change the security officer
password. Contact support if the password is lost or forgotten.
The retention period attribute used by the archiving application is the last access time - the
atime. DD Retention Lock allows granular management of retention periods on a file-by-file
basis. As part of the configuration and administrative setup process of the DD Retention
Lock, a minimum and maximum time-based retention period for each MTree is established.
This ensures that the atime retention expiration date for an archive file is not set below the
minimum, or above the maximum, retention period.
The archiving application must set the atime value, and DD Retention Lock enforces it to prevent
modification or deletion of files under retention on the Data Domain
system. For example, Symantec Enterprise Vault retains records for a user-specified
amount of time. When Enterprise Vault retention is in effect, these documents cannot be
modified or deleted on the Data Domain system. When that time expires, Enterprise Vault
can be set to automatically dispose of those records.
Locked files cannot be modified on the Data Domain system even after the retention period
for the file expires. Files can be copied to another system and then be modified. Archive
data retained on the Data Domain system after the retention period expires is not deleted
automatically. An archiving application must delete the remaining files, or they must be
removed manually.
DD Retention Lock Compliance, when enabled on an MTree, ensures that all files locked by
an archiving application, for a time-based retention period, cannot be deleted or overwritten
until the retention period expires. This is achieved through multiple hardening procedures,
including requiring dual authorization for certain administrative actions. Before engaging DD Retention
Lock Compliance edition, the System Administrator must create a Security Officer role. The
System Administrator can create the first Security Officer, but only the Security Officer can
create other Security Officers on the system.
With the data sanitization function, deleted files are overwritten using a DoD/NIST-
compliant algorithm and procedures. No complex setup or system process disruption is
required. Existing data is available during the sanitization process, with limited disruption to
daily operations. Sanitization is the electronic equivalent of data shredding. Normal file
deletion leaves residual data that allows recovery; sanitization removes any trace of
deleted files, leaving no residual data behind.
During sanitization, the system runs through five phases: merge, analysis, enumeration,
copy, and zero.
• Merge: Performs an index merge to flush all index data to disk.
• Analysis: Reviews all data to be sanitized. This includes all stored data.
• Enumeration: Reviews all of the files in the logical space and remembers what data
is active.
• Copy: Copies live data forward and frees the space it used to occupy.
• Zero: Writes zeroes to the disks in the system.
You can view the progress of these five phases by running the system sanitize watch
command.
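For example, assuming a typical DD OS release (the start and status commands are listed here as a
likely sequence; confirm the exact syntax in the Command Reference Guide):
  # system sanitize start
  # system sanitize watch
  # system sanitize status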
You also learn about the purpose of other security features, such as file system locking, and
when and how to use this feature.
Furthermore, you can use all of the currently supported backup applications described in
the Backup Application Matrix on the Support Portal with the Encryption of Data at Rest
feature.
A single internal Data Domain encryption key is available on all Data Domain systems.
The first time Encryption of Data at Rest is enabled, the Data Domain system randomly
generates an internal system encryption key. After the key is generated, the system
encryption key cannot be changed and is not accessible to a user.
The encryption key is further protected by a passphrase, which is used to encrypt the
encryption key before it is stored in multiple locations on disk. The passphrase is user-
generated and requires both an administrator and a security officer to change it:
• The RSA DPM Key Manager enables the use of multiple, rotating keys on a Data
Domain system.
• The RSA DPM Key Manager consists of a centralized RSA DPM Key Manager Server and
the embedded DPM client on each Data Domain system.
Note: As this course was being written we received notice that RSA DPM will be
replaced in the near future by RSA SecurID. No details were available. Check with Dell
EMC support for details when it does become available.
For example, to set encryption, the admin enables the feature, and the security officer
enables runtime authorization.
A user in the administrator role interacts with the security officer to perform a command
that requires security officer sign off.
In a typical scenario, the admin issues the command, and the system displays a message
that security officer authorizations must be enabled. To proceed with the sign-off, the
security officer must enter his or her credentials on the same console at which the
command option was run. If the system recognizes the credentials, the procedure is
authorized. If not, a Security alert is generated. The authorization log records the details of
each transaction.
The status indicates Enabled, Disabled, or Not configured. In the slide, the encryption
status is “Not configured.”
To configure encryption:
1. Click Configure.
Caution: Unless you can reenter the correct passphrase, you cannot unlock the file system
and access the data. The data will be irretrievably lost.
2. Enter a passphrase and then click Next.
3. Choose the encryption algorithm and then click Next:
– Configurable 128-bit or 256-bit Advanced Encryption Standard (AES) algorithm with either:
  Confidentiality only, using Cipher Block Chaining (CBC) mode, or
  Both confidentiality and message authenticity, using Galois/Counter Mode (GCM).
In this configuration window, you can optionally apply encryption to data that existed on the
system before encryption was enabled.
4. Select whether you will obtain the encryption key from the Data Domain system or
an external RSA Key Manager. Click Finish. Note that the system needs to be
restarted for the new configuration to start.
In the above example, sysadmin is logged in. Notice we are asked for the security officer
Username and Password.
Caution: Be sure to safeguard the passphrase. If the passphrase is lost, you will
never be able to unlock the file system and access the data. There is no backdoor
access to the file system. The data is irretrievably lost.
• Click Disable on the status line of the File System section.
• Click Lock File System on the status line of the File System Lock section.
• Enter the security officer credentials.
• Enter the current and new passphrase to re-encrypt the encryption keys.
• Shut down the system using the system poweroff command from the command line
interface (CLI).
Caution: Do not use the chassis power switch to power off the system. There
is no other method for shutting down the system to invoke file system
locking.
With SMT, a Data Domain system is able to logically isolate data for up to 32 tenants, which
will restrict each tenant’s visibility and read/write access to only their data (contained in
their MTrees). In addition, secure multi-tenancy enables management and monitoring by
tenant to enable chargeback, trending, and other reporting.
This diagram shows a simplified architecture of an individual tenant unit residing on a single
Data Domain system (here named DD System 1). Starting with DD OS 5.5, a tenant unit is
created using the command line interface.
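As a minimal sketch of that CLI step (the tenant-unit name is hypothetical, and exact syntax depends on
the DD OS release):
  # smt enable
  # smt tenant-unit create tu1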
Note that NFS and CIFS MTrees, VTL pools, and DD Boost storage units are each logically
isolated by MTree within a single tenant unit and are securely accessed by tenant client
applications using protocol-specific security.
Secure multi-tenancy for Data Domain systems is a feature that enables secure isolation of
many users and workloads on a shared system. As a result, the activities of one tenant are
not visible or apparent to other tenants. This capability improves cost efficiencies through a
shared infrastructure while providing each tenant with the same visibility, isolation, and
control that they would have with their own stand-alone Data Domain system.
A tenant may be one or more business units or departments hosted onsite for an
enterprise or large enterprise (LE). A common example would be Finance and Human
Resources sharing the same Data Domain system. Each department is unaware of the
presence of the other on the system. A tenant might also be one or more external
applications that are hosted remotely by a service provider (SP) on behalf of a client.
Multiple roles with different privilege levels combine to provide the administrative isolation
on a multitenant Data Domain system. The Tenant Admin and Tenant User are restricted
only to certain tenant-units on a Data Domain system and allowed to execute a subset of
the commands that a Data Domain system administrator is allowed.
The landlord is the storage admin or the Data Domain Administrator. The landlord is
responsible for managing the Data Domain system. The landlord sets up the file systems,
tenant units, tenant roles, storage, networking, replication, and protocols. They are also
responsible for monitoring overall system health and replace any failed hardware as
necessary.
Storage-units are MTrees configured for use with the DD Boost protocol. Data isolation
is achieved by creating a storage-unit and assigning the storage-unit to a DD Boost user.
The DD Boost protocol only permits access to storage-units assigned to DD Boost users
connected to the Data Domain system.
MTrees reside on logical partitions of the file system and offer the highest degree of
management granularity, meaning users can perform operations on a specific MTree
without affecting the entire file system. MTrees are assigned to tenant-units and contain a
tenant-unit's individualized settings for managing and monitoring SMT. A tenant-unit may
comprise one or more MTrees. A tenant, in turn, can span multiple Data Domain systems.
A tenant-unit is a logical partition in a Data Domain system isolating one tenant’s data from another.
Tenant Admins may only administer the tenant units that belong to them providing administrative
isolation.
The DD Boost protocol allows creation of multiple DD Boost users on a Data Domain system. Each
tenant is assigned one or more DD Boost user credentials that can be granted access privileges
to one or more MTrees in a tenant unit defined for that tenant. This allows secure access to
different tenant datasets through separate DD Boost credentials that restrict access and visibility.
Similarly, for other protocols such as CIFS, NFS and VTL the native protocol level access control
mechanisms can be used to provide data path isolation.
Mutual isolation is a security feature that ensures local users, management groups, and remote IPs
associated with one tenant in an SMT environment are not associated with another tenant. When
configuring tenants, users, tenant units, or protocols that transfer data, such as replication and DD
Boost, mutual isolation ensures data and administrative isolation across tenants.
Through metering and reporting, a provider (Landlord) has the information needed to ensure they are
running a sustainable business model. The need for such reporting in a multi-tenant environment is even
greater when the provider must track usage on a shared asset such as a Data Domain system.
With secure multi-tenancy, the Landlord has the capability to track and monitor usage of various
system resources. Similarly, the Tenant User can access metrics via tenant self-service. The tenant-
view of the metrics is restricted to resources that are assigned to a particular Tenant User.
Different metrics can be extracted from the Data Domain system using SNMP. The SNMP MIB provides
relationships of the different metrics to the tenant unit thereby allowing grouping the metrics on a per
tenant basis.
Tenant administrators can perform self-service fast copy operations within their tenant
units for data restores as needed. Tenant administrators are able to monitor data capacity
and associated alerts for capacity and stream use.
The landlord, responsible for the Data Domain system monitors and manages all tenants in
the system and has visibility across the entire system. They set capacity and stream quotas
on the system for the different tenant units, and report on tenant unit data.
When no security mode is selected, the system provides a default security mode. Default
security mode allows replication as long as the source and destination do not belong to
different tenants.
Shown here, the source MTree belongs to Tenant A, therefore replication can only occur on
the destination system with Tenant A. Failure occurs when the specified destination is
Tenant B.
Shown here, Tenant A is present on both the source and destination Data Domain. When
Tenant A names Tenant-Unit-A1.1 as the source and Tenant-Unit-A2.1 as the destination,
the replication protocol checks to make sure both tenant units belong to the same tenant.
Upon confirmation the replication proceeds.
Tenant A proceeds to set up a new replication pair naming Tenant-Unit-A1.2 as the source
and Tenant-Unit-B2.2 as the destination. The protocol checks the ownership of both source
and destination. Upon confirming that each tenant unit belongs to a different tenant, the
replication fails.
Capacity quotas are also set on the replication destination to make sure individual tenants
do not consume storage beyond a set limit on the Data Domain system they are sharing.
Even before replication to a destination begins, the capacity quota is set through the
command line for any future replication MTrees. This prevents any single tenant from
consuming all available space on a system and creating a full storage condition that
prevents other tenants from adding data to their own spaces.
Capacity quota and replication stream limits are set by the service provider owning the
destination.
Shown here, Tenant A has multiple tenant units on the Data Domain system and uses Client
A to log in and manage those units. An unauthorized user wants to access and manage
tenant-units belonging to Tenant A using a different client, Client B.
Normally, the unauthorized client could do so by simply providing the username and
password used by Tenant A. By assigning a local IP to Tenant A, their tenant-units can then
only be accessed by the client using the configured local IP. Without a local IP associated
with Client B, the unauthorized user cannot access the Data Domain system.
By configuring a set of remote IPs, tenant units can only be accessed from clients connecting
from the defined set of configured remote IPs. An authorized user with a valid username and
password cannot gain access to the system from a client that is not assigned one of the remote IPs.
This form of network isolation creates an association between the management IP and a
tenant unit. It provides a layer of network isolation using access validation. Setting local
and remote IPs applies only to self-service sessions.
• SMT requires that the system run DD OS 5.5 or higher. To access all SMT features, run the
most current version of DD OS.
• SMT provides secure logical isolation – not physical isolation. Tenant data on a system
securely co-mingles with other tenant data and shares deduplication benefits of all data
on the system.
• Retention Lock Compliance Editions still function on systems configured with secure
Multi-tenancy, but not at the tenant level. If enabled, function and management of
MTrees are severely impaired. For tenant-level Compliance Lock deployment, it is
recommended the tenant use separate Data Domain systems.
• SMT does not currently allow management of system-wide parameters at the tenant-unit
level. For instance, depending on the model, a Data Domain system running DD OS 6.0
is limited to a maximum of 32 to 256 concurrently active MTrees. If
multiple tenants run operations simultaneously on the same DD system, the
maximum number of active MTrees could be exceeded. The same should be considered
with multiple clients employing a number of NFS connections. A current maximum of 900
simultaneous NFS connections is allowed system-wide; tenants can run into
this limit when sharing the allowed NFS connections in a multi-tenant
environment.
There are two sections: one provides a listing of all the tenants and tenant units in the data
center and the other provides a detailed overview of either the selected tenants or tenant
units. In this slide, All Tenants is selected and the detailed overview displays the number of
tenants, tenant units, and host systems that are configured in this Data Domain
Management Center (DDMC).
When All Tenants is selected, new tenants can be created by clicking the green Plus sign.
A Create Tenant window appears. You complete creation by entering a tenant name and
the administrator’s email address.
When a single tenant is selected, new tenant units can be created by clicking the green Plus
sign.
Complete a few pages with specific information to use for a customized tenant unit
including the host system size, a tenant unit name, security mode, the use of a new or
existing MTree or Storage Unit.
Shown here are two pages: one asks the administrator to choose how storage is provisioned for
the Tenant Unit, and the second indicates the current storage capacity, the future size, and
how long it will take the system to grow.
The following slides do not represent the process step-by-step, but do show some of the
key information about the Tenant Unit.
First, the tenant unit hostname. Just as you are able to set a hostname on an entire Data
Domain system, you can configure a unique hostname for each tenant unit. In order for the
hostname to resolve to the specific tenant unit, you must assign it to an IP address within
that tenant unit.
Second, local data access IP addresses are assigned to a tenant unit for isolated data access
purposes. They act as server, or local, data access IP addresses. Local data access IPs must
be unique IP addresses; you may not assign the same IP to more than a single tenant unit.
Third, remote data access IP addresses are the client IP address or subnets that are
assigned a tenant unit for data access. Unlike the local data access IP, a remote data access
IP may be shared within the same tenant.
The last network attribute is the default gateway that can be configured for tenant units
belonging to the same tenant.
As long as the hostname resolves to the correct IP address (here, IP1) in DNS or in the client's local
/etc/hosts file, clients can connect to the tenant unit using only its hostname.
Based on the originating IP that the data operation request comes from, SMT data access
isolation restricts the operation access to only a set of storage objects.
In the diagram, local IP1 is allowed access to storage objects in tenant unit 1 (tu1) but not
allowed access to storage objects in tenant unit 2.
Similarly, local IP2 and IP3 are allowed access to objects in tu2 but are denied when attempting
to access objects in tu1.
To achieve this level of isolation, you have to add a local data access IP address to the
tenant unit. The SMT data access isolation check performs checks in the I/O path to ensure
access to the selected objects is allowed by the local IP being used.
Leased IP addresses such as DHCP addresses may not be used as local data access IPs.
Leased IP addresses would be difficult to track and enforce when trying to isolate data
access by IP.
Lastly, tenant unit local data IPs cannot be used to access non tenant data.
The way to make SMT tenant units work with DIGs is to configure all of the IP addresses in
the DIG as local data access IP addresses within the tenant unit.
When properly configured, the SMT tenant unit can take full advantage of any link the DIG
provides.
On the left side of this diagram, there is a configuration in the DNS or /etc/hosts where
tu1hn resolves to IP1 and tu1hn-failover resolves to IP2. This configuration is specifically for
DD Boost ifgroup configuration. In order for ifgroups to work properly, two hostnames are
required within DD Boost.
Along with the SMT data access isolation protocol, a net filter or IP table restricts access by
blocking packets based on the remote and local IP setup. In this diagram, client 1 is allowed
access to certain storage objects only through local IP1. If client 1 attempts to access
storage objects through local IP2, the net filter denies access using IP filtering.
This further strengthens data isolation throughout SMT on a Data Domain system.
On the right is a Data Domain system with two tenant units configured. Each tenant unit
has its own assigned local data IP. IP1 for tu1 and IP2 for tu2.
Each tenant unit has a firewall rule set to only allow traffic from certain client IP addresses.
If a tenant client requests access to data over an unassigned local data IP, based on the
rules, the firewall will disallow access.
Only targeted default gateways are configured with SMT. While there are other gateway
types configurable within a Data Domain system such as static, added, or DHCP, only
targeted gateways are supported with SMT.
You may not share the same default gateway among different tenants. These gateways
are intended to be unique per tenant.
Unique default gateways assigned to a tenant may not be used by non-SMT entities within
the Data Domain system.
Quota management is performed for an MTree or Storage Unit during the creation or
modification of a Tenant Unit. Once quotas are configured, any objects within the Tenant
Unit are bound to their set capacity quotas.
Tenant User: has the privileges to monitor specific tenant units for important parameters
such as space usage, streams performance, alerts, and status of replication context and
snapshots.
Tenant Admin: gets all the privileges of a tenant user and can modify the recipient list of
alerts and also perform Data Domain fastcopy operations.
These commands are available for both administrative users and self-service users. Self-
service users can only see the clients specific to their tenant-units.
Placing tapes in the DD VTL allows them to be written to, and read by, the backup
application on the host system. DD VTL tapes are created in a DD VTL pool, which is an
MTree. Because DD VTL pools are MTrees, the pools can be assigned to Tenant Units. This
association enables SMT monitoring and reporting.
A storage-unit is an MTree configured for the DD Boost protocol. A user can be associated
with, or “own,” more than one storage-unit. Storage-units that are owned by one user
cannot be owned by another. The number of DD Boost usernames cannot exceed the
maximum number of MTrees (current maximum is 100).
Each backup application must authenticate using its DD Boost username and password.
After authentication, DD Boost verifies the authenticated credentials to confirm ownership
of the storage unit. The backup application is granted access to the storage-unit only if the
user credentials presented by the backup application match the usernames associated with
the storage-unit. If user credentials and usernames do not match, the job fails with a
permission error.
The procedure for creating a storage-unit is initially performed by the admin as prompted
by the configuration wizard. Instructions for creating a storage-unit manually are included
later in this chapter.
• Capacity (2): shows capacity overview details with a variable meter that shows the
quota (available, used and used percentage).
• Replication (3): shows replication overview details that include the total number of
bytes replicated for Automatic Replication Pairs and On-Demand Replication Pairs.
• Network Bytes Used (4): shows network overview details that include the last 24
hours of back-up, restored data and total inbound and outbound replication.
• System Charts (5): shows the system charts for the DD system of the selected Tenant
Units associated with this Tenant.
For a detailed explanation of what is included within each tab, please refer to the current Data
Domain Management Center User Guide.
#mtree list – List the MTrees on a Data Domain system (when used by a landlord) or
within a tenant-unit (when used by a tenant-admin).
#mtree show performance – Collect performance statistics for MTrees associated with a
tenant-unit.
#mtree show compression – Collect compression statistics for MTrees associated with a
tenant-unit.
#quota capacity show – List capacity quotas for MTrees and storage-units.
Output may be filtered to display usage in intervals ranging from minutes to months. The
results can be used by the landlord as a chargeback metric.
Quotas may be adjusted or modified by the Landlord after the initial configuration using the
#ddboost storage-unit modify command.
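For example (MTree and storage-unit names and the quota value are illustrative; the quota-hard-limit
option shown is an assumption to verify against the Command Reference Guide for your DD OS release):
  # mtree list
  # quota capacity show
  # ddboost storage-unit modify SU-finance quota-hard-limit 2 TiB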
This feature extends into secure multi-tenancy environments for Tenant Admins and Tenant
Users.
Using the command line, Tenant Admins can create or destroy a pathset, add or delete
paths in a pathset, and modify a pathset.
Tenant Admins can start or stop a physical capacity measurement job, create, destroy and
modify a physical capacity measure schedule and enable or disable a physical capacity
measurement schedule.
Tenant Users may only view physical capacity measurement activities belonging to their
tenant units.
For more information about using PCM from the command line, see the EMC Data Domain
Operating System Command Reference Guide.
SMT Tenant alerts are specific to each tenant-unit and differ from Data Domain system
alerts. When tenant self-service is enabled, the Tenant-Admin can choose to receive alerts
about the various system objects he or she is associated with and any critical events, such
as an unexpected system shutdown. A Tenant-Admin may only view or modify notification
lists to which he or she is associated.
The Status template includes daily status for the tenant or tenant unit as it pertains to
capacity, replication, and network bytes used.
The Usage Metrics template includes metrics for the tenant and tenant unit as it pertains
to logical and physical capacity and network bytes used.
Once the scope is defined, you can proceed to complete the template creation.
The DD system uses the Add Reports Template to generate and send reports to the
appropriate personnel. Ultimately, these reports can be used for chargebacks to the various
tenants of the system.
The main goal in capacity planning is to design your system with a Data Domain model and
configuration that is able to store the required data for the required retention periods with
sufficient space remaining.
When planning for throughput requirements, the goal is to ensure the link bandwidth is
sufficient to perform daily and weekly backups to the Data Domain system within the
allotted backup window. Effective throughput planning takes into consideration network
bandwidth sharing, and adequate backup and system housekeeping timeframes (windows).
EMC Sales uses detailed software tools and formulas when working with its customers to
identify backup environment capacity and throughput needs. Such tools help systems
architects recommend systems with appropriate capacities and correct throughput to meet
those needs. This lesson discusses the most basic considerations for capacity and
throughput planning.
Data Domain system internal indexes and other product components use additional,
variable amounts of storage, depending on the type of data and the sizes of files. If you
send different data sets to otherwise identical systems, one system may, over time, have
room for more or less actual backup data than another.
Data reduction factors depend on the type of data being backed up. Some types of
challenging (deduplication-unfriendly) data types include:
• pre-compressed (multimedia, .mp3, .zip, and .jpg)
• pre-encrypted data
Retention policies greatly determine the amount of deduplication that can be realized on a
Data Domain system. The longer data is retained, the greater the data reduction that can
be realized. A backup schedule where retained data is repeatedly replaced with new data
results in very little data reduction.
A daily full backup retained only for one week on a Data Domain system may result in a
compression factor of only 5x, while retaining weekly backups plus daily incrementals for up
to 90 days may result in 20x or higher reduction.
Data reduction rates depend on a number of variables including data types, the amount of
similar data, and the length of storage. It is difficult to determine exactly what rates to
expect from any given system. The highest rates are usually achieved when many full
backups are stored.
When calculating capacity planning, use average rates as a starting point for your
calculations and refine them after real data is available.
For example, 1 TB of data is backed up, and a conservative reduction rate is estimated at
5x (which may have come from a test or is a reasonable assumption to start with). This
gives 200 GB needed for the initial backup. With a 10 percent change rate in the data each
day, incremental backups are 100 GB each, and with an estimated compression on these of
10x, the amount of space required for each incremental backup is 10 GB.
As subsequent full backups run, it is likely that the backup yields a higher data reduction
rate. 25x is estimated for the data reduction rate on subsequent full backups. 1 TB of data
compresses to 40 GB.
Four daily incremental backups require 10 GB each, and one weekly backup needing 40 GB
yields a burn rate of 80 GB per week. Running the 80 GB weekly burn rate out over the full
8-week retention period means that an estimated 640 GB is needed to store the daily
incremental backups and the weekly full backups.
Adding this to the initial full backup gives a total of 840 GB needed. On a Data Domain
system with 1 TB of usable capacity, this means the unit operates at about 84% of
capacity. This may be adequate for current needs, but a system with a larger capacity, or one
that can have additional storage added, might be a better choice to allow for data growth.
Again, these calculations are for estimation purposes only. Before determining true
capacity, use the analysis of real data gathered from your system as a part of an EMC BRS
sizing evaluation.
An assumption would be that the greatest backup need is to process a full 200 GB backup
within a 10-hour backup window. Incremental backups should require much less time to
complete, and we could safely presume that incremental backups would easily complete
within the backup window.
It is important to note the effective throughput of both the Data Domain system and the
network on which it runs. Both points in data transfer determine whether the required
speeds are reliably feasible. Feasibility can be assessed by running network testing software
such as iperf.
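For example, using the open-source iperf tool (host names are illustrative), you could run a server
instance on a host near the Data Domain system and drive test traffic from a backup client for 60
seconds to gauge the usable bandwidth on that path:
  On the receiving host:   iperf -s
  On the backup client:    iperf -c ddnet-test-host -t 60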
The maximum capacity for each Data Domain model assumes the maximum number of
drives (either internal or external) supported for that model.
Maximum throughput for each Data Domain model is dependent mostly on the number and
speed capability of the network interfaces being used to transfer data. Some Data Domain
systems have more and faster processors so they can process incoming data faster.
Advertised capacity and throughput ratings for Data Domain products are best case results,
based on tests conducted in laboratory conditions. Your throughput will vary depending on
your network conditions.
The number of network streams you may expect to use depends on your hardware model.
Refer to the specific model Data Domain system guide to learn specific maximum supported
stream counts.
If the capacity or throughput percentage for a particular model does not provide at least a
20% buffer, then calculate the capacity and throughput percentages for a Data Domain
model of the next higher capacity. For example, if the capacity calculation for a DD620
yields a capacity percentage of 91%, only a 9% buffer is available, so you should look at
the DD640 next to calculate its capacity.
Sometimes one model provides adequate capacity, but does not provide enough
throughput, or vice versa. The model selection must accommodate both throughput and
capacity requirements with an appropriate buffer.
Model B has a capacity of 428 TB. The capacity percentage estimated for Model B is 58%,
and the 42% buffer is more than adequate.
It appears by the capacity specifications that Model A does not meet this need with only
285 TB capacity. It leaves only a 12% buffer.
Model A with an additional shelf offers 570 TB capacity. A 66% buffer is clearly a better
option.
Select a model that meets throughput requirements with no more than 80% of the model’s
maximum throughput capacity.
In this example, the throughput requirement of 9 TB per hour would load Model A to close
to 85% of its maximum throughput, leaving a buffer of only 15%.
A better selection is a model with higher throughput capability, such as Model B, rated with
12.6 TB per hour throughput and offering a 29% buffer in estimated throughput.
While Model A meets the storage capacity requirement, Model B is the best choice based
upon the need for greater throughput.
Another option is to consider implementing DD Boost with Model A to raise the throughput
rating.
Potential performance bottlenecks can arise with the clients, the Data Domain system, and the
network, and include factors such as configuration, connectivity, wire speeds, switches and
routers, routing protocols and firewalls, and log levels set too high.
As demand shifts among system resources – such as the backup host, client, network, and
Data Domain system itself – the source of the bottlenecks can shift as well.
From the command line, use the command system show performance.
For example:
• system show performance 24 hr 10 min
This shows the system performance for the last 24 hours at 10 minute intervals. 1
minute is the minimum interval.
Servicing a file system request consists of three steps: receiving the request over the
network, processing the request, and sending a reply to the request.
If the CPU utilization shows 80% or greater, or if the disk utilization is 60% or greater for
an extended period of time, the Data Domain system is likely at or near its disk or CPU
processing maximum. Check that there is no cleaning or disk reconstruction in
progress. You can check cleaning and disk reconstruction in the State section of the system
show performance report.
The following is a list of states and their meaning indicated in the system show performance
output:
• C – Cleaning
• D – Disk reconstruction
• B – GDA (also known as multinode cluster [MNC] balancing)
• V – Verification (used in the deduplication process)
• M – Fingerprint merge (used in the deduplication process)
• F – Archive data movement (active to archive)
• S – Summary vector checkpoint (used in the deduplication process)
• I – Data integrity
Typically, the processes listed in the State section of the system show performance report
affect how much CPU is available for handling backup and replication activity.
If slow performance is happening in real-time, you can also run the following command:
• system show stats interval [interval in seconds]
Example:
• system show stats interval 2
Specifying an interval of 2 produces a new line of data every two seconds.
The system show stats command reports CPU activity and disk read/write amounts.
In the example report shown, you can see a high and steady amount of data inbound on
the network interface, which indicates that the backup host is writing data to the Data
Domain device. We know it is backup traffic and not replication traffic as the Repl column is
reporting no activity.
Low disk-write rates relative to steady inbound network activity likely indicate that many of
the incoming data segments are duplicates of segments already stored on disk. The Data
Domain system is identifying the duplicates in real time as they arrive and writing only
those new segments it detects.