
Data Deduplication Quantum StorNext: Real Testing, Real Data, Real Results


Data Deduplication

Quantum StorNext

real testing real data real results


11/05/09
Table of Contents 
 

1. Introduction
2. Background
2.1 Offered Capabilities
3. Key Evaluation Criteria
4. Evaluation Setup
5. Evaluation Results
5.1 Evaluation Findings
5.1.1 Software Installation and Setup
5.1.2 Management
5.1.3 Testing Results
5.1.4 Functional Testing
6. Summary
High Performance Computing Modernization Program
Air Force Research Lab
DoD SuperComputing Resource Center
Data Deduplication
Quantum

1. Introduction

The project will include an in-depth investigation of various data deduplication (de-dupe) technologies that
will identify the following: capabilities, user and center impacts, security issues and inter-operability issues
within a single location.

Quantum provides a software solution for intelligent data management called StorNext. StorNext is a self-
archiving, self-protecting shared file system for heterogeneous high-speed data sharing. It has a built-in
policy engine for automated file archival with the feature of data deduplication. Since the Data Intensive
Computing Environment (DICE) team has been tasked with reviewing the specific requirements regarding
data deduplication functionality, this report will not focus on other features offered by the StorNext
product.

The StorNext product provides both a command line interface (CLI) and a browser-based graphical user
interface (GUI) for configuration and management. Most configuration and management is done through
the GUI; the CLI offers a subset of that capability. The product is designed to work with a storage area
network (SAN) architecture, serving SAN or LAN clients, and relies on the underlying server/storage
hardware for data integrity. It also offers some recoverability features in the event of failures.

For the purposes of this investigation, we will use StorNext 3.5.1, the latest version of the software, as an
online storage solution similar to what is used for the home directories of HPC users at the AFRL DSRC.

2. Background

The Department of Defense (DoD) High Performance Computing Modernization Program (HPCMP) has
formed a Storage Initiative (SI) team to investigate the program’s current storage architectures across the
centers. Today, HPCMP is responsible for five major centers and two disaster recovery sites. One of the
key areas of concern to the SI team is the storing, managing and organizing of user data.

A data deduplication solution could provide a cost savings in storage needs for user data files. The
HPCMP SI team partnered with the DICE Program Management Team to conduct a technical evaluation
of the Quantum product within an HPC environment to gain a better understanding of the functionality and
integration requirements.

As a leading global specialist in backup, recovery and archive for more than 27 years, Quantum focuses
on helping IT departments address data protection and data retention challenges by incorporating
innovative solution sets with world-class service and support. Quantum also sells the DXi series, a line of
hardware appliances for Network Attached Storage and Virtual Tape Library with data deduplication
capability.

2.1 Offered Capabilities

Table 2.1 below describes the capabilities of the Quantum StorNext product. The data is generally
available on Quantum’s website (www.quantum.com), through documentation or questions directed to
Quantum’s personnel.

Table 2.1 Capabilities Summary

Quantum StorNext

General
Name & version of data deduplication software: StorNext version 3.5.1
General architecture: Software solution that runs on multiple server platforms and storage devices, preserving user choice.
Can function in a heterogeneous environment? Yes, a broad range of server platforms and storage is supported, for greater collaboration and fewer delays.

Connectivity & Security
File systems supported: StorNext utilizes its own proprietary file system called StorNext File System (SNFS); StorNext supports exporting SNFS as CIFS and NFS.
OS supported: Windows, SuSE Linux, Red Hat Linux, Mac OS X (via Apple’s Xsan), Solaris, Irix, HP-UX, and/or AIX
Relational databases supported: Current StorNext customers utilize Oracle, SQL and other databases in the StorNext File System.
Web browsers supported: Internet Explorer 5.5, 6 and 7; Netscape 7.x; Mozilla 1.0 and later; FireFox 1.5 and later or 2.0 and later
Archives supported: A large variety of tape libraries are supported, including these types: SCSI or Fibre Channel-attached, Network-attached (ACSLS or DAS) or Vault. A complete list can be found in the StorNext release notes online (http://downloads.quantum.com/SNMS/3.5.1/6-00431-24_SN351_ReleaseNotes.pdf). However, StorNext is in itself an archival product and does not support third party archival solutions.
Email support (syslogger, SNMP, email): Email notifications for Reliability, Availability, and Serviceability (RAS) events
Future support for IPv6: StorNext 3.5.1 currently supports IPv6.
Service and ports used: There are several ports that StorNext utilizes. The StorNext GUI uses Apache on port 81. The StorNext file system uses configurable TCP/UDP port ranges as defined in the /usr/cvfs/config/fsports file.
Can the system detect intrusion (unauthorized access to file) and react either automatically (log entry) or report to the site system administrator? No. StorNext is a File System with standard POSIX and Windows permissions.
Performance
Available benchmark data: Enterprise Strategy Group Lab Validation Report published on 05/11/2007: http://salestools.quantum.com/dsp_doclistQDC.cfm?c=20&sid=161&ssid=120&tid=35&lid=1
Throughput #’s for read/write transactions: Throughput varies with the solution configuration. In this evaluation, the data was written to a striped LUN on the SAN that has only a 1GB/Sec interface, which resulted in about 95MB/Sec in read/write performance. Using the Distributed LAN Client functionality resulted in about 60 MB/Sec for read/write performance. As a comparison, NFS was also tested, which resulted in 18-24 MB/Sec in read/write performance.
Determine maximum file size - can handle file sizes of 25 TB: Maximum file size can be up to the size of the file system; the file system size can be greater than 1PB.

Scalability
Optimizing scalability: Yes, scales with the underlying hardware implemented, with the ability to access hundreds of millions of shared files and petabytes of storage.
Reliability and Availability
Single points of failure: StorNext supports dual node High Availability
Remote capabilities: StorNext can be accessed by SAN clients, LAN clients and CIFS/NFS exports (including replication in StorNext 4.0).
Is there any mechanism for detecting data corruption? Yes, StorNext uses proactive monitoring and alerting to automatically monitor system health and send email alerts, simplifying service and reducing downtime. The Web GUI can display system logs as well as StorNext logs.
File data corruption must be reported, corrected and completely understood if it is the fault of the data deduplication framework: File corruption is corrected only if a copy of the data exists in a backup tier to be restored. A RAS ticket is generated with recommended actions, including checking for error logs that are outside the StorNext framework.
Metadata corruption should be reported, corrected and completely understood if it is the fault of the data deduplication framework: Metadata corruption triggers a RAS event notification and instructs the admin to run a metadata dump to re-create the metadata.
Does the system provide failover capabilities for each component? StorNext increases data availability through optional Metadata Controller failover capabilities, multipath to storage and multiple secondary tier copies.
Capacity Planning and Performance Analysis
What tools are available to determine future capacity needs? Information is available about the capacities within each tier of storage.
Does StorNext have the capability to add capacity on demand? Yes, the StorNext File System can be increased by adding more disk/storage.
What is the complexity of adding or removing storage devices? Beyond typical SAN device setup, it is fairly simple to add/remove storage devices to StorNext.
What is the complexity of migrating data from disk to tape? StorNext uses a policy-based mechanism to define data workflow and move data between storage tiers without user intervention. This includes movement between primary, secondary (deduplication tier), and tertiary (tape).
Are there tools to see how StorNext is performing (transactions / minute, # of users / transaction, etc.)? Yes, tools are available to determine the amount of data read/written to the secondary tier; however, they do not measure transactions or track the users that generated the I/O.
Data Deduplication Management
Are ACLs from HPC system managed separately? For NFS, the ACLs are managed through the HPC system.
Is there a command line interface? Yes, the CLI has a subset of the functionality of the StorNext Web GUI. However, some functions like user quota setup are only available on the CLI.
Is there a GUI provided for administration? Yes, via an Apache based web interface
Is there a GUI provided for clients? No, there is no data deduplication management GUI available to clients.
What types of reports are available for administrators? Capacity, deduplication percentage (in general -- the File System, Library, Drive, Media, File, Backup and many more).
What documentation is provided for the installation and setup? Online GUI help, man pages and administration guides. Complete documentation, including the Planning and Installation Guides, Product Use Guide, Compatibility Guide, and Product Update Guide, can be found at: http://www.quantum.com/ServiceandSupport/SoftwareandDocumentationDownloads/SNMS/Index.aspx#Documentation
What are the staffing requirements? Once configured by the StorNext administrator, the staffing requirements are minimal. When changes need to be made, like adding devices and/or directory changes, an administrator will be required.
Does the StorNext support Quotas? Hard? Soft? Yes, quotas are based on primary file system capacity only
Manage direct attached storage – delete, move, replicate or consolidate based on a specific action? No, only utilizes SAN devices. Management of the devices is controlled by the SAN, and the SAN presents those devices to the server and clients.
Diagnostic and Debugging Tools
Success or error results from all operations, including those on remote systems, must be available to the administrator at the conclusion of the operation either through the functionality of the tool or some other work around (logged in log files, etc.): Yes, available through GUI, logs and command line functions
How are notifications sent (success or failure)? Email can be configured for System Status, RAS Tickets, Backup information and Policy Class alerts for failures
Is there any mechanism to detect failure due to access permissions prior to executing an operation? Not available at this time
Are troubleshooting or diagnostic tools available? Yes, tickets are logged for data failures and an analysis tool can be run against the ticket to give a suggested fix
Are there trend or pattern reports available on storage usage? Not available at this time
What documentation is provided for the hardware and software troubleshooting? A StorNext trouble-shooting guide is available
Service and Support
Support Services Available: Quantum offers a suite of support options up to 7x24x365 GOLD Support. See http://www.quantum.com/ServiceandSupport/Services/Software/Index.aspx for more information.
How many service employees are employed? Over 400 team members with targeted expertise in backup, recovery, and archive. Coverage is available in >100 countries around the globe
How often is software and firmware updated and who performs the upgrades? Software updates are released approximately 2 times per year. Upgrades can be performed by the customer, or Professional Services can be purchased to perform upgrades
Do software and firmware updates require an outage? Yes, once the upgrades have been completed all hardware/software will need to be rebooted.
Training & Professional Services
Training Services Available: StorageCare Learning is Quantum's online learning portal. Simply log on to StorageCare Learning to enroll and sign up for an account. Once your account has been activated you will be able to view all available StorageCare Learning Courses, register for courses, view a personalized training calendar, take online courses, view transcripts and print certificates once you have successfully completed a training course.

Cost
What is the typical list price for your Data Deduplication solution? The StorNext Data Deduplication option has a U.S. list price of $12,500; there is also a license capacity requirement for all data written to secondary storage (this varies depending on the amount of data written and is a tiered structure)
What are the ongoing costs over the life of your solution? Standard support costs, client count additions and secondary storage capacity increases
Vendor Information
Are there any Department of Defense (DoD) references available? Yes, Starfire Optical Range at the Air Force Research Lab, Kirtland AFB. Success story references can be found at: http://salestools.quantum.com/dsp_doclistQDC.cfm?c=20&sid=161&ssid=104&tid=35&lid=1
Other DoD customers include:
• National Air & Space Intelligence Center (NASIC) – Wright-Patterson AFB
• SSES – Wright-Patterson AFB
• DoD Distributed Common Ground System (DCGS)
• National Geospatial-Intelligence Agency (NGA)
• National Aeronautics and Space Administration (NASA)
• National Oceanic and Atmospheric Administration (NOAA)
Number of deployed sites? Over 4,000


Site Management
The system must be able to provide troubleshooting tools to site level administrators (i.e., replicated site unavailable): Yes, StorNext has a variety of utilities that assist with troubleshooting of issues that may arise. In particular, the GUI has a health check screen, a state capture function and a system status tool.
The system should provide the ability to report trends or patterns back to each of the sites regarding storage usage for the system administrators to manage: Snapshot reports of system usage can be used to determine trends
Administrator Interface Tool
Does the tool provide a method to compress files and directories? Yes, using de-duplicated storage disk or tape drive compression
Does the tool provide a method to replicate files and directories? No, but it will be available in StorNext v4.0
Does the tool provide a method to encrypt files? No
Does the tool report specific types of files which are typically duplicated (Unix file type – directory, link, etc.)? Not available at this time
Does the tool report the number of files and their storage location in which deduplication is most effective? Not at this time
Diagnostic tools specific to the data deduplication system should be available to the administrator - determine such things as cache full: Yes, from the viewpoint of the complete StorNext configuration and not just the data deduplication function. The system status tool of the GUI will log tickets on each problem and allow the administrator to view the details, review StorNext recommended actions to correct the problem, and notate any special analysis that may be useful at a later time.
Hardware Interfaces
The system should be able to manage direct attached system (DAS) (HPC native files): Yes. However, the files will be on a SAN and not DAS. StorNext does not distinguish file types that are written to the file system. Each file is handled as a binary file with no changes made to the file contents.

Communications Interfaces
Can the system interface with the HPC system remotely? Yes. The StorNext file system may be accessed by remote means with NFS/CIFS exports, ftp, http, etc.
Hardware/Software Requirements
Does the system function in a heterogeneous hardware environment (i.e., hardware agnostic)? It should not be dependent on what it can interface with: Yes. The StorNext file system utilizes SCSI-3 devices and is not dependent on the disk type or protocol (FC, iSCSI, etc.) as long as the storage device is presented as a SCSI-3 compliant target. Please see the supported platforms list for specifics: http://www.quantum.com/ServiceandSupport/SoftwareandDocumentationDownloads/SNMS/Index.aspx#Documentation
The system must support Linux and Unix operating system(s) – POSIX compliant? Yes, the server was Linux based along with clients running Linux and Solaris.
Operational Requirements
Data should retain its normal structure in order to maintain interoperability with other systems: Yes, unless using storage disk. Within the primary tier of disk storage, StorNext does not manipulate any of the data in the file. When the file is written to secondary targets, it is stored in a StorNext format. Since that data is not accessed directly by the application, this does not pose any interoperability issues.
Deduplication should have minimal impact on additional storage: Yes, data deduplication processing is run in a post-process manner so that the data streams being written will not be altered until the write has been completely stored.

Audit Trail
Does the system keep an audit trail of administrator transactions? No, but in the next release, StorNext v4.0, administrator transactions are logged.

3. Key Evaluation Criteria


The following evaluation criteria are assessed:

1. The ability to administer systems, files and directories
2. The ability of the software to interface with HPC hardware (HSM, HPC, Tape libraries, etc.)
3. The security and privacy of other users – permission management
4. The ability to report storage gain/loss from eliminating redundancy
5. The flexibility of command line interface versus Graphical User Interface (GUI)

4. Evaluation Setup

On behalf of AFRL DSRC, the following unclassified data sets were researched and used to test
deduplication:

• Scientific benchmark data from AFRL DSRC
• Scientific data from Arctic Region Supercomputing Center (ARSC)
• Scientific weather data from National Oceanic and Atmospheric Administration (NOAA)

StorNext was configured with data storage on a SAN and with metadata transferred across the public
TCP/IP network. The NFS protocol to the clients was also tested.

The hardware platform consisted of the following:

♦ Hewlett Packard ProLiant DL1820 G5 with Dual 2.5 GHz Intel Xeon E5420 Quad-core
processors and 16 GB of Memory with Red Hat Linux 5.3 as the operating environment.

♦ The SAN infrastructure receives 4GB/Sec; the Data Direct Networks S2A3000 storage
receives 1 GB/sec max throughput as configured. A striped LUN was used for both the
shared file system and the data deduplication storage.

♦ Clients consisted of a Sun X4200 with Dual 2.2 GHz AMD Opteron 275 Dual-core
processors and 4 GB Memory running Solaris 10, along with a Dell 2850 with Dual 3.2 GHz
Xeon Dual-core processors and 4 GB Memory running CentOS Linux.

Figure 1 below depicts the solution setup used at the Avetec site.

Figure 1

5. Evaluation Results

5.1 Evaluation Findings

5.1.1 Software Installation and Setup

System Setup
The DICE team installed the StorNext software on a server located at Avetec. Normally, all testing for
AFRL is conducted with the onsite DICE hardware configuration. However, since a large part of system
configuration is done over HTTP, there was an expectation that StorNext (as with most vendor
software) would not pass the security testing performed at AFRL, which in turn would prevent the
software from being evaluated completely on the AFRL network. Security testing was still performed at
Avetec. To read more about the security issues encountered, please see “Problems Encountered/Resolution” below.

The StorNext software consists of two components: the StorNext File System and the Storage Manager.
These components are installed on a server known as the Metadata Controller (MDC). The operating
environments currently supported by StorNext include Red Hat Enterprise Linux 5, SUSE Linux
Enterprise Server 10 and Sun Solaris 10. Please consult the StorNext Installation Guide for specifics
regarding supported kernels. Currently, however, data deduplication is only supported on Red Hat and
SUSE Linux.

The hardware requirement for Storage Manager depends upon the number of file systems in the planned
configuration. These dependencies are outlined in Table 1-1 along with some additional notes.

Table 1-1

File Systems    RAM
1–4*            2 GB
5–8**           4 GB

Disk Space (same for both rows):
• For application binaries, log files and documentation: up to 30GB (depending on system activity)
• For support directories: 3 GB per million files stored†
• For metadata: 25GB minimum

*Two CPUs recommended for best performance.
**Two CPUs required for best performance.
†For non-managed file systems, the requirement is 1GB per million files stored.

Note: If a file system uses de-duplicated storage disks (DDisks), note
the following additional requirements:

• Requires 2 GB RAM per DDisk in addition to the base RAM noted in Table 1-1.
• Requires an additional 5GB of disk space for application binaries and log files.
• Deduplication is supported only for file systems running on a Linux operating system.
• An Intel Pentium 4 or later processor (or an equivalent AMD processor) is required. For best
performance, Quantum recommends an extra CPU per blockpool.
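
The sizing rules in Table 1-1 and the DDisk note above reduce to simple arithmetic. The following Python sketch is illustrative only: the function name and parameters are hypothetical and not part of StorNext, it treats the 30GB figure for binaries and logs as an upper bound, and it assumes the additional 5GB for DDisks applies once per system rather than once per DDisk.

```python
def storage_manager_requirements(file_systems, millions_of_files,
                                 ddisks=0, managed=True):
    """Estimate RAM and local disk space (in GB) from Table 1-1 and the DDisk note.

    Hypothetical helper for illustration; not a StorNext tool.
    """
    if not 1 <= file_systems <= 8:
        raise ValueError("Table 1-1 covers 1 to 8 file systems only")

    # Base RAM: 2 GB for 1-4 file systems, 4 GB for 5-8.
    ram_gb = 2 if file_systems <= 4 else 4
    # Each DDisk needs 2 GB of RAM on top of the base RAM.
    ram_gb += 2 * ddisks

    # Binaries, logs and documentation: up to 30 GB (upper bound used here),
    # plus 5 GB more when DDisks are in use (assumed to apply once).
    binaries_gb = 30 + (5 if ddisks > 0 else 0)
    # Support directories: 3 GB per million files (1 GB if non-managed).
    support_gb = (3 if managed else 1) * millions_of_files
    # Metadata: 25 GB minimum.
    metadata_gb = 25

    return {"ram_gb": ram_gb,
            "disk_gb": binaries_gb + support_gb + metadata_gb}


if __name__ == "__main__":
    # One managed file system, 2 million files, one DDisk.
    print(storage_manager_requirements(1, 2, ddisks=1))
```

The result is a lower-bound planning figure; the pre-installation script described later in this section remains the authoritative source for disk space estimates.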

Partitioning Local Hard Disks

StorNext can be installed on any local file system (including the root file system) on the MDC. However,
for optimal performance, as well as to aid disaster recovery, follow these recommendations:

• Avoid installing StorNext on the root file system.
• Partition local hard disks so that the MDC has four available local file systems (other than the root
file system) located on four separate hard drives.

Configuration for Storage Devices

All SAN logical unit numbers (LUN) need to be visible to the MDC before configuration. LUNs that you
plan to use in the same stripe group must be the same size. StorNext also does not support connection
of multiple devices through fibre channel hubs. Fibre channel switches must be used. For more
information on configuration of storage devices please consult the StorNext 3.5.1 Users Guide.
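
The requirement that LUNs in the same stripe group be the same size is easy to check before configuration. The sketch below is a minimal illustration in Python; the function name, the stripe group names, and the input format (a mapping of group name to LUN sizes in bytes) are assumptions made for this example and do not correspond to any StorNext command or file.

```python
def mismatched_stripe_groups(stripe_groups):
    """Return the names of stripe groups whose LUNs are not all the same size.

    stripe_groups maps a (hypothetical) stripe group name to a list of
    LUN sizes in bytes.
    """
    return sorted(name for name, sizes in stripe_groups.items()
                  if len(set(sizes)) > 1)


if __name__ == "__main__":
    planned = {
        "metadata": [500 * 10**9, 500 * 10**9],
        "data": [10**12, 10**12, 2 * 10**12],  # one LUN differs in size
    }
    print(mismatched_stripe_groups(planned))
```

The LUN sizes themselves would have to be gathered from the SAN or the operating system; that step is outside the scope of this sketch.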

When configuring LUNs greater than 1 TB, follow the required disk LUN labeling in the StorNext
Installation Guide. Quantum also suggests creating persistent bindings for disk LUNs. For more
information, contact the vendor of your HBA (host bus adapter).

LAN requirements for the MDC server configuration include:

• In cases where gigabit networking hardware is used and maximum StorNext performance is
required, a separate, dedicated switched Ethernet LAN is recommended for the StorNext
metadata network. If maximum StorNext performance is not required, shared gigabit networking
is acceptable.
• A separate, dedicated switched Ethernet LAN is mandatory for the metadata network if 100 Mbit/s
or slower networking hardware is used.
• The MDC and all clients must have static IP addresses. Verify network connectivity with pings,
and verify entries in the /etc/hosts file. Alternatively, telnet or ssh between machines to verify
connectivity.
• If using Gigabit Ethernet, disable jumbo frames and TOE (TCP offload engine).

StorNext does not support file system metadata on the same network as iSCSI, NFS, CIFS or VLAN data
when 100 Mbit/s or slower networking hardware is used.
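
As a hedged illustration of the /etc/hosts verification step in the list above, the following Python sketch parses hosts-file text and reports which required machines have no entry. The host names and addresses are hypothetical placeholders, and the helper functions are written for this example only; the sketch does not replace the ping, telnet or ssh connectivity checks.

```python
def parse_hosts(hosts_text):
    """Map each hostname or alias in /etc/hosts-style text to its IP address."""
    entries = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            entries.setdefault(name, address)
    return entries


def missing_hosts(hosts_text, required_names):
    """Return the required host names that have no entry in the hosts text."""
    known = parse_hosts(hosts_text)
    return [name for name in required_names if name not in known]


if __name__ == "__main__":
    # Hypothetical MDC and client names; substitute the real ones.
    sample = """
    127.0.0.1     localhost
    192.168.10.5  mdc1        # metadata controller
    192.168.10.21 client-sol  # Solaris client
    """
    print(missing_hosts(sample, ["mdc1", "client-sol", "client-linux"]))
```

On a real system the text would be read from /etc/hosts on the MDC and on each client, since every machine needs to resolve the others.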

Installing the Linux Kernel Source Code

For management servers running Red Hat Enterprise Linux version 4 or 5, before installing SNFS and
SNSM, you must first install the kernel header files (shipped as the kernel-devel or kernel-devel-smp
RPM, depending on your Linux distribution).

For servers running SUSE Linux Enterprise Server, you must first install the kernel source code (shipped
as the kernel-source RPM). StorNext will not operate correctly if these packages are not installed. You
can install the kernel header files or kernel source RPMs by using the installation disks for your operating
system.

Other installation notes

The maximum hostname length for a StorNext server is limited to 25 characters. Before beginning the
installation, verify that the destination hostname is not longer than 25 characters. (The hostname is read
during the installation process, and if the hostname is longer than 25 characters the installation process
could fail.)
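
The hostname check described above can be scripted. The Python sketch below is a minimal example; it assumes the 25-character limit applies to the name returned by the operating system (whether a fully qualified name counts toward the limit is not stated in this report), and the function name is hypothetical.

```python
import socket

MAX_HOSTNAME_LENGTH = 25  # limit stated in the installation notes


def hostname_within_limit(hostname, limit=MAX_HOSTNAME_LENGTH):
    """Return True if the hostname is no longer than the allowed length."""
    return len(hostname) <= limit


if __name__ == "__main__":
    name = socket.gethostname()
    status = "OK" if hostname_within_limit(name) else "TOO LONG"
    print(f"{name} ({len(name)} characters): {status}")
```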

Software installation

StorNext comes with a pre-installation script (snPreInstall) on the installation CD. When you run
snPreInstall, you are prompted for information about the system. The pre-installation script uses this
information to estimate the amount of local disk space required for SNFS and SNSM support directories.
In addition, the script recommends the optimal locations for support directories.

StorNext uses five directories to store application support information. These directories are stored locally
on the metadata controller, except for the Backup directory, which is stored on the managed file system.

The StorNext support directories are described in Table 1-2.

Table 1-2

Support Directory (location): Description

Database (/adic/database): Records information about where and how data files are stored.
Journal (/adic/database_jnl): Records changes made to the database.
Mapping (/adic/mapping_dir): Contains index information that enables quick searches on the file system.
Metadata (/adic/database_meta): Stores metadata dumps (backups of file metadata).
Backup (/backup): Contains configuration files and support data required for disaster recovery.
Web Server (/usr/adic/apache): Apache configuration used by StorNext.

The pre-installation script will need the following information to help prepare for the installation:

• Is this an upgrade installation?
• What local file systems can be used to store support information?
• Which version of StorNext will be installed?
• What is the maximum number of directories expected (in millions)?
• What is the maximum number of files expected (in millions)?
• How many copies will be stored for each file?
• How many versions will be retained for each file?

Quantum notes that storage needs typically grow rapidly. Consider increasing the maximum number of
expected directories and files by a factor of 2.5x to ensure room for future growth.
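
As a worked example of the 2.5x guidance, the short Python sketch below pads an expected count (in millions) before it is entered at the pre-installation prompts. The function name is hypothetical, and rounding up to a whole number of millions is an assumption made here for convenience, not a StorNext requirement.

```python
import math

GROWTH_FACTOR = 2.5  # headroom factor suggested by Quantum


def padded_estimate(expected_millions, factor=GROWTH_FACTOR):
    """Scale an expected file or directory count (in millions) for growth,
    rounding up to a whole number of millions."""
    return math.ceil(expected_millions * factor)


if __name__ == "__main__":
    for label, expected in (("directories", 1.5), ("files", 4)):
        print(f"{label}: expect {expected} million, "
              f"enter {padded_estimate(expected)} million")
```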

The pre-installation script ignores un-mounted file systems. Before running snPreInstall, be sure to mount
all local file systems that will hold StorNext support information.

After entering all requested information, the pre-installation script outputs the following results:

• Estimated disk space required for each support directory.
• Recommended file system location for each support directory.

Quantum recommends that each support directory (other than the Backup directory) should be located on
its own local file system, and each local file system should reside on a separate physical hard disk in the
MDC. This will allow StorNext to perform optimally.

The pre-installation script bases directory location recommendations on the following criteria:

• To aid disaster recovery, the Database and Journal directories should be located on different file
systems.
• For optimal performance, the Metadata directory should not be located on the same file system
as (in order of priority) the Journal, Database or Mapping directory.
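
The two placement criteria above can be expressed as a small check over a proposed layout. The following Python sketch is illustrative only and is not how snPreInstall is implemented: the function name, the input format (support directory name mapped to the local file system that would hold it) and the mount points are all hypothetical.

```python
def placement_warnings(placement):
    """Check a proposed support-directory layout against the two criteria.

    placement maps a support directory name ("Database", "Journal",
    "Mapping", "Metadata") to the local file system that would hold it.
    """
    warnings = []
    # Disaster recovery: Database and Journal on different file systems.
    if placement["Database"] == placement["Journal"]:
        warnings.append("Database and Journal share a file system")
    # Performance: Metadata should not share a file system with, in order
    # of priority, the Journal, Database or Mapping directory.
    for other in ("Journal", "Database", "Mapping"):
        if placement["Metadata"] == placement[other]:
            warnings.append(f"Metadata shares a file system with {other}")
    return warnings


if __name__ == "__main__":
    proposed = {
        "Database": "/fs1",
        "Journal": "/fs2",
        "Mapping": "/fs3",
        "Metadata": "/fs2",  # conflicts with Journal
    }
    for warning in placement_warnings(proposed):
        print("WARNING:", warning)
```

Because the loop visits Journal, Database and Mapping in the stated priority order, the most serious conflict appears first in the returned list.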

Do not change the location of support directories manually. Instead, use the installation script to specify
the location for support directories.

To begin the StorNext software installation, run ./install.stornext from the installation CD.

The following menu will appear:

Choosing option 1 displays the following configuration menu:

Assuming that the default installation directories will be used, the only option of interest to change will be
option 15, the default media type.

Changing the Default Media Type

If a different media type is not specified, the StorNext installation script selects LTO as the default media
type for storage devices. If storage devices in your system use a different media type, change the default
media type before installing StorNext.

On the Configuration Menu, type 15 and press <Enter>.


A list of valid default media types is shown. The valid media types are: DDISK, SDISK, LTO, LTOW,
3590, 3592, 9840, 9940, AITW, AIT, DLT4 and T10K.

For deduplication purposes, choose the DDISK option.

Return to main menu and choose option 2 to install the software. Once installation is complete, select
option 4 to exit.

StorNext 3.5.1 server configuration

The StorNext Management GUI has a configuration wizard that was used to complete the configuration
and is shown below:

After using the software serial number as the 30 day evaluation license for step 1, only steps 2, 6 and 7
were needed to configure a file system that will perform deduplication. The configuration wizard walks
through every step in order to completion, so it is best to exit the wizard and choose the needed steps
individually from the Config menu shown below.

Configure a file system to be shared

Choose the Add File System option from the Home Config menu and follow the prompts for the
information needed:

The following screen shows the disks that StorNext sees as available to configure.

Choose Next > for the Add New File System screen.

Now enter the name of your shared file system (take note of this name because you will use it in the
/etc/fstab file of the clients).

Select a mount point on the server for this file system. Select Browse to find or create a new
directory.

Select the option Enable Data Migration.

Select Enable Distributed LAN Server to serve clients that do not have SAN connectivity but still need
to be SNFS clients. (This option was not tested in this evaluation.)

Choose Next > for the Disk Settings screen.

The next screen is disk settings in this evaluation. We kept the defaults:

In the evaluation, we kept the defaults for the disk settings.

Choose Next > to proceed to the Customize Strip Group screen.

Choose Next > after selecting your choices.

Then a review screen appears with your choices:

Choose Next > to create the file system.

Create a second file system for Deduplication storage

A second file system is now needed which will be used to create the data deduplication storage disk.
This file system is only mounted on the StorNext server. Go through the same options but do not choose
Enable Data Migration on the first wizard screen.

After the file system is created, select Add Storage Disk from the Home Config menu.

This Add Storage Disk wizard will display the following introduction screen:

Select Next > for the Add Storage Disk screen.

Select Enable Deduplication.

Select 1 for the copy # used for all policy classes.

Select the mount point that the storage disk will be mounted to. This is the second file system you just
created.

Click Browse to create a directory under that file system for the disk files to reside in.

Choose Next > to proceed to the Complete Add Storage Disk Task screen:

Review your settings and click Next > to apply the settings.

Create a Storage Policy


Next, create a storage policy that will perform the data deduplication migration.

From the Home Config menu, select Add Storage Policy.

The Storage Policy Introduction screen appears, showing any previously configured policy classes.

Choose Next > to continue to the Policy Class and Directory screen.

Give your new policy class a name and select Browse to select the shared file system that will be de-duplicated.

Do not select any of the other choices.

Click Next > for the Store, Truncate and Relocate Time screen.

Configure the following here:

• Minimum Store Time (Minutes): The minimum amount of time a file must remain unaccessed
before it is considered a candidate for storage

• Minimum Truncation Time (Days): The minimum number of days a file must remain unaccessed
before it is considered a candidate for truncation

• Minimum Relocation Time (Days): The minimum number of days a file must remain unaccessed
on the primary affinity before it is considered a candidate for relocation to a secondary affinity
(this option does not appear when you select the Enable Stub Files option on the Policy Class
and Directory screen)
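
The three thresholds above can be pictured with a short Python sketch. It is only an illustration of how the
descriptions read, not StorNext's internal logic: the PolicyTimes class, the candidate_actions function and
the sample values (5 minutes, 3 days, 7 days) are hypothetical and were not taken from the evaluation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PolicyTimes:
    """Hypothetical container for the three wizard fields."""
    min_store_minutes: int    # Minimum Store Time (Minutes)
    min_truncation_days: int  # Minimum Truncation Time (Days)
    min_relocation_days: int  # Minimum Relocation Time (Days)


def candidate_actions(last_access: datetime, now: datetime, policy: PolicyTimes) -> list:
    """Return the actions a file qualifies for, based on how long it has been unaccessed."""
    idle = now - last_access
    actions = []
    if idle >= timedelta(minutes=policy.min_store_minutes):
        actions.append("store")
    if idle >= timedelta(days=policy.min_truncation_days):
        actions.append("truncate")
    if idle >= timedelta(days=policy.min_relocation_days):
        actions.append("relocate")
    return actions


# Sample values are assumptions for illustration only.
policy = PolicyTimes(min_store_minutes=5, min_truncation_days=3, min_relocation_days=7)
now = datetime(2009, 11, 5, 12, 0)
print(candidate_actions(now - timedelta(minutes=10), now, policy))  # ['store']
print(candidate_actions(now - timedelta(days=4), now, policy))      # ['store', 'truncate']
```

A file that has been idle for less than the Minimum Store Time qualifies for nothing, which is why recently
written files are not stored or de-duplicated right away.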

Click Next > for the Number of File Copies and Media Type screen:

Select the media type for File Copy 1 to be SDISK.

Click Next > to review your selections.

Choose Next > to apply the settings.

Deduplication should now be enabled for the shared file system.

StorNext 3.5.1 SAN and NFS client configuration

NFS client configuration

NFS can be used to share the distributed file system, but it will not perform as well as a SAN on a
Fibre Channel architecture. Follow the standard NFS implementation procedures for the
installed operating system. Performance of the evaluation tests is outlined in the Deduplication Results
section below.

SAN client configuration

On the client system, point a web browser to the URL (host name and port number) of the MDC (e.g.
http://servername:81).

When prompted, type the username and password for the MDC and then click OK. (The default value for
both username and password is admin.)

The StorNext home page appears.

Do one of the following:

• For an MDC running SNFS and SNSM: On the Admin menu, click Download Client Software.
• For an MDC running SNFS only: On the home page, click Download Client Software.

The Select Platform window appears:

In the list, click the operating system running on the client system, and then click Next >.

The Download Client Software window appears:

Click the download link that corresponds to your operating system version and hardware platform.
(Depending on the operating system, you may have only one choice.)

For example, for Red Hat Linux 4 running on an x86 64-bit platform, click Linux Redhat AS 4.0 (Intel
64bit).

When prompted, click Save or OK to download the file to the client system. Make sure to note the file
name and the location where you save the file.

After the download is complete, click Cancel to close the Download Client Software window.

Do not follow the onscreen installation instructions. Instead, continue with the correct procedure for your
operating system in the StorNext Installation Guide.

After the appropriate software has been installed on the client operating system, one of the last steps in
the StorNext Installation Guide is to edit the fstab file (vfstab on Solaris) to add a line for the shared file
system.

As mentioned earlier, this is where you will need the name of the shared file system that was first
created on the MDC server. This name does not contain a leading slash as an NFS mount point
does.

Linux /etc/fstab example:

<shared file system name> <mount point> cvfs verbose=yes 0 0

Solaris /etc/vfstab example:

<shared file system name> - <mount point> cvfs 0 auto rw

main - /mainstore cvfs 0 auto rw

Reboot the client and the file system should be mounted.
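
As a small illustration of the two entry formats above, the following Python sketch builds the lines from a
file system name and a mount point. The function names are hypothetical; the field layout is copied from the
examples above, and "main" and "/mainstore" are the values from the Solaris example. The only check added is
the rule stated earlier that the file system name has no leading slash.

```python
def _validate(fs_name: str, mount_point: str) -> None:
    # The SNFS file system name is a bare name, unlike an NFS export path.
    if fs_name.startswith("/"):
        raise ValueError("shared file system name must not start with a slash")
    if not mount_point.startswith("/"):
        raise ValueError("mount point must be an absolute path")


def linux_fstab_entry(fs_name: str, mount_point: str) -> str:
    """Build a Linux /etc/fstab line in the format shown above."""
    _validate(fs_name, mount_point)
    return f"{fs_name} {mount_point} cvfs verbose=yes 0 0"


def solaris_vfstab_entry(fs_name: str, mount_point: str) -> str:
    """Build a Solaris /etc/vfstab line in the format shown above."""
    _validate(fs_name, mount_point)
    return f"{fs_name} - {mount_point} cvfs 0 auto rw"


print(linux_fstab_entry("main", "/mainstore"))     # main /mainstore cvfs verbose=yes 0 0
print(solaris_vfstab_entry("main", "/mainstore"))  # main - /mainstore cvfs 0 auto rw
```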

LAN Client configuration

StorNext supports distributed LAN clients. Unlike a traditional StorNext SAN client, a distributed LAN
client does not connect directly to StorNext via fibre channel or iSCSI but rather across a LAN through a
gateway system called a distributed LAN server. The distributed LAN server is itself a directly connected
StorNext client, but it processes requests from distributed LAN clients in addition to running applications.

Any number of distributed LAN clients can connect to multiple distributed LAN servers. StorNext File
System supports Distributed LAN client environments in excess of 1000 clients and should support
deployments as large as 5000 clients.

File system aggregate throughput is not adversely impacted. Besides the obvious cost-savings benefit of
using distributed LAN clients, there will be performance improvements as well.

The distributed LAN client approach showed more than double the read/write throughput
of NFS.

Special Notes

When using the StorNext software on a client or server running any Linux distribution, disable
any firewall or Security-Enhanced Linux (SELinux) if issues are encountered. Alternatively, it is possible to
open the ports necessary for SNFS in the firewall and to write SELinux policies; see the SNFS documentation
for more information.

User Setup

StorNext has three classes of user accounts:


• The admin class
• The operator class
• The general user class

The Add New User screen below shows the functions that a user can be given access to:

Two user accounts were created for testing purposes:

• admin – used for administrative purposes


• general user – used for client end testing

Users will need to do one of the following to set up the user environment for StorNext commands.

If you are running sh, ksh, or bash, type: . /usr/adic/.profile

For all other shells, type: source /usr/adic/.cshrc

Summary of installation experience

Setting up data deduplication was not easy or straightforward when following the published documentation.
The documentation lacks a good installation flow because it explains several supported
operating environments at once.

Problems Encountered / Resolution

Security guidelines set forth by AFRL required all network services to be up to date on known security
issues. Scanning the StorNext 3.5.1 software with Nessus revealed that the Apache software
distributed with StorNext 3.5.1 was out of date. Several vulnerabilities existed in the installed Apache 2.0.54,
including sending credentials in clear text, several cross-site scripting issues, buffer overflow attacks and
denial of service attacks. Due to these security issues, testing was conducted at Avetec. As a test, the
DICE team successfully upgraded the MDC server to the latest Apache web server during the
evaluation and scanned the servers. This corrected all outstanding security issues. The upgrade did
not reveal any operational problems with the StorNext 3.5.1 install; however, this Apache web server
upgrade would need to be proposed to Quantum for a completely supported StorNext configuration.

5.1.2 Management
Client

The client (end user) will see no changes in daily operations unless the network becomes bogged
down, causing latency in data delivery, much as with NFS. Clients will continue to send data to
their home directories, and the data will be de-duplicated on the StorNext storage disk. The clients will
only see the mount point for the shared file system. The deduplication storage disk is not mounted on the
clients.

Administration

Once StorNext has been installed and configured, it can be managed by way of the StorNext Storage
Manager (SNSM). This tool combines the Tertiary Storage Manager (TSM) and Media Storage Manager
(MSM). TSM manages policy configuration along with the movement of data between primary disk and
secondary storage on disk or tape. MSM manages the media and archives. SNSM
provides high-performance file migration and management services and manages automated
and manual media libraries, including library volumes.

The graphics below provide insight into the SNSM GUI. Options provided by the File tab include
the following:

• Store→ Store files to a storage medium
• Version→ Show the version(s) of files stored on storage medium
• Recover File→ Recover deleted files
• Recover Directory→ Recursively recover deleted directories
• Retrieve File→ Retrieve truncated files from a storage medium
• Retrieve Directory→ Recursively retrieve truncated directories
• Free Disk Blocks→ Truncate files
• Move→ Move files from one media to another
• Attributes→ Change file attributes

Options provided by the Media tab include the following:

• Library→ Move media within a library


• Assign Policy→ Add media types to a policy class
• Remove→ Remove media from StorNext
• Assign Policy→ Associate blank media with a policy class
• Transcribe→ Copy media
• Attributes→ Alter the media’s state or attributes
• Reclassify→ Reclassify a media to a new media class
• Clean→ Clean a media by policy class, file system, or media identifier

Options provided by the Admin tab include the following:

• Library→ Perform library tasks such as Config Library, Audit Library, Library State, and Cancel Eject
• Drive→ Carry out drive tasks such as Config Drive, Change Drive State, and Clean Drive
• Storage Disk→ Perform storage disk tasks such as Config Storage Disk, Change Storage Disk State, and
Clean Storage Disk
• Disk Space→ Perform an immediate file system storage or truncation policy
• Policy Class→ Add, modify, or delete a policy class
• Backup→ Configure backup procedure parameters
• Relation→ Add or remove directory relation points to a policy class
• Water Mark Parameter→ Set water mark parameters

The Reports tab provides basic metrics similar to those in other StorNext reports.
The Help tab appears consistently throughout the StorNext software. It helps with error
handling and other topics covered by a standard Help feature.

Server Operation

Deduplication

Deduplication in StorNext 3.5.1 is a post-process operation. It runs when file
system activity is at a minimum or when free disk space is needed. Once files have been
de-duplicated, the extra copies being stored are truncated according to these default policies, which can be
configured:

• When the file life exceeds 21 days
• When the high watermark has been reached for the shared file system
• At highest priority, if set in the Storage Manager Admin - Manage Disk Space as shown below:

Once the high watermark has been reached, file truncation continues only until the used space
on the shared file system has decreased to the low watermark. By default, the high watermark is 85% of the
shared file system and the low watermark is 75% of the shared file system. These parameters are
configurable in the Storage Manager screen under admin.
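
The watermark behavior can be sketched in a few lines of Python. This is a simplified model of the rule as
described here, not StorNext code: the function name and the sample capacities are hypothetical, and only the
85% and 75% defaults come from the text above.

```python
HIGH_WATERMARK_PCT = 85.0  # default high watermark from the text above
LOW_WATERMARK_PCT = 75.0   # default low watermark from the text above


def space_to_truncate(used: float, capacity: float,
                      high_pct: float = HIGH_WATERMARK_PCT,
                      low_pct: float = LOW_WATERMARK_PCT) -> float:
    """Return how much space truncation must free, in the same unit as the inputs.

    Nothing is freed until usage reaches the high watermark; once it does,
    truncation continues until usage falls to the low watermark.
    """
    used_pct = 100.0 * used / capacity
    if used_pct < high_pct:
        return 0.0
    target = capacity * low_pct / 100.0
    return max(0.0, used - target)


# Hypothetical 1000 GB shared file system.
print(space_to_truncate(used=800, capacity=1000))  # 0.0   (80% is below the high watermark)
print(space_to_truncate(used=900, capacity=1000))  # 150.0 (90% -> truncate down to 75%)
```

The gap between the two watermarks keeps truncation from starting and stopping repeatedly when usage hovers
near a single threshold.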

The following graphic shows the Change Watermark Parameters screen:

Scheduled Events
To change the scheduled events select Scheduled Events under admin on the Home page:

The Scheduled Events screen will appear, and you can select an event to change its configuration. Only
one event can be selected at a time:

StorNext events are tasks that are scheduled to run automatically based on a specified schedule. The
following events can be scheduled:

• Clean Info: This scheduled background operation removes knowledge of media from StorNext.
• Clean Versions: This scheduled event cleans old, inactive versions of files.
• Full Backup: By default, a full backup is run once a week to back up the entire database,
configuration files, and the file system metadata dump file.
• Health Check: By default, health checks are set up to run every day of the week, starting at 7:00
a.m.
• Partial Backup: By default, a partial backup is run on all days of the week on which the full
backup is not run. This backup includes database journals, configuration files, and file system
journal files.
• Rebuild Policy: This scheduled event rebuilds the internal candidate lists (for storing, truncation,
and relocation) by scanning the file system for files that need to be stored.

The Scheduler does not dynamically update when dates and times are changed significantly from the
current setting. You must reboot the system for the Scheduler to pick up the changes.

StorNext Logs

To view the StorNext logs choose Access StorNext Logs under admin on the Home page:

You can access and view any of the following types of logs:
• SNFS Logs: Logs about each configured file system
• StorNext Database Logs: Logs that track changes to the internal database
• SNSM - File Manager Logs: Logs that track storage errors, etc. of the Storage Manager

• SNSM - Library Manager Logs: Logs that track library events and status
• Server System Logs: Logs that record system messages
• StorNext Web Server Logs: Various logs related to the web server

Other GUI features

The StorNext GUI is easy to navigate and user-friendly. The following section gives insight into the
graphical user interface.

Below are four service options that help monitor and capture system status information. This overview
gives insight into how the GUI is laid out. Once an option has been selected, the request is
carried out and the admin is taken to the results page. If an error occurs during the request, the software
guides the admin through the debugging process to resolve the issue.

• Health Check: Execute one or more health checks and view recent health check results.

• State Capture: Acquire and save detailed system state information.

• System Status: Examine the fault tickets

• Admin Alerts: Examine alerts regarding system activities.

Other reports integrated in the GUI include the following:

• Backup Information Report


• Drive State Information Report
• File Information Report
• Library Information Report
• Library Space Used Report
• Media Information Report
• Policy Class Information Report
• Directory Affinity Report
• File System Statistics Report
• Stripe Group Statistics Report
• File System Client Report
• File System LAN Client Report

CLI features/functionality of note

By default, the CLI can be used as root from the MDC or a StorNext client. This can be changed in the
SNFS Config Globals screen by unchecking the Global Super User box.

Quotas can also be enabled from the SNFS Config Globals screen shown below:

The CLI admin command, cvadmin, is needed to complete the setup of user quotas.

The following information is needed to set up individual user quotas:

• User or group name
• Hard limit for the quota
• Soft limit for the quota
• Time limit for the soft quota expiration – If the user exceeds the soft limit and does not reduce
usage below that soft limit within this amount of time, the soft limit becomes the hard limit for
that account. As a result, the user cannot use any more space for the account directory.

Quotas are based on actual usage, and are not enforced based on space allocated.
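
The interaction between the soft limit, the hard limit and the time limit can be modeled with a short Python
sketch. It follows only the description above; the Quota class, its field names and the sample numbers are
hypothetical and do not reflect how SNFS implements quotas internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Quota:
    """Hypothetical model of one user's quota (sizes in arbitrary units, times in seconds)."""
    hard_limit: int
    soft_limit: int
    grace_seconds: int
    over_soft_since: Optional[float] = None  # when usage first exceeded the soft limit

    def effective_limit(self, usage: int, now: float) -> int:
        """Return the limit in force; also records or clears the over-soft-limit timestamp."""
        if usage <= self.soft_limit:
            self.over_soft_since = None      # back under the soft limit: timer resets
            return self.hard_limit
        if self.over_soft_since is None:
            self.over_soft_since = now       # first time over the soft limit: start timer
        if now - self.over_soft_since >= self.grace_seconds:
            return self.soft_limit           # time limit expired: soft limit acts as hard limit
        return self.hard_limit

    def can_allocate(self, usage: int, extra: int, now: float) -> bool:
        return usage + extra <= self.effective_limit(usage, now)


q = Quota(hard_limit=100, soft_limit=80, grace_seconds=3600)
print(q.can_allocate(usage=85, extra=10, now=0.0))     # True: over soft limit, within time limit
print(q.can_allocate(usage=85, extra=1, now=7200.0))   # False: time limit expired
print(q.can_allocate(usage=70, extra=20, now=8000.0))  # True: usage dropped below soft limit
```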

Example using cvadmin to set up a user quota:

cvadmin quotas set [user | group] <username> hardlim softlim timelim

cvadmin quotas get [user | group] <username>

The cvadmin command also includes a useful test for measuring latency between client and server across the SAN:

cvadmin -H MDCserver -F sharedfilesystem -e latency-test all

Example Output:

Test started on client 3 (avtchp1.dice.avetec.org)... latency 54us


Test started on client 23 (avtcms1.dice.avetec.org)... latency 259us
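
Output in this form is easy to post-process. The Python sketch below parses the two sample lines above and
flags clients above a threshold. The regular expression is written against this sample only, and the 100
microsecond threshold is an arbitrary assumption, not a StorNext recommendation.

```python
import re

# Pattern written to match the sample output lines shown above.
LINE_RE = re.compile(r"Test started on client (\d+) \(([^)]+)\)\.\.\. latency (\d+)us")


def parse_latency(output: str) -> dict:
    """Map each client host name to its reported latency in microseconds."""
    return {m.group(2): int(m.group(3)) for m in LINE_RE.finditer(output)}


sample = """\
Test started on client 3 (avtchp1.dice.avetec.org)... latency 54us
Test started on client 23 (avtcms1.dice.avetec.org)... latency 259us
"""

latencies = parse_latency(sample)
THRESHOLD_US = 100  # hypothetical threshold for illustration
slow = [host for host, us in latencies.items() if us > THRESHOLD_US]
print(latencies)
print(slow)  # ['avtcms1.dice.avetec.org']
```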

Also of note is the performance output of the StorNext cvcp command. The cvcp command works similarly
to the Unix cp command but also displays performance statistics during the copy.
By default, the cvcp command has recursive copy turned on.

cvcp <source filename | directory> <target SNFS filesystem>

cvcp <source SNFS filesystem > <target filesystem>

Example output:

439 Files, 19.80 GB, 215.2409 Sec, 94.19 MB/Sec 2.04 Files/Sec
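
The rates in this summary line follow from the file count, size and elapsed time. The Python sketch below
parses the line and recomputes them. It assumes 1 GB = 1024 MB, which is an assumption about how cvcp reports
sizes; under that assumption the recomputed throughput agrees with the reported 94.19 MB/Sec to within the
rounding of the displayed size.

```python
import re

# Pattern written against the sample summary line shown above.
SUMMARY_RE = re.compile(
    r"(\d+) Files, ([\d.]+) GB, ([\d.]+) Sec, ([\d.]+) MB/Sec\s+([\d.]+) Files/Sec"
)


def recompute_rates(line: str):
    """Return (MB per second, files per second) recomputed from the totals in the line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("line does not look like a cvcp summary")
    files = int(m.group(1))
    gigabytes = float(m.group(2))
    seconds = float(m.group(3))
    mb_per_sec = gigabytes * 1024 / seconds  # assumes 1 GB = 1024 MB
    files_per_sec = files / seconds
    return mb_per_sec, files_per_sec


mb, fps = recompute_rates("439 Files, 19.80 GB, 215.2409 Sec, 94.19 MB/Sec 2.04 Files/Sec")
print(f"{mb:.1f} MB/Sec, {fps:.2f} Files/Sec")  # 94.2 MB/Sec, 2.04 Files/Sec
```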

5.1.3 Testing Results


Data Deduplication Results

The data deduplication testing involved three unclassified data sets. These data sets were copied onto
the deduplication SAN mount point using the StorNext cvcp command and, in a separate test, the Unix
tar command, in the following order:

• Scientific benchmark data from AFRL (698 MB in size)


• Scientific weather data from NOAA (20 GB in size)
• Scientific data from ARSC (424 GB in size)

After all the data sets were copied to the shared file system and the resulting file sizes were noted, the
files were deleted and a clean operation was performed.

Table 5.1.3 Data Deduplication Results

Data Sets Used    Data Deduplication Ratio    Percent Savings after Data Deduplication
AFRL data         1.29 to 1                   23.01%
NOAA data         1.10 to 1                   9.27%
ARSC data         1.49 to 1                   33.01%
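
The two result columns describe the same measurement in different forms. The report does not state the
formula it used, so the Python sketch below assumes the usual relationship, savings = 1 - 1/ratio, and
converts in both directions using the values from Table 5.1.3. The function names are hypothetical.

```python
def savings_from_ratio(ratio: float) -> float:
    """Percent of space saved for a deduplication ratio of `ratio` to 1 (assumed formula)."""
    return (1.0 - 1.0 / ratio) * 100.0


def ratio_from_savings(percent: float) -> float:
    """Deduplication ratio (x to 1) implied by a percent savings (assumed formula)."""
    return 1.0 / (1.0 - percent / 100.0)


# (ratio, percent savings) as reported in Table 5.1.3
reported = {"AFRL": (1.29, 23.01), "NOAA": (1.10, 9.27), "ARSC": (1.49, 33.01)}

for name, (ratio, percent) in reported.items():
    print(f"{name}: {ratio:.2f} to 1 implies {savings_from_ratio(ratio):.2f}% savings; "
          f"{percent:.2f}% savings implies {ratio_from_savings(percent):.2f} to 1")
```

Under this assumption the two columns agree to within about half a percentage point, which is consistent
with the ratios having been rounded or truncated to two decimal places.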

The screen shot below is actual output of the web GUI home page showing the percent savings
achieved by deduplication on all data sets. It is displayed in the lower right-hand corner.

5.1.4 Functional Testing

Table 5.1.4 below describes the test requirement, evaluation results and evaluation ranking for the critical
and high requirements defined by the Storage Initiative team.

The following criteria are used for the evaluation ranking:

Met        The solution offered the minimum functionality.
Surpass    The solution offered more than expected functionality.
Missed     The solution offered less than minimum functionality.
N/A        The requirement for this solution is not applicable.

Requirement / Evaluation Results / Evaluation Ranking

Functionality

Attribute Requirements

Requirement: The system must provide a method to recover a file, directory or metadata that has been
identified as corrupt.
Evaluation Results: Metadata can be re-created using the metadata dump option from the GUI. Recovery of a
file or directory is accomplished by creating additional copies of the file through policies that copy the
files to disk or tape media for backup.
Evaluation Ranking: Met

Interfaces

Interface Tool

Requirement: The tool must be able to perform configuration modifications (add systems, files, directories).
Evaluation Results: The tool configures the SAN devices, policies and shared file systems that are exported
to the clients. Client software also needs to be downloaded and configured on each client that needs access
to the shared file systems.
Evaluation Ranking: Met

Requirement: The tool must be able to modify specifics regarding files and directories to perform
deduplication (eliminate certain files/directories).
Evaluation Results: Once the SAN file system is mounted, users can create their own sub-directory and create,
edit or copy files into the directory. Once the files are stored to the SAN mount point, the files are
de-duplicated in a post-processing fashion and stored on the de-duplicated storage disk. Truncation of files
on the SAN file system only occurs when a high watermark has been reached for the shared file system, and it
reduces the used storage capacity to a low watermark. Truncation can also occur by setting up a policy.
Evaluation Ranking: Met

Requirement: The tool must provide the ability to un-de-duplicate files and directories.
Evaluation Results: Copying a file from the SAN file system to a different working directory causes the
file to be un-de-duplicated.
Evaluation Ranking: Met

Requirement: The tool must be able to operate on single file/directory, lists of files/directories or
directory tree.
Evaluation Results: Standard HPC Unix tools can be used to copy single or multiple files, including
recursive options that work on directories. The cvcp command provided with the StorNext software operates
on single or multiple files recursively and also displays the progress and performance of the I/O.
Evaluation Ranking: Met

Requirement: The tool must be able to report percentage of storage gained from eliminating redundancy.
Evaluation Results: In the web GUI, the % savings of deduplication is displayed for the Storage Disk Monitor.
Evaluation Ranking: Met

Hardware Interfaces

Communications Interfaces

Requirement: The communication software must be configured to run on IPv4.
Evaluation Results: The system was configured using IPv4.
Evaluation Ranking: Met

Hardware / Software Requirements

Requirement: The system must be able to have Data De-dupe actions available from the HPC file systems.
Evaluation Results: The Quantum StorNext client software will need to be loaded on each client that mounts
the StorNext Shared File System. De-dupe actions are available when using Unix commands to move or copy
files from HPC file systems to the SAN or LAN mount point of the StorNext file system. StorNext client
software includes a cvcp command that works like the Unix cp command but measures the progress and
performance of the copy.
Evaluation Ranking: Met

Requirement: The system must interface with a remote or network file system link to HPC systems (like NFS)
or a direct attached file system with a client.
Evaluation Results: The SAN will need to be configured so that each client can see the SAN logical unit
number (lun) configured for the shared file system. The clients do not need to see the SAN luns that are
used for the actual de-duplicated storage area. The file system will then be mounted on the client in a
manner similar to NFS. NFS is also supported as a means to distribute the shared file system.
Evaluation Ranking: Met

Requirement: The solution must be commercially available at the time of the installation.
Evaluation Results: The software is currently available for purchase.
Evaluation Ranking: Met

Requirement: The metadata system must provide failover and/or recovery capabilities.
Evaluation Results: The StorNext solution can be configured with dual MetaData Controller servers for high
availability in the event of failure. They will run in an active/passive configuration.
Evaluation Ranking: Met

Operational

Requirement: The data portion of all duplicate files flagged must be identical to the original file.
Evaluation Results: A file originating in the home directory was copied to the SAN shared file system. The
same file was copied to the SAN shared file system under a different filename. After waiting an appropriate
amount of time for the post-processing deduplication to complete, that file was copied from the SAN shared
file system back to a working directory. A diff command then showed that the original file was identical to
the copy in the working directory.
Evaluation Ranking: Met

Requirement: Determine the limitation in the number of files, where metadata is stored and hash table growth.
Evaluation Results:
• The maximum number of disks per file system is 512
• The maximum number of disks per data stripe group is 128
• The maximum number of stripe groups per file system is 256
• The maximum number of tape drives is 256
For managed file systems only, the maximum directory capacity is 50,000 files per single directory.
Growth is limited by the size of the SAN file systems implemented.
Evaluation Ranking: Met

Security and Privacy

Requirement: The Data de-dupe administrator (if not system administrator) must be able to perform functions
without having system level root privileges.
Evaluation Results: The StorNext administrator is its own user type defined only in the StorNext management
GUI. However, root access is needed for certain setup tasks or if using the CLI for administrative purposes.
Evaluation Ranking: Met

Requirement: The system must be able to detect data corruption and react either automatically or report to
the site system administrator.
Evaluation Results: The system will generate admin alerts and generate internal tickets of the failures with
suggested fixes.
Evaluation Ranking: Met

Requirement: The system must pass CSA scans.
Evaluation Results: Nessus scans reported vulnerabilities in the Apache web service, as outlined in Problems
Encountered / Resolution above.
Evaluation Ranking: Miss

Requirement: The system must provide the ability that all operations are subject to access permissions,
authorizations of target objects, and user privileges on accounts for all systems involved in any operation.
Evaluation Results: The system follows client NFS ACL privileges. All NFS ACL privileges will need to be set
up in the SAN shared file system by the root user on the MDC server.
Evaluation Ranking: Met

Requirement: System functions requiring elevated privilege must be properly documented to allow
understanding and limitation of the risks.
System configurations must meet the DoD ports and protocols guidance and management processes. Static
passwords are not permitted by DoD.
One of the following three methods must be met to provide/demonstrate proper security protections:
HPCMP prefers Kerberos protected services to secure information transfer and communications along with
SecureID or Common Access Card/Public Key Infrastructure (CAC/PKI) for single sign on authentication.
or
Independent Certification: Meet the requirements set forth by NAIP CCEFS documentation.
or
Systems that do not use Kerberized services and/or SecureID/PKI authentication must be documented and
approved to operate in the HPCMP architecture (prior to installation) according to the HPCMP Access
Guidelines.
Evaluation Results: The StorNext administrator, StorNext operator and StorNext general user passwords are
static. They can only be changed by the StorNext administrator unless the user has been granted that
privilege on setup by the StorNext administrator. No integration is available for automation with Kerberos.
Evaluation Ranking: Miss

Requirement: Assurance that the software is properly using and protecting those privileged actions and
credentials is required.
Evaluation Results: The privileged StorNext admin commands are not available to any general user.
Evaluation Ranking: Met

Requirement: All communications between services must be properly authenticated and protected from
intrusion (i.e., replicate over WAN).
Evaluation Results: Only clients with StorNext client software have access to the StorNext file systems. The
clients must also have been granted access to the file system storage disk luns by the SAN.
Evaluation Ranking: Met

Audit Trail

Requirement: The system must keep log files on Data de-dupe server.
Evaluation Results: The StorNext administrator has access to log files for audit purposes.
Evaluation Ranking: Met

General Performance

Requirement: The system must be able to handle multiple simultaneous transactions without affecting
performance (single stream vs. multiple stream).
Evaluation Results: A test of 60 simultaneous writes and a second test of 60 simultaneous reads were
performed successfully.
Evaluation Ranking: Met

Requirement: The system must be able to handle file sizes of 2 TB.
Evaluation Results: A 2.2 TB file was created on a StorNext shared file system. The StorNext software can
handle file sizes greater than 2 TB.
Evaluation Ranking: Met

6. Summary

StorNext has proven to be a great product, with top-rated functionality and usability. Administration
of the software was easy because the GUI provides a user-friendly interface. StorNext also offers a
custom command-line interface for those who prefer the CLI approach, and the CLI allows scripting for
automation if needed. However, it is not a fully functional CLI, so the web GUI will still need to be used for
administrative purposes. This solution includes a great degree of reporting and functionality within the
tool. Although StorNext offers a great deal more than just data deduplication, deduplication was the task of
DICE within this evaluation.
The Quantum StorNext software has proven to be a great tool for dealing with the problem of out-of-control
data growth. In this test case scenario, it really comes down to the datasets and how well they can be
de-duplicated. Through this study, DICE has found that the scientific data sets (defined in section 4) did not
de-dupe as well as desired. Although the StorNext software proved to be a powerful tool, the datasets
have a low deduplication factor. For a summary of the deduplication ratios
and test results, see Data Deduplication Results under Testing Results.

Two key security requirements were also missed in the testing: the product cannot adopt CAC/PKI or
Kerberos logins or automatic password expiration, and the security advisory scans revealed vulnerable
services due to older revisions of open source Apache software. Moving to the latest version of the
out-of-date software will fix the security scan results, but adding CAC/PKI/KRB5 support will need to be
proposed to Quantum for a fully supported configuration. Another issue worth mentioning is that
upgrading firmware or software from some older versions will require a server reboot, causing a
StorNext service outage.

Integration into the HPC environment takes little effort for a SAN storage topology, but for clients without
Fibre Channel host bus adapters, the StorNext distributed LAN client is recommended over a slower
performing NFS architecture. The StorNext software does not support AFRL’s current archive
HSM, SAM/QFS, because it has its own file system structures.
