Data Deduplication Quantum StorNext: Real Testing Real Data Real Results
Quantum StorNext
1. Introduction
2. Background
   2.1 Offered Capabilities
3. Key Evaluation Criteria
4. Evaluation Setup
5. Evaluation Results
   5.1 Evaluation Findings
      5.1.1 Software Installation and Setup
      5.1.2 Management
      5.1.3 Testing Results
      5.1.4 Functional Testing
6. Summary
High Performance Computing Modernization Program
Air Force Research Lab
DoD Supercomputing Resource Center
Data Deduplication
Quantum
1. Introduction
The project will include an in-depth investigation of various data deduplication (de-dupe) technologies that
will identify the following: capabilities, user and center impacts, security issues and interoperability issues
within a single location.
Quantum provides a software solution for intelligent data management called StorNext. StorNext is a self-
archiving, self-protecting shared file system for heterogeneous high-speed data sharing. It has a built-in
policy engine for automated file archival with the feature of data deduplication. Since the Data Intensive
Computing Environment (DICE) team has been tasked with reviewing the specific requirements regarding
data deduplication functionality, this report will not focus on other features offered by the StorNext
product.
The StorNext product contains both a command line interface (CLI) and a browser-based graphical user
interface (GUI) for configuration and management. Most of the configuration and management is
accomplished using the GUI, with a subset of the capability available in the CLI. The product is designed to
work with a storage area network (SAN or LAN clients) architecture and relies on the underlying
server/storage hardware for data integrity. It also offers some recoverability features in the event of failures.
For the purposes of this investigation, we used the latest version of the StorNext software, 3.5.1, as an
online storage solution similar to what is used for the home directories of HPC users at the AFRL DSRC.
2. Background
The Department of Defense (DoD) High Performance Computing Modernization Program (HPCMP) has
formed a Storage Initiative (SI) team to investigate the program’s current storage architectures across the
centers. Today, HPCMP is responsible for five major centers and two disaster recovery sites. One of the
key areas of concern to the SI team is the storing, managing and organizing of user data.
A data deduplication solution could provide a cost savings in storage needs for user data files. The
HPCMP SI team partnered with the DICE Program Management Team to conduct a technical evaluation
of the Quantum product within an HPC environment to gain a better understanding of the functionality and
integration requirements.
As a leading global specialist in backup, recovery and archive for more than 27 years, Quantum focuses
on helping IT departments address data protection and data retention challenges by incorporating
innovative solution sets with world-class service and support. Quantum also sells a series of hardware
appliances for Network Attached Storage and Virtual Tape Library that have data deduplication capability
called the DXi series.
2.1 Offered Capabilities
Table 2.1 below describes the capabilities of the Quantum StorNext product. The data is generally
available on Quantum's website (www.quantum.com), through documentation or questions directed to
Quantum's personnel.
Quantum StorNext

General
• Name & version of data deduplication software: StorNext version 3.5.1
• General architecture: Software solution that runs on multiple server platforms and storage devices,
  preserving user choice.
• Can function in a heterogeneous environment? Yes; a broad range of server platforms and storage for
  greater collaboration and fewer delays.
• Service and ports used: There are several ports that StorNext utilizes. The StorNext GUI uses Apache
  on port 81. The StorNext file system uses configurable TCP/UDP port ranges as defined in the
  /usr/cvfs/config/fsports file.
• Can the system detect intrusion (unauthorized access to a file) and react either automatically (log entry)
  or report to the site system administrator? No. StorNext is a file system with standard POSIX and
  Windows permissions.
Performance
• Available benchmark data: Enterprise Strategy Group Lab Validation Report published on 05/11/2007:
  http://salestools.quantum.com/dsp_doclistQDC.cfm?c=20&sid=161&ssid=120&tid=35&lid=1
• Throughput numbers for read/write transactions: Throughput varies with the solution configuration. In
  this evaluation the data was written to a striped LUN on the SAN that has only a 1 GB/s interface,
  which resulted in about 95 MB/s read/write performance. Using the Distributed LAN Client functionality
  resulted in about 60 MB/s read/write performance. As a comparison, NFS was also tested, which
  resulted in 18-24 MB/s read/write performance.
• Determine maximum file size (can it handle file sizes of 25 TB?): Maximum file size can be up to the
  size of the file system; the file system size can be greater than 1 PB.
Scalability
• Optimizing scalability: Yes, scales with the underlying hardware implemented, with the ability to access
  hundreds of millions of shared files and petabytes of storage.

Reliability and Availability
• Single points of failure: StorNext supports dual-node High Availability.
• Remote capabilities: StorNext can be accessed by SAN clients, LAN clients and CIFS/NFS exports
  (including replication in StorNext 4.0).
• Is there any mechanism for detecting data corruption? Yes, StorNext uses proactive monitoring and
  alerting to automatically monitor system health and send email alerts, simplifying service and reducing
  downtime. The Web GUI also has the capability to view not only StorNext logs but system logs as well.
• File data corruption must be reported, corrected and completely understood if it is the fault of the data
  deduplication framework: File corruption is corrected only if a copy of the data exists in a backup tier to
  be restored. A RAS ticket is generated with recommended actions, including checking for error logs
  that are outside the StorNext framework.
• Metadata corruption should be reported, corrected and completely understood if it is the fault of the
  data deduplication framework: Metadata corruption triggers a RAS event notification and instructs the
  admin to run a metadata dump to re-create the metadata.
• Does the system provide failover capabilities for each component? StorNext increases data availability
  through optional Metadata Controller failover capabilities, multipathing to storage and multiple
  secondary tier copies.

Capacity Planning and Performance Analysis
• What tools are available to determine future capacity needs? Information is available about the
  capacities within each tier of storage.
• Does StorNext have the capability to add capacity on demand? Yes, the StorNext file system can be
  increased by adding more disk/storage.
• What is the complexity of adding or removing storage devices? Beyond typical SAN device setup, it is
  fairly simple to add/remove storage devices to StorNext.
• What is the complexity of migrating data from disk to tape? StorNext uses a policy-based mechanism
  to define data workflow and move data between storage tiers without user intervention. This includes
  movement between primary, secondary (deduplication tier) and tertiary (tape).
• Are there tools to see how StorNext is performing (transactions/minute, # of users/transaction, etc.)?
  Yes, tools are available to determine the amount of data read/written to the secondary tier; however,
  they do not measure transactions or track the users that generated the I/O.

Data Deduplication Management
• Are ACLs from the HPC system managed separately? For NFS, the ACLs are managed through the
  HPC system.
• Is there a command line interface? Yes, the CLI has a subset of the functionality of the StorNext Web
  GUI. However, some functions like user quota setup are only available in the CLI.
• Is there a GUI provided for administration? Yes, via an Apache-based web interface.
• Is there a GUI provided for clients? No, there is no data deduplication management GUI available to
  clients.
• What types of reports are available for administrators? Capacity, deduplication percentage (in general --
  the File System, Library, Drive, Media, File, Backup and many more).
• What documentation is provided for the installation and setup? Online GUI help, man pages and
  administration guides. Complete documentation can be found at:
  http://www.quantum.com/ServiceandSupport/SoftwareandDocumentationDownloads/SNMS/Index.aspx#Documentation
• Is there any mechanism to detect failure due to access permissions prior to executing an operation?
  Not available at this time.
• Are troubleshooting or diagnostic tools available? Yes, tickets are logged for data failures and an
  analysis tool can be run against the ticket to give a suggested fix.
• Are there trend or pattern reports available on storage usage? Not available at this time.
• What documentation is provided for hardware and software troubleshooting? A StorNext
  troubleshooting guide is available.
Service and Support
• Support services available: Quantum offers a suite of support options up to 7x24x365 GOLD Support.
  See http://www.quantum.com/ServiceandSupport/Services/Software/Index.aspx for more information.
• How many service employees are employed? Over 400 team members with targeted expertise in
  backup, recovery and archive. Coverage is available in more than 100 countries around the globe.
• How often are software and firmware updated, and who performs the upgrades? Software updates are
  released approximately two times per year. Upgrades can be performed by the customer, or
  Professional Services can be purchased to perform upgrades.
• Do software and firmware updates require an outage? Yes, once the upgrades have been completed
  all hardware/software will need to be rebooted.

Training & Professional Services
• Training services available: StorageCare Learning is Quantum's online learning portal. Simply log on to
  StorageCare Learning to enroll and sign up for an account. Once your account has been activated you
  will be able to view all available StorageCare Learning courses, register for courses, view a
  personalized training calendar, take online courses, view transcripts and print certificates once you
  have successfully completed a training course.

Cost
• What is the typical list price for your data deduplication solution? The StorNext Data Deduplication
  option has a U.S. list price of $12,500; there is also a license capacity requirement for all data written to
  secondary storage (this varies depending on the amount of data written and is a tiered structure).
• What are the ongoing costs over the life of your solution? Standard support costs, client count
  additions and secondary storage capacity increases.
Vendor Information
• Are there any Department of Defense (DoD) references available? Yes; Starfire Optical Range at the
  Air Force Research Lab, Kirtland AFB. Success story references can be found at:
  http://salestools.quantum.com/dsp_doclistQDC.cfm?c=20&sid=161&ssid=104&tid=35&lid=1
  Additional references include:
  • SSES – Wright-Patterson AFB
  • DoD Distributed Common Ground System (DCGS)
  • National Geospatial-Intelligence Agency (NGA)
  • National Aeronautics and Space Administration (NASA)
  • National Oceanic and Atmospheric Administration (NOAA)
• The system must support Linux and Unix operating system(s) – POSIX compliant? Yes, the server
  was Linux based, along with clients running Linux and Solaris.
Operational Requirements
• Data should retain its normal structure in order to maintain interoperability with other systems: Yes,
  unless using a storage disk. Within the primary tier of disk storage, StorNext does not manipulate any
  of the data in the file. When the file is written to secondary targets, it is stored in a StorNext format.
  Since that data is not accessed directly by the application, this does not pose any interoperability
  issues.
• Deduplication should have minimal impact on additional storage: Yes, data deduplication processing is
  run in a post-process manner so that the data streams being written will not be altered until the write
  has been completely stored.

Audit Trail
• Does the system keep an audit trail of administrator transactions? No, but in the next release,
  StorNext v4.0, administrator transactions are logged.
4. Evaluation Setup
On behalf of AFRL DSRC, the following unclassified configuration was assembled and tested to evaluate
deduplication. StorNext was configured with data storage on a SAN and metadata transferred across the
public TCP/IP network. In addition, the NFS protocol to the clients was tested as well.
♦ Hewlett Packard ProLiant DL1820 G5 with dual 2.5 GHz Intel Xeon E5420 quad-core
processors and 16 GB of memory, with Red Hat Linux 5.3 as the operating environment.
♦ The SAN infrastructure operates at 4 GB/s; the Data Direct Networks S2A3000 storage
provides 1 GB/s maximum throughput as configured. A striped LUN was used for both the
shared file system and the data deduplication storage.
♦ Clients consisted of a Sun X4200 with dual 2.2 GHz AMD Opteron 275 dual-core
processors and 4 GB of memory running Solaris 10, along with a Dell 2850 with dual
3.2 GHz Xeon dual-core processors and 4 GB of memory running CentOS Linux.
Figure 1 below depicts the solution setup used at the Avetec site.
Figure 1
5. Evaluation Results
System Setup
The DICE team installed the StorNext software on a server located at Avetec. Normally, all testing for
AFRL is conducted with the onsite DICE hardware configuration. However, since a large part of system
configuration is done via the HTTP protocol, there was an expectation that StorNext (as with most vendor
software) would not pass the security testing performed at AFRL, which in turn would prevent evaluating
the software on the AFRL network at all. Security testing was still performed at Avetec. To read more
about the security issues encountered, please see “Problems Encountered/Resolution” below.
The StorNext software consists of two components: the StorNext File System and the Storage Manager.
These components are installed on a server known as the metadata controller (MDC). The operating
environments currently supported by StorNext include Red Hat Enterprise Linux 5, SUSE Linux
Enterprise Server 10 and Sun Solaris 10; please consult the StorNext Installation Guide for specifics
regarding supported kernels. Currently, however, data deduplication is only supported on Red Hat and
SUSE Linux.
The hardware requirements for Storage Manager depend upon the number of file systems in the planned
configuration. These dependencies are outlined in Table 1-1 along with some additional notes.
Table 1-1
Note: If a file system uses de-duplicated storage disks (DDisks), note
the following additional requirements:
StorNext can be installed on any local file system (including the root file system) on the MDC. However,
for optimal performance, as well as to aid disaster recovery, follow these recommendations:
All SAN logical unit numbers (LUN) need to be visible to the MDC before configuration. LUNs that you
plan to use in the same stripe group must be the same size. StorNext also does not support connection
of multiple devices through fibre channel hubs. Fibre channel switches must be used. For more
information on configuration of storage devices please consult the StorNext 3.5.1 Users Guide.
When configuring LUNs greater than 1 TB, follow the required disk LUN labeling in the StorNext
Installation Guide. Quantum also suggests creating persistent bindings for disk LUNs. For more
information, contact the vendor of your HBA (host bus adapter).
• In cases where gigabit networking hardware is used and maximum StorNext performance is
required, a separate, dedicated switched Ethernet LAN is recommended for the StorNext
metadata network. If maximum StorNext performance is not required, shared gigabit networking
is acceptable.
• A separate, dedicated switched Ethernet LAN is mandatory for the metadata network if 100 Mbit/s
or slower networking hardware is used.
• The MDC and all clients must have static IP addresses. Verify network connectivity with pings,
and verify entries in the /etc/hosts file. Alternatively, telnet or ssh between machines to verify
connectivity.
• If using Gigabit Ethernet, disable jumbo frames and TOE (TCP offload engine).
StorNext does not support file system metadata on the same network as iSCSI, NFS, CIFS or VLAN data
when 100 Mbit/s or slower networking hardware is used.
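The connectivity checks above can be sketched as a small script; "localhost" below stands in for a real MDC or client hostname, and the jumbo-frame command is an illustrative modern equivalent rather than a StorNext-documented procedure.

```shell
#!/bin/sh
# Verify reachability and /etc/hosts entries for a StorNext node before
# configuration. Replace "localhost" with the MDC or client hostname.
check_host() {
    h="$1"
    if ping -c 1 -W 2 "$h" >/dev/null 2>&1; then
        echo "reachable: $h"
    else
        echo "UNREACHABLE: $h"
    fi
    if grep -q "$h" /etc/hosts; then
        echo "listed in /etc/hosts: $h"
    else
        echo "missing from /etc/hosts: $h"
    fi
}
check_host localhost
# If using Gigabit Ethernet, also disable jumbo frames, e.g.:
#   ip link set dev eth0 mtu 1500
```

Running the same check from each client against the MDC (and vice versa) confirms the static-IP and hosts-file requirements in both directions.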
For management servers running Red Hat Enterprise Linux version 4 or 5, before installing SNFS and
SNSM, you must first install the kernel header files (shipped as the kernel-devel or kernel-devel-smp
RPM, depending on your Linux distribution).
For servers running SUSE Linux Enterprise Server, you must first install the kernel source code (shipped
as the kernel-source RPM). StorNext will not operate correctly if these packages are not installed. You
can install the kernel header files or kernel source RPMs by using the installation disks for your operating
system.
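A quick pre-install sketch of this step: the package names are the ones cited above, and the install commands (commented out) are illustrative, since the exact tool depends on the distribution and media.

```shell
#!/bin/sh
# Record the running kernel release so the matching header/source package
# can be installed before the StorNext install.
release=$(uname -r)
echo "running kernel: $release"
# RHEL 4/5: yum install kernel-devel      (or kernel-devel-smp for SMP kernels)
# SLES:     install the kernel-source RPM from the installation disks
```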
The maximum hostname length for a StorNext server is limited to 25 characters. Before beginning the
installation, verify that the destination hostname is not longer than 25 characters. (The hostname is read
during the installation process, and if the hostname is longer than 25 characters the installation process
could fail.)
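The 25-character limit can be checked before starting the installer with a one-off script like the following sketch:

```shell
#!/bin/sh
# Pre-install check for the 25-character StorNext MDC hostname limit.
name=$(hostname)
len=${#name}
if [ "$len" -le 25 ]; then
    echo "hostname OK: $name ($len characters)"
else
    echo "hostname too long: $name ($len characters, limit 25)"
fi
```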
Software Installation
StorNext comes with a pre-installation script (snPreInstall) on the installation CD. When you run
snPreInstall, you are prompted for information about the system. The pre-installation script uses this information to
estimate the amount of local disk space required for SNFS and SNSM support directories. In addition, the
script recommends the optimal locations for support directories.
StorNext uses five directories to store application support information. These directories are stored locally
on the metadata controller, except for the Backup directory, which is stored on the managed file system.
Table 1-2
The pre-installation script will need the following information to help prepare for the installation:
Quantum suggests that storage needs typically grow rapidly. Consider increasing the maximum number of
expected directories and files by a factor of 2.5x to ensure room for future growth.
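The 2.5x headroom factor amounts to simple scaling of the counts entered into snPreInstall; the directory and file counts below are made-up examples.

```shell
#!/bin/sh
# Apply the suggested 2.5x growth factor to expected directory/file counts.
scale() { awk -v n="$1" 'BEGIN { printf "%.0f\n", n * 2.5 }'; }
echo "directories to plan for: $(scale 1000000)"    # 1,000,000 today
echo "files to plan for:       $(scale 40000000)"   # 40,000,000 today
```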
The pre-installation script ignores unmounted file systems. Before running snPreInstall, be sure to mount
all local file systems that will hold StorNext support information.
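A simple way to confirm this precondition is to check that each intended support-directory location resolves to a mounted local file system; the path below uses "/" only so the sketch is self-contained, and a real check would use the planned support directory paths.

```shell
#!/bin/sh
# Confirm a support-directory location is on a mounted local file system
# before running snPreInstall (which ignores unmounted file systems).
check_mounted() {
    if df -l "$1" >/dev/null 2>&1; then
        echo "mounted: $1"
    else
        echo "NOT mounted: $1"
    fi
}
check_mounted /
```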
After entering all requested information, the pre-installation script outputs the following results:
Quantum recommends that each support directory (other than the Backup directory) should be located on
its own local file system, and each local file system should reside on a separate physical hard disk in the
MDC. This will allow StorNext to perform optimally.
The pre-installation script bases directory location recommendations on the following criteria:
• To aid disaster recovery, the Database and Journal directories should be located on different file
systems.
• For optimal performance, the Metadata directory should not be located on the same file system
as (in order of priority) the Journal, Database or Mapping directory.
Do not change the location of support directories manually. Instead, use the installation script to specify
the location for support directories.
Assuming that the default installation directories will be used, the only option of interest to change is
option 15, the default media type.
If a different media type is not specified, the StorNext installation script selects LTO as the default media
type for storage devices. If storage devices in your system use a different media type, change the default
media type before installing StorNext.
Return to main menu and choose option 2 to install the software. Once installation is complete, select
option 4 to exit.
The StorNext Management GUI has a configuration wizard that was used to complete the configuration
and is shown below:
After using the software serial number as the 30-day evaluation license for step 1, only steps 2, 6 and 7
were needed to configure a file system that will perform deduplication. The configuration wizard walks
through each step in order to completion, so it is best to exit the wizard and choose the needed steps
individually from the Config menu shown below.
Configure a file system to be shared
Choose the Add File System option from the Home Config menu and follow the prompts for the
information needed:
The following screen shows the disks available to configure.
Choose Next > for the Add New File System screen.
Now enter the name of your shared file system (take note of this name because you will use it in the
/etc/fstab file of the clients).
Select a mount point where this file system will be mounted on the server. Select Browse to find or create
a new directory.
Select Enable Distributed LAN Server to serve clients that do not have SAN connectivity but still want to
be SNFS clients. (This option was not tested in this evaluation.)
The next screen shows the disk settings; in this evaluation we kept the defaults:
Choose Next > after selecting your choices.
Choose Next > to create the file system.
A second file system is now needed which will be used to create the data deduplication storage disk.
This file system is only mounted on the StorNext server. Go through the same options but do not choose
Enable Data Migration on the first wizard screen.
After the file system is created, select Add Storage Disk from the Home Config menu.
This Add Storage Disk wizard will display the following introduction screen:
Select Next > for the Add Storage Disk screen.
Select Enable Deduplication.
Select the mount point where the storage disk will be mounted. This is the second file system you just
created.
Click Browse to create a directory under the local file system you chose for the disk files to reside in.
Choose Next > to proceed to the Complete Add Storage Disk Task screen:
Review your settings and click Next > to apply the settings.
From the Home Config select Add Storage Policy to get to the Storage Policy Introduction screen.
The Storage Policy Introduction screen appears, showing any previously configured policy classes.
Choose Next > to continue to the Policy Class and Directory screen.
Give your new policy class a name and select Browse to select the shared file system that will be
deduplicated.
Click Next > for The Store, Truncate and Relocate Time screen.
Configure the following here:
• Minimum Store Time (Minutes): The minimum amount of time a file must remain unaccessed
before it is considered a candidate for storage
• Minimum Truncation Time (Days): The minimum number of days a file must remain unaccessed
before it is considered a candidate for truncation
• Minimum Relocation Time (Days): The minimum number of days a file must remain unaccessed
on the primary affinity before it is considered a candidate for relocation to a secondary affinity
(this option does not appear when you select the Enable Stub Files option on the Policy Class
and Directory screen)
Click Next > for The Number of File Copies and Media Type screen:
Select the media type for File Copy 1 to be SDISK.
Choose Next > to apply the settings.
NFS can be used to share the distributed file system, but it will not perform as well as it would across a
SAN on a fibre channel architecture. Follow the standard NFS implementation procedures for the
installed operating system. Performance of the evaluation tests is outlined in the Deduplication Results
section below.
On the client system, point a web browser to the URL (host name and port number) of the MDC (e.g.
http://servername:81).
When prompted, type the username and password for the MDC and then click OK. (The default value for
both username and password is admin.)
Do one of the following:
• For a MDC running SNFS and SNSM: On the Admin menu, click Download Client Software.
• For a MDC running SNFS only: On the home page, click Download Client Software.
In the list, click the operating system running on the client system, and then click Next >.
Click the download link that corresponds to your operating system version and hardware platform.
(Depending on the operating system, you may have only one choice.)
For example, for Red Hat Linux 4 running on an x86 64-bit platform, click Linux Redhat AS 4.0 (Intel
64bit).
When prompted, click Save or OK to download the file to the client system. Make sure to note the file
name and the location where you save the file.
After the download is complete, click Cancel to close the Download Client Software window.
Do not follow the onscreen installation instructions. Instead, continue with the correct procedure for your
operating system in the StorNext Installation Guide.
After the appropriate software has been installed on the client operating system one of the last steps in
the StorNext Installation Guide is to edit the fstab or vfstab (Solaris) file to add a line for the shared file
system.
As mentioned earlier, this is where the name of the shared file system that was first created on the MDC
server is used. This name will not contain a leading slash like an NFS mount point does.
Linux /etc/fstab example:
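A typical entry looks like the following sketch. The file system name (snfs1) and mount point are hypothetical; the device field is the StorNext file system name with no leading slash, and the file system type is cvfs.

```
snfs1    /stornext/snfs1    cvfs    rw    0    0
```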
StorNext supports distributed LAN clients. Unlike a traditional StorNext SAN client, a distributed LAN
client does not connect directly to StorNext via fibre channel or iSCSI but rather across a LAN through a
gateway system called a distributed LAN server. The distributed LAN server is itself a directly connected
StorNext client, but it processes requests from distributed LAN clients in addition to running applications.
Any number of distributed LAN clients can connect to multiple distributed LAN servers. StorNext File
System supports Distributed LAN client environments in excess of 1000 clients and should support
deployments as large as 5000 clients.
File system aggregate throughput is not adversely impacted. Besides the obvious cost-savings benefit of
using distributed LAN clients, there can be performance improvements as well.
Using the distributed LAN client approach did show more than double the read/write throughput
compared to using NFS.
Special Notes
When using the StorNext software on a client or server running any Linux distribution, be sure to disable
any firewall or Security-Enhanced Linux (SELinux) if issues are encountered. It is possible to add the
ports necessary for SNFS to firewall rules and to write SELinux policies; see the SNFS documentation
for more information.
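A diagnostic sketch for this advice follows. The commented commands are the RHEL 5-era ones for temporarily disabling the firewall and SELinux; run them as root only while troubleshooting, then re-enable with proper rules and policies.

```shell
#!/bin/sh
# Report the current SELinux mode; state-changing commands stay commented.
if command -v getenforce >/dev/null 2>&1; then
    echo "SELinux mode: $(getenforce)"
else
    echo "SELinux tools not installed"
fi
# service iptables status    # inspect the firewall
# service iptables stop      # temporarily disable it
# setenforce 0               # switch SELinux to permissive mode
```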
User Setup
The Add New User screen below shows the many functions that can be chosen for a user to have access
to:
Two user accounts were created for testing purposes:
Users will need to do one of the following in order to set up the user environment for StorNext commands:
Problems Encountered/Resolution
Setting up data deduplication was not easy or straightforward when following the published
documentation. The documentation lacks a good flow for installation because it must cover the several
supported operating environments.
Security guidelines set forth by AFRL require all network services to be up to date on known security
issues. Scanning the StorNext 3.5.1 software with Nessus revealed out-of-date Apache software
distributed with StorNext 3.5.1. Several vulnerabilities existed with the Apache 2.0.54 installed, including
sending credentials in clear text, several cross-site scripting issues, buffer overflow attacks and denial of
service attacks. Due to these security issues, testing was conducted at Avetec. As a test, the DICE team
successfully upgraded the MDC server to the latest Apache web server during the evaluation and
scanned the servers again. This corrected all outstanding security issues. The upgrade did not reveal
any operational problems with the StorNext 3.5.1 installation; however, this Apache web server upgrade
would need to be proposed to Quantum for a completely supported StorNext configuration.
5.1.2 Management
Client
The client (end-user) will see no changes in their daily operations unless the network becomes bogged
down, causing latency in data delivery; this would be similar to NFS. Clients will continue to send data to
their home directories, and the data will be deduplicated in the StorNext storage disk. The clients will
only see the mount point for the shared file system. The deduplication storage disk is not mounted on the
clients.
Administration
Once StorNext has been installed and configured, it can be managed by way of the StorNext Storage
Manager (SNSM). This tool combines the Tertiary Storage Manager (TSM) and Media Storage Manager
(MSM). TSM manages policy configuration along with the movement of data between primary disk and
secondary storage via disk or tape. The MSM tool is used to manage the media and archives. The use of
SNSM provides high-performance file migration and management services and manages automated and
manual media libraries, including library volumes.
The graphics below provide insight into the SNSM GUI interface. Options provided by the File tab include
the following:
• Recover File → Recover deleted files
• Recover Directory → Recursively recover deleted directories
• Retrieve File → Retrieve truncated files from a storage medium
• Retrieve Directory → Recursively retrieve truncated directories
• Free Disk Blocks → Truncate files
• Move → Move files from one media to another
• Attributes → Change file attributes
Options provided by the Reports tab include other basic metrics similar to all other StorNext reports.
The Help tab appears consistently throughout the StorNext software; it assists with error handling and
other topics covered by a standard Help feature.
Server Operation
Deduplication
Deduplication in StorNext 3.5.1 is performed as a post process, run when file system activity is at a minimum or when free disk space is needed. Once files have been de-duplicated, the extra stored copies are truncated according to the following default policies, which can be configured:
Once the high watermark has been reached, file truncation continues only until usage of the shared file system has decreased to the low watermark. By default, the high watermark is 85% of the shared file system and the low watermark is 75%. These parameters are configurable on the Storage Manager screen under Admin.
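The watermark behavior described above can be modeled in a few lines. The sketch below is purely illustrative (it is not StorNext code, and the sizes are hypothetical): once usage crosses the high watermark, candidate files are truncated until usage falls back to the low watermark.

```python
# Illustrative model of watermark-driven truncation (NOT StorNext code).
# Defaults mirror the documented values: high = 85%, low = 75%.

def truncate_to_low_watermark(used_gb, total_gb, candidates_gb,
                              high=0.85, low=0.75):
    """Return the candidate file sizes (GB) that get truncated."""
    if used_gb / total_gb < high:       # below the high watermark: nothing to do
        return []
    truncated = []
    for size in candidates_gb:          # truncate until the low watermark is reached
        if used_gb / total_gb <= low:
            break
        used_gb -= size
        truncated.append(size)
    return truncated

# A hypothetical 1000 GB file system at 90% full with four 60 GB candidates:
print(truncate_to_low_watermark(900, 1000, [60, 60, 60, 60]))  # → [60, 60, 60]
```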
The following graphic shows the Change Watermark Parameters screen:
Scheduled Events
To change the scheduled events, select Scheduled Events under Admin on the Home page:
The Scheduled Events screen will appear, and you can select an event to change its configuration. Only one event can be selected at a time:
StorNext events are tasks that are scheduled to run automatically based on a specified schedule. The
following events can be scheduled:
• Clean Info: This scheduled background operation removes knowledge of media from StorNext.
• Clean Versions: This scheduled event cleans old, inactive versions of files.
• Full Backup: By default, a full backup is run once a week to back up the entire database,
configuration files, and the file system metadata dump file.
• Health Check: By default, health checks are set up to run every day of the week, starting at 7:00
a.m.
• Partial Backup: By default, a partial backup is run on all other days of the week that the full
backup is not run. This backup includes database journals, configuration files, and file system
journal files.
• Rebuild Policy: This scheduled event rebuilds the internal candidate lists (for storing, truncation,
and relocation) by scanning the file system for files that need to be stored.
The Scheduler does not dynamically update when dates and times are changed significantly from the
current setting. You must reboot the system for the Scheduler to pick up the changes.
StorNext Logs
To view the StorNext logs choose Access StorNext Logs under admin on the Home page:
You can access and view any of the following types of logs:
• SNFS Logs: Logs about each configured file system
• StorNext Database Logs: Logs that track changes to the internal database
• SNSM - File Manager Logs: Logs that track storage errors, etc. of the Storage Manager
• SNSM - Library Manager Logs: Logs that track library events and status
• Server System Logs: Logs that record system messages
• StorNext Web Server Logs: Various logs related to the web server
The StorNext GUI is easy to navigate and user-friendly. The following section gives insight into the graphical user interface.
Below are four service options that help monitor and capture system status information. This overview shows how the GUI has been laid out. Once an option has been selected, the request will be carried out, leading the admin to the results page. If an error occurs during the request, the software will guide the admin through the debugging process to resolve the issue.
• Health Check: Execute one or more health checks and view recent health check results.
• Admin Alerts: Examine alerts regarding system activities.
By default, the CLI can be used as root from the MDC or a StorNext client. This can be changed on the SNFS Config Globals screen by unchecking the Global Super User box.
Quotas can also be enabled from the SNFS Config Globals screen, shown below:
The CLI admin command, cvadmin, is needed for complete setup of user quotas. Quotas are based on actual usage and are not enforced based on space allocated.
The cvadmin command also includes a tool to test latency between client and server across the SAN:
cvadmin -H MDCserver -F sharedfilesystem -e latency-test all
Example Output:
Also of note is the performance output of the StorNext cvcp command. The cvcp command works similarly to the Unix cp command, but in addition to copying files it displays performance statistics during the copy. By default, the cvcp command has recursive copy turned on.
Example output:
439 Files, 19.80 GB, 215.2409 Sec, 94.19 MB/Sec 2.04 Files/Sec
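The summary line's rates follow directly from its totals (GB converted to MB in binary units). A quick arithmetic check of the figures reported above:

```python
# Sanity check of the cvcp summary line:
# 439 Files, 19.80 GB, 215.2409 Sec, 94.19 MB/Sec, 2.04 Files/Sec
files, size_gb, seconds = 439, 19.80, 215.2409

throughput_mb_s = size_gb * 1024 / seconds  # GB -> MB using binary units
files_per_sec = files / seconds

print(round(throughput_mb_s, 2))  # ≈ 94.2, matching the reported 94.19 within rounding
print(round(files_per_sec, 2))    # → 2.04
```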
The data deduplication testing involved three unclassified data sets. These data sets were copied onto the deduplication SAN mount point using the StorNext cvcp command, and in a separate test using the Unix tar command, in the following order:
After all the data sets were copied to the shared file system and the resulting file sizes were noted, the files were deleted and a clean operation was performed.
The screen shot below is the actual output of the web GUI home page showing the percent savings of the deduplication achieved on all data sets. It is displayed in the lower right-hand corner.
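The percent savings shown in that GUI display reduces to a simple ratio. A minimal sketch of the calculation (the sizes below are hypothetical placeholders, not the DICE test results):

```python
# Percent savings from deduplication: 1 - (stored size / original size).
# The example sizes are hypothetical, not results from this evaluation.

def percent_savings(original_gb, stored_gb):
    return (1 - stored_gb / original_gb) * 100

# A hypothetical 100 GB data set that de-duplicates down to 80 GB:
print(round(percent_savings(100.0, 80.0), 1))  # → 20.0
```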
5.1.4 Functional Testing
Table 5.1.4 below describes the test requirements, evaluation results, and evaluation rankings for the critical and high requirements defined by the Storage Initiative team.
backup.
Interfaces
Interface Tool
Requirement: The tool must be able to perform configuration modifications (add systems, files, directories).
Result: The tool configures the SAN devices, policies, and shared file systems that are exported to the clients. Client software also needs to be downloaded and configured on each client that needs access to the shared file systems.
Ranking: Met

Requirement: The tool must be able to modify specifics regarding files and directories to perform deduplication (eliminate certain files/directories).
Result: Once the SAN file system is mounted, users can create their own subdirectory and create, edit, or copy files into the directory. Once the files are stored to the SAN mount point, they are de-duplicated in a post-processing fashion and stored on the de-duplicated storage disk. Truncation of files on the SAN file system occurs only when a high watermark has been reached for the shared file system, reducing the storage used to the low watermark. Truncation can also occur by setting up a policy.
Ranking: Met

Requirement: The tool must provide the ability to un-de-duplicate files and directories.
Result: Copying a file from the SAN file system to a different working directory causes the files to be un-de-duplicated.
Ranking: Met

Requirement: The tool must be able to operate on a single file/directory, lists of files/directories, or a directory tree.
Result: The standard HPC Unix tools can be used to copy single or multiple files, including recursive options that work on directories. The cvcp command provided with the StorNext software operates on single or multiple files recursively and also displays the progress and performance of the I/O.
Ranking: Met

Requirement: The tool must be able to report the percentage of storage gained from eliminating redundancy.
Result: In the web GUI, the % savings of deduplication is displayed in the Storage Disk Monitor.
Ranking: Met

Hardware Interfaces
Communications Interfaces
Requirement: The communication software must be configured to run on IPv4.
Result: The system was configured using IPv4.
Ranking: Met

Hardware / Software Requirements
Requirement: The system must be able to have Data De-dupe actions available from the HPC file systems.
Result: The Quantum StorNext client software will need to be loaded on each client that mounts the StorNext Shared File System. De-dupe actions are available when using Unix commands to move or copy files from HPC file systems to the SAN or LAN mount point of the StorNext file system. The StorNext client software includes a cvcp command that works like the Unix cp command but measures the progress and performance of the copy.
Ranking: Met

Requirement: The system must interface with a remote or network file system link to HPC systems (like NFS) or a direct-attached file system with a client.
Result: The SAN will need to be configured so that each client can see the SAN logical unit number (LUN) configured for the shared file system. The clients do not need to see the SAN LUNs that are used for the actual de-duplicated storage area. The file system will then be mounted on the client, similar to an NFS mount.
Ranking: Met

Requirement: Determine the limitations in the number of files, where metadata is stored, and hash table growth.
Result:
• The maximum number of disks per file system is 512
• The maximum number of disks per data stripe group is 128
• The maximum number of stripe groups per file system is 256
• The maximum number of tape drives is 256
Ranking: Met

Security and Privacy
Requirement: The Data de-dupe administrator (if not the system administrator) must be able to perform functions without having system-level root privileges.
Result: The StorNext administrator is its own user type, defined only in the StorNext management GUI. However, root access is needed for certain setup tasks or when using the CLI for administrative purposes.
Ranking: Met

Requirement: The system must be able to detect data corruption and either react automatically or report to the site system administrator.
Result: The system will generate admin alerts and internal tickets for the failures, with suggested fixes.
Ranking: Met

Requirement: The system must pass CSA scans.
Result: Nessus scans report a vulnerability in the Apache web service, outlined in Problems/Resolutions above.
Ranking: Miss

Requirement: The system must provide the ability that all operations are subject to access permissions, authorizations of target objects, and user privileges on accounts for all systems involved in any operation.
Result: The system follows client NFS ACL privileges. All NFS ACL privileges will need to be set up in the SAN shared file system by the root user on the MDC server.
Ranking: Met

Requirement: System functions requiring elevated privilege must be properly documented to allow understanding and limitation of the risks. System configurations must meet the DoD ports and protocols guidance and management processes. Static passwords are not permitted by DoD. One of the following three methods must be met to provide/demonstrate proper security protections:
• HPCMP prefers Kerberos-protected services to secure information transfer and communications, along with SecureID or Common Access Card/Public Key Infrastructure (CAC/PKI) for single sign-on authentication; or
• Independent Certification: meet the requirements set forth by NAIP CCEFS documentation; or
• Systems that do not use Kerberized services and/or SecureID/PKI authentication must be documented and approved to operate in the HPCMP architecture (prior to installation) according to the HPCMP Access Guidelines.
Result: The StorNext administrator, StorNext operator, and StorNext general user passwords are static. They can only be changed by the StorNext administrator unless the user has been granted that privilege on setup by the StorNext administrator. No integration is available for automation with Kerberos.
Ranking: Miss

Requirement: Assurance that the software is properly using and protecting those privileged actions and credentials is required.
Result: The privileged StorNext admin commands are not available to any general user.
Ranking: Met

Requirement: All communications between services must be properly authenticated and protected from intrusion (e.g., replication over WAN).
Result: Only clients with StorNext client software have access to the StorNext file systems. The clients must also have been granted access to the file system storage disk LUNs by the SAN.
Ranking: Met

Audit Trail
Requirement: The system must keep log files on the Data de-dupe server.
Result: The StorNext administrator has access to log files for audit purposes.
Ranking: Met

General Performance
Requirement: The system must be able to handle multiple simultaneous transactions without affecting performance (single stream vs. multiple streams).
Result: A test of 60 simultaneous writes and a second test of 60 simultaneous reads were performed successfully.
Ranking: Met

Requirement: The system must be able to handle file sizes of 2 TB.
Result: A 2.2 TB file was created on a StorNext shared file system. The StorNext software can handle file sizes greater than 2 TB.
Ranking: Met
6. Summary
StorNext has proven to be a strong product, with top-rated functionality and usability. Administration of the software was easy, as the GUI provides a user-friendly interface. StorNext also offers a custom command-line interface for those who prefer the CLI approach, and the CLI allows scripting for automation if needed. However, it is not a fully functional CLI, so the web GUI will still be needed for administrative purposes. The solution includes a great degree of reporting and functionality within the tool. Although StorNext offers a great deal more than data deduplication, deduplication was DICE's task within this evaluation.
The Quantum StorNext software has proven to be a capable tool for dealing with the problem of out-of-control data growth. In this test scenario, the outcome really comes down to the datasets and how well they can be de-duplicated. Through this study, DICE found that the scientific data sets (defined in section 4) did not de-dupe as well as desired; although the StorNext software proved to be a powerful tool, the datasets continued to show a low deduplication factor. For a summary of the different compression ratios and test results, see Deduplication and Compression under Testing Results.
Two key requirements involving security were also missed in the testing: the inability to adopt CAC/PKI or Kerberos logins or automatic password expiration, and security advisory scans that revealed vulnerable services due to older revisions of open-source Apache software. Moving to the latest version of the out-of-date software will fix the security scan results, but adding CAC/PKI/KRB5 support will need to be proposed to Quantum for a fully supported configuration. Another issue worth mentioning is that upgrading firmware or software from some older versions requires a server reboot, causing a StorNext service outage.
Integration into the HPC environment takes little effort for a SAN storage topology, but for clients without Fibre Channel host bus adapters, the StorNext distributed LAN client is recommended over a slower-performing NFS architecture. The StorNext software does not support AFRL's current archive HSM, SAM/QFS, because StorNext uses its own file system structures.