Ready-To-Use Virtual Appliance For Hands-On IBM Spectrum Archive Evaluation
Hiroyuki Miyoshi
Hiroshi Araki
Takeshi Ishimoto
Redpaper
Introduction
IBM® Spectrum Archive Enterprise Edition for the IBM TS4500, IBM TS3500, IBM TS4300,
and IBM TS3310 tape libraries provides seamless integration of IBM Linear Tape File System
(LTFS) with IBM Spectrum® Scale by creating an LTFS tape tier. You can run any application
that is designed for disk files on tape by using IBM Spectrum Archive. IBM Spectrum Archive
can play an important role in reducing the cost of storage for data that does not need the
access performance of primary disk.
For more information about IBM Spectrum Archive, see the product information at:
https://www.ibm.com/us-en/marketplace/data-archive
The IBM Spectrum Archive Virtual Appliance can be deployed in minutes, and you can try its key
features by following this guide. The virtual machine (VM) has a pre-configured IBM
Spectrum Scale and a virtual tape library that allow you to quickly test the IBM Spectrum Archive
features without connecting to a physical tape library. The virtual appliance is provided as a
VirtualBox .ova file.
In this IBM Redpaper publication, we show you how to set up a virtual appliance and include
typical use cases with instructions. This publication includes the following topics:
How to migrate files from disk to tape manually and automatically.
How to recall files from tape to disk.
A brief introduction to tape management commands, such as reconcile, reclaim, export, and
import.
How REST API is supported in IBM Spectrum Archive.
This version of the trial virtual appliance includes the following software:
IBM Spectrum Archive Enterprise Edition 1.3
IBM Spectrum Scale 5.0.0
CPU 1 Core
IP addresses 127.0.0.1
Note: You can connect to the VM from the host via ssh -p 2022 root@localhost.
Installation
Installation of IBM Spectrum Archive Virtual Appliance consists of the following steps:
Configure VirtualBox
Import the OVA file
Correct the network adapter
Configure VirtualBox
Complete the following steps to install the IBM Spectrum Archive Virtual Appliance:
1. Start VirtualBox.
2. Click File → Host Network Manager.
3. Ensure that there is a Host-only Adapter with IPv4 Address of 192.168.56.1 and IPv4
Network Mask of 255.255.255.0. DHCP Server must be enabled. If such a network does
not exist, create a new Host-only Adapter, as shown in Figure 1.
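As a quick sanity check of the host-only network settings in step 3, the following Python
sketch uses the standard ipaddress module to confirm that an address falls inside the network
that is defined by 192.168.56.1 and 255.255.255.0. The candidate VM address 192.168.56.101 is a
hypothetical value that is used only for illustration; the address that the DHCP server hands
out might differ.

```python
import ipaddress

# Host-only adapter address and mask taken from step 3.
HOST_ONLY = ipaddress.ip_interface("192.168.56.1/255.255.255.0")


def in_host_only_network(address: str) -> bool:
    """Return True if the address belongs to the host-only network."""
    return ipaddress.ip_address(address) in HOST_ONLY.network


if __name__ == "__main__":
    # Network derived from the adapter address and mask.
    print(HOST_ONLY.network)
    # Hypothetical VM address, used only as an example.
    print(in_host_only_network("192.168.56.101"))
```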
Note: During the import process, select the Include only NAT network adapter MAC
address for MAC Address Policy option.
2. If the Host-only Adapter does not match, click the Settings icon in the VirtualBox window.
Click Network in the Settings dialog, select Adapter 2, and change the Name field of the
adapter to the one configured in “Configure VirtualBox”. Click OK to
apply the change.
3. Start the virtual machine by clicking the Start icon.
Setting up IBM Spectrum Archive Virtual Appliance
In this section, we describe how to set up IBM Spectrum Archive Virtual Appliance after the
installation is complete.
When the virtual machine boots, log in by using the following credentials:
User: root
Password: ibm
You can also connect to the VM from the host via ssh -p 2022 root@localhost.
IBM Spectrum Archive can then be configured by completing the initial setup process that is
described next. This setup must be completed whenever the virtual machine starts.
2. Ensure that the IBM Spectrum Scale service is active and /ibm/fs1 is mounted, as shown
in Example 2. It might take approximately five minutes for fs1 to be mounted.
4. Verify the status of IBM Spectrum Archive by issuing the commands that are shown in
Example 4.
A virtual tape library (lib1) with four virtual tape drives (VDRIVE0000 - VDRIVE0003) and
10 virtual tape cartridges (VTAP00L5 - VTAP09L5, 10 GB capacity each) is attached.
The drives are pre-configured.
5. Create a tape pool for data migration and assign tapes to the pool, as shown in
Example 5. This process is required only once when the virtual appliance is first created
and set up.
[root@spectrumscale ~]# eeadm tape assign VTAP00L5 VTAP01L5 VTAP02L5 -p pool1 -f
2019-06-27 03:51:30 GLESL700I: Task tape_assign was created successfully, task id is 1000.
2019-06-27 03:51:31 GLESL087I: Tape VTAP01L5 successfully formatted.
2019-06-27 03:51:31 GLESL360I: Assigned tape VTAP01L5 to pool pool1 successfully.
2019-06-27 03:51:31 GLESL087I: Tape VTAP00L5 successfully formatted.
2019-06-27 03:51:31 GLESL360I: Assigned tape VTAP00L5 to pool pool1 successfully.
2019-06-27 03:51:32 GLESL087I: Tape VTAP02L5 successfully formatted.
2019-06-27 03:51:32 GLESL360I: Assigned tape VTAP02L5 to pool pool1 successfully.
Note: If a tape status is not OK and the state indicates errors such as check_tape_library,
it is most likely a temporary error caused by the tape library emulation. In such a case, the
following workaround might fix the tape status:
1. eeadm tape unassign <tape IDs> -p <pool ID>
2. eeadm library rescan
3. eeadm tape assign <tape IDs> -p <pool ID> -f
4. eeadm cluster stop
5. eeadm cluster start
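The five workaround steps can be scripted. The following Python sketch only builds and prints
the commands in the order that is listed in the note; it does not run them. The function name
workaround_commands is a hypothetical helper, and the tape and pool names in the demo are taken
from the tape assignment example in this guide.

```python
import shlex


def workaround_commands(tape_ids, pool_id):
    """Build the five workaround commands, in order, as argument lists."""
    tapes = list(tape_ids)
    return [
        ["eeadm", "tape", "unassign", *tapes, "-p", pool_id],
        ["eeadm", "library", "rescan"],
        ["eeadm", "tape", "assign", *tapes, "-p", pool_id, "-f"],
        ["eeadm", "cluster", "stop"],
        ["eeadm", "cluster", "start"],
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for argv in workaround_commands(["VTAP00L5"], "pool1"):
        print(shlex.join(argv))
```

To run the commands on the appliance instead of printing them, each argument list could be
passed to subprocess.run(argv, check=True) so that the sequence stops at the first failure.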
Manual migration
Manual migration includes the following objectives:
Check the file status to determine whether the file data is on disk or tape
Understand the migration policy rule file
Manually run the migration policy and verify that the file data was migrated
2. To determine whether file data is on disk or tape, run the dsmls command or eeadm file
state command.
Example 7 shows the dsmls command output. The “r” indicates that the files are resident
(on disk).
/ibm/fs1/archive:
10485760 10485760 10240 r - 10Mfile_1
10485760 10485760 10240 r - 10Mfile_2
10485760 10485760 10240 r - 10Mfile_3
2097152 2097152 2048 r - text
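If you want to check the state of many files in a script, output in this form can be parsed.
The following Python sketch assumes that each file line has six whitespace-separated fields,
with the state letter in the fourth field and the file name in the last field. That layout is
inferred from the sample above and is not a documented format; file names that contain spaces
are not handled.

```python
STATE_NAMES = {"r": "resident", "p": "premigrated", "m": "migrated"}


def parse_dsmls(output: str) -> dict:
    """Map file name -> state name for each file line in dsmls-style output."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Assumption: six fields per file line, state letter in the fourth field.
        if len(fields) == 6 and fields[3] in STATE_NAMES:
            states[fields[5]] = STATE_NAMES[fields[3]]
    return states


SAMPLE = """\
/ibm/fs1/archive:
10485760 10485760 10240 r - 10Mfile_1
2097152 2097152 2048 r - text
"""

if __name__ == "__main__":
    for name, state in parse_dsmls(SAMPLE).items():
        print(name, state)
```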
Example 8 shows the eeadm file state command output. The command also has a
shorter output version. All files are resident. The file data is on disk only.
Name: /ibm/fs1/archive/10Mfile_2
State: resident
Name: /ibm/fs1/archive/10Mfile_3
State: resident
Name: /ibm/fs1/archive/text
State: resident
The following file statuses are also available:
– Premigrated (p): Data is on disk and tape
– Migrated (m): Data is on tape only
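The Name and State pairs that the eeadm file state command prints can be collected in a script.
The following Python sketch assumes that the output consists of alternating "Name:" and
"State:" lines, as in the sample above, and maps each state to the data locations that are
described in this section. The helper names are hypothetical.

```python
# Data locations for each state, as described in this section.
LOCATIONS = {
    "resident": {"disk"},
    "premigrated": {"disk", "tape"},
    "migrated": {"tape"},
}


def parse_file_state(output: str) -> dict:
    """Map each 'Name:' path to the 'State:' value that follows it."""
    states = {}
    current = None
    for line in output.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Name":
            current = value
        elif key == "State" and current is not None:
            states[current] = value
            current = None
    return states


SAMPLE = """\
Name: /ibm/fs1/archive/10Mfile_2
State: resident
Name: /ibm/fs1/archive/text
State: resident
"""

if __name__ == "__main__":
    for path, state in parse_file_state(SAMPLE).items():
        print(path, state, sorted(LOCATIONS[state]))
```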
3. Define a policy rule file to migrate these test files to tapes. The policy file is found at:
/root/policy/policy_migArchiveDir
The contents of the policy file are shown in Example 9.
define(
is_migrated,
MISC_ATTRIBUTES LIKE '%V%'
)
When this policy is run, it migrates all resident and premigrated files that are in the
/ibm/fs1/archive directory from disk to tape.
Note: As a best practice, the following files are excluded from migration:
Files that are smaller than 1 MB
Files that have been modified within the last two minutes
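The two exclusions can be expressed as a simple predicate. The following Python sketch is not
the policy rule itself; it only mirrors the selection logic so that you can reason about which
test files qualify. Treating 1 MB as 1024 * 1024 bytes is an assumption, and the function name
is hypothetical.

```python
import time

MIN_SIZE_BYTES = 1024 * 1024  # 1 MB, interpreted here as 1 MiB (assumption)
MIN_AGE_SECONDS = 2 * 60      # two minutes since the last modification


def eligible_for_migration(size_bytes, mtime, now=None):
    """Return True if a file passes both best-practice exclusions."""
    now = time.time() if now is None else now
    big_enough = size_bytes >= MIN_SIZE_BYTES
    old_enough = (now - mtime) >= MIN_AGE_SECONDS
    return big_enough and old_enough


if __name__ == "__main__":
    # A 10 MB file that was modified 30 seconds ago is not yet eligible.
    print(eligible_for_migration(10485760, mtime=time.time() - 30))
```

This is why the next step waits two minutes after the test files are created.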
4. Wait two minutes after the test file creation. Run the mmapplypolicy command to run the
policy, as shown in Example 10.
[I] A total of 4 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;
0 'skipped' files and/or errors.
When the mmapplypolicy command completes, the files should be migrated to tapes, as
shown in Example 11.
If the files are read, they are recalled from the tapes, as shown in Example 12. The recall
is triggered by file access.
The file status is p (premigrated). After a migrated file is read, its data is recalled onto
disk. The data on the tape is still valid.
If you modify a migrated (or premigrated) file, its status becomes resident (r), as shown in
Example 13.
5. The migration can also be invoked by directly passing file names to the eeadm migrate
command, as shown in Example 14.
6. The eeadm task commands show the results of the completed commands and the status
of the running commands. For instance, the completed commands are shown in
Example 15.
1008 transparent_recall succeeded 2019-06-27_04:10:33 2019-06-27_04:10:33 2019-06-27_04:10:33
1009 transparent_recall succeeded 2019-06-27_04:10:33 2019-06-27_04:10:33 2019-06-27_04:10:33
1010 migrate succeeded 2019-06-27_04:11:19 2019-06-27_04:11:19 2019-06-27_04:11:20
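Task records like these can be parsed to see how long a task took. The following Python sketch
takes one logical record (task ID, type, status, and three timestamps on a single line) and
computes the time between the first and the last timestamp. Reading the three timestamps as
created, started, and completed times is an assumption; the column headings are not part of
the excerpt above.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d_%H:%M:%S"


def parse_task_line(line):
    """Split one task record into ID, type, status, and parsed timestamps."""
    task_id, task_type, status, *stamps = line.split()
    times = [datetime.strptime(stamp, TIME_FORMAT) for stamp in stamps]
    return {"id": int(task_id), "type": task_type, "status": status, "times": times}


def elapsed_seconds(task):
    """Seconds between the first and last timestamp of a task record."""
    return (task["times"][-1] - task["times"][0]).total_seconds()


RECORD = "1010 migrate succeeded 2019-06-27_04:11:19 2019-06-27_04:11:19 2019-06-27_04:11:20"

if __name__ == "__main__":
    task = parse_task_line(RECORD)
    print(task["id"], task["type"], task["status"], elapsed_seconds(task))
```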
Automatic migration
Automatic migration includes the following objectives:
Understand the policy rule for automatic migration
Configure the automatic migration
[root@spectrumscale ~]# cd /ibm/fs1/archive
[root@spectrumscale archive]# dd if=/dev/urandom of=10Mfile_1 count=1 bs=10M
[root@spectrumscale archive]# for i in $(seq 2 3); do cp 10Mfile_1 10Mfile_$i; done
[root@spectrumscale archive]# sleep 120; mmapplypolicy /ibm/fs1/archive -P /root/policy/policy_migArchiveDir
3. Define a policy rule for the automatic migration. The policy file to be applied is found at
/root/policy/policy_migAuto. Unlike the policy_migArchiveDir file, the policy_migAuto file
does not pick up files that are in the premigrated state, as shown in Example 19. The THRESHOLD
statement is described later in this section.
define(
is_premigrated,
MISC_ATTRIBUTES LIKE '%M%' AND MISC_ATTRIBUTES NOT LIKE '%V%'
)
define(
is_migrated,
MISC_ATTRIBUTES LIKE '%V%'
)
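The two macros classify a file by the letters in its MISC_ATTRIBUTES string: V marks a migrated
file, and M without V marks a premigrated file. The following Python sketch mirrors that logic
with substring tests, which correspond to the LIKE '%...%' patterns. Returning resident when
neither macro matches is an inference from the macros, and the attribute strings in the demo
are hypothetical.

```python
def state_from_misc_attributes(misc: str) -> str:
    """Classify a file the way the is_migrated and is_premigrated macros do."""
    if "V" in misc:
        return "migrated"      # MISC_ATTRIBUTES LIKE '%V%'
    if "M" in misc:
        return "premigrated"   # LIKE '%M%' AND NOT LIKE '%V%'
    return "resident"          # neither macro matches (inferred)


if __name__ == "__main__":
    # Hypothetical attribute strings, used only to exercise each branch.
    for misc in ("FAMV", "FAM", "FA"):
        print(misc, state_from_misc_attributes(misc))
```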
4. Apply the policy by running the mmchpolicy command, as shown in Example 20.
The migration starts for the newly created files after approximately two minutes. The file
status changes from resident (r) to migrated (m), as shown in Example 21.
Example 22 Cleaning up
[root@spectrumscale archive]# mmdelcallback MIGRATION
mmdelcallback: mmsdrfs propagation completed.
[root@spectrumscale archive]# cd
[root@spectrumscale ~]# rm -rf /ibm/fs1/archive
Selective Recall
Selective recall includes running the eeadm recall command to start selective recalls.
As opposed to a transparent recall, a selective recall is an explicit recall of a specified group of
files. The specified files are recalled in a performance-optimized manner.
2. Run the eeadm recall command by passing the file names, as shown in Example 24.
Verify the file status as shown in the same example.
3. The eeadm recall command also accepts a file list. To test the command by using a file list,
run a manual migration to migrate all test files, as shown in Example 25.
4. Create a file list of the migrated test files and pass it to the eeadm recall command, as
shown in Example 26.
If a customer has millions of files to migrate, the find command can be slow. The
mmapplypolicy command can be used to generate the file list faster. As shown in
Example 27, the following policy rule can be used to find the files and run the eeadm
recall command: /root/policy/policy_recallArchiveDir.
Note: The rule ee_recall indicates data movement from pool1 (LTFS/tape) to system
(GPFS/disk).
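For small directory trees, a file list like the one in step 4 can also be produced with a short
script. The following Python sketch walks a directory and writes one path per line, which is the
form that the find command produces. That the eeadm recall command accepts exactly this list
format is an assumption; check the command help on the appliance. The demo uses a temporary
directory rather than /ibm/fs1/archive so that it runs anywhere.

```python
import os
import tempfile


def write_file_list(root, list_path):
    """Write every file under root to list_path, one path per line."""
    count = 0
    with open(list_path, "w", encoding="utf-8") as out:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                out.write(os.path.join(dirpath, name) + "\n")
                count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root, tempfile.TemporaryDirectory() as work:
        # Create two small stand-in files for the demo.
        for name in ("10Mfile_1", "10Mfile_2"):
            with open(os.path.join(root, name), "w", encoding="utf-8") as handle:
                handle.write("data")
        list_path = os.path.join(work, "recall.list")
        print(write_file_list(root, list_path), "paths written to", list_path)
```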
5. To test the eeadm recall command by using mmapplypolicy, run a manual migration to
migrate all test files, as shown in Example 28.
6. Run the policy for selective recall and verify the status, as shown in Example 29.
7. After the selective recall command runs, the status of the specified files has
changed from migrated to premigrated. There is also an option to change the status from
migrated or premigrated to resident. To test the resident status option, run eeadm
recall with the --resident option, as shown in Example 30.
8. Run eeadm task commands to see the results, as shown in Example 31.
Progress: 5 completed (or failed) files / 5 total files.
5 completed (or failed) files / 5 total files. [subtask:1028]
Result Summary: Selective Recall result: 5 succeeded, 0 failed, 0 duplicate, 0 not migrated, 0 not found.
- [subtask:1028]
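When recalls are driven from a script, the counters in the Result Summary line are a convenient
success check. The following Python sketch extracts them with a regular expression. It assumes
that the summary is on a single line in the form that is shown above; the helper name is
hypothetical.

```python
import re

SUMMARY = ("Result Summary: Selective Recall result: 5 succeeded, 0 failed, "
           "0 duplicate, 0 not migrated, 0 not found.")


def parse_result_summary(line):
    """Return the counters of a Result Summary line as a dict of label -> count."""
    # Keep only the text after the last "result:" marker.
    _, _, counts = line.rpartition("result:")
    pairs = re.findall(r"(\d+)\s+([a-z ]+?)(?:,|\.|$)", counts)
    return {label.strip(): int(number) for number, label in pairs}


if __name__ == "__main__":
    result = parse_result_summary(SUMMARY)
    print(result)
    print("all recalled:", result["failed"] == 0 and result["not found"] == 0)
```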
REST API
IBM Spectrum Archive EE provides REST API support to obtain the system configuration in
JSON format via HTTP/HTTPS. By default, port 7100 is used for the REST API. In the current
version, only the GET method is supported and the configuration cannot be changed via the REST
API. On this virtual appliance, the REST API is ready to use without additional configuration.
http://<host>:7100/ibmsa/v1/<resource>/<id>
Valid values for <resource> are “libraries”, “nodegroups”, “nodes”, “drives”, “pools”,
“tapes” and “tasks”.
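The URL pattern can be wrapped in a small client. The following Python sketch builds a URL from
the pattern and the list of valid resources, and fetches it with the standard urllib module.
The host name localhost is an assumption that holds when the script runs inside the VM, and
omitting <id> to request all objects of a resource is inferred from the examples in this
section. The demo prints only the URL, so it runs without the appliance.

```python
import json
from urllib.request import urlopen

RESOURCES = {"libraries", "nodegroups", "nodes", "drives", "pools", "tapes", "tasks"}


def build_url(host, resource, resource_id=None, port=7100):
    """Build a REST API URL that follows http://<host>:7100/ibmsa/v1/<resource>/<id>."""
    if resource not in RESOURCES:
        raise ValueError(f"unsupported resource: {resource}")
    url = f"http://{host}:{port}/ibmsa/v1/{resource}"
    if resource_id is not None:
        url += f"/{resource_id}"
    return url


def get_json(url, timeout=10):
    """Issue a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # On the appliance, get_json(build_url("localhost", "pools")) would fetch the pools.
    print(build_url("localhost", "pools"))
```

Validating the resource name before the request is sent gives a clearer error than an HTTP
failure from the server.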
3. Get a JSON array of all pools via the REST API, as shown in Example 35. The pretty
parameter can be used to produce more human-readable output.
{
"active_space": 9,
"capacity": 29796335616,
"device_type": "LTO",
"fill_policy": "Default",
"format_class": "default",
"free_space": 29786898432,
"id": "0f68d0cb-3a99-4289-8ed3-f36cb6efea61",
"library_id": "1a5e8ded-ff95-4375-89a9-0c086f0cdcc8",
"library_name": "lib1",
"low_space_warning_enable": false,
"low_space_warning_threshold": 0,
"media_restriction": "^.{8}$",
"mode": "normal",
"mount_limit": 0,
"name": "pool1",
"no_space_warning_enable": false,
"nodegroup_name": "G0",
"non_appendable_space": 0,
"num_of_tapes": 3,
"owner": "System",
"reclaimable_space": 9437175,
"used_space": 9437184,
"worm": "no"
}
]
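The space fields of a pool record can be combined into a short usage report. The following
Python sketch copies the numeric fields from the pool1 record above. In this sample,
capacity - free_space equals used_space, and active_space + reclaimable_space equals
used_space; treating those relationships as general rules, and the values as bytes, is an
assumption.

```python
# Numeric fields copied from the pool1 record shown above.
POOL = {
    "name": "pool1",
    "capacity": 29796335616,
    "free_space": 29786898432,
    "used_space": 9437184,
    "reclaimable_space": 9437175,
    "active_space": 9,
    "num_of_tapes": 3,
}


def space_report(pool):
    """Return used and reclaimable space as fractions for one pool record."""
    used = pool["used_space"]
    return {
        "used_fraction": used / pool["capacity"],
        # Share of the used space that a reclaim could free (0.0 for an empty pool).
        "reclaimable_fraction": pool["reclaimable_space"] / used if used else 0.0,
    }


if __name__ == "__main__":
    report = space_report(POOL)
    print(f"{POOL['name']}: used {report['used_fraction']:.4%}, "
          f"reclaimable {report['reclaimable_fraction']:.4%} of used space")
```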
4. Similar JSON-based output can be obtained by running the eeadm pool list command with the -json
option, as shown in Example 36.
For more information about the REST API feature, see IBM Knowledge Center.
This section describes tips for using the mmapplypolicy command for IBM Spectrum Archive.
If the same files are passed to two different eeadm migrate commands at the same time, one
will fail as a duplicate.
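One way to avoid duplicate failures when you split a large set of files across several
concurrent eeadm migrate commands is to remove repeated paths first and then hand each command
a disjoint batch. The following Python sketch shows that preparation step only; the function
name and the round-robin split are illustrative choices, not part of IBM Spectrum Archive.

```python
def disjoint_batches(paths, batch_count):
    """Deduplicate paths (keeping first-seen order) and split them round-robin."""
    unique = list(dict.fromkeys(paths))
    return [unique[index::batch_count] for index in range(batch_count)]


if __name__ == "__main__":
    files = [
        "/ibm/fs1/archive/10Mfile_1",
        "/ibm/fs1/archive/10Mfile_2",
        "/ibm/fs1/archive/10Mfile_1",  # repeated on purpose
        "/ibm/fs1/archive/10Mfile_3",
    ]
    for number, batch in enumerate(disjoint_batches(files, 2), start=1):
        print(f"batch {number}: {batch}")
```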
Other tips
This section describes other migration tips:
How to write a rule to migrate to multiple pools:
Up to three pools can be specified by adding them to the OPTS definition in the policy rule file, as shown
in Example 37.
The file size can be obtained by FILE_SIZE:
FILE_SIZE < 1024
If you want to select files that have not been accessed in the last 30 days, the following
condition can be used:
DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 30
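The condition compares whole calendar days rather than elapsed 24-hour periods. The following
Python sketch mirrors it by using date ordinals, which count days from 0001-01-01. The absolute
value might differ from what DAYS() returns, but the difference between two dates is
unaffected. The function names are hypothetical.

```python
from datetime import datetime


def days(timestamp):
    """Whole days counted from 0001-01-01, playing the role of DAYS() in the rule."""
    return timestamp.toordinal()


def not_accessed_for(access_time, now, threshold_days=30):
    """Mirror of DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > threshold_days."""
    return days(now) - days(access_time) > threshold_days


if __name__ == "__main__":
    now = datetime(2019, 6, 27, 4, 0)
    # Last access on 27 May is 31 calendar days before 27 June.
    print(not_accessed_for(datetime(2019, 5, 27, 23, 0), now))
    # Last access on 28 May is exactly 30 calendar days before, so it is not selected.
    print(not_accessed_for(datetime(2019, 5, 28, 0, 0), now))
```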
How to control the order of files to migrate, especially for auto migration:
WEIGHT can be used to control the order of files to migrate. Example 38 makes the
migration run from the files with the oldest POSIX access time.
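The effect of such a weight can be pictured as a sort. The following Python sketch orders files
so that the one with the oldest access time comes first, using the time since the last access as
the weight. It assumes that candidates with a larger weight are processed first; the function
name and the sample access times are hypothetical, and the sketch does not reproduce the rule
in Example 38.

```python
def migration_order(files, now):
    """Sort (path, access_time) pairs so that the oldest access time comes first."""
    # Weight = seconds since the last access; a larger weight sorts earlier.
    return sorted(files, key=lambda item: now - item[1], reverse=True)


if __name__ == "__main__":
    # Access times are hypothetical epoch seconds.
    candidates = [("new", 300), ("old", 100), ("mid", 200)]
    for path, access_time in migration_order(candidates, now=400):
        print(path, access_time)
```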
Note: In physical multi-library configuration, if tapes with the same barcodes exist in
different libraries, the library ID also needs to be specified in the previous expression.
The policy rule can accept input arguments from mmapplypolicy -M <key>=<value>.
The /root/policy/policy_listFilesByTape file is the full policy rule that accepts the tape
name from the mmapplypolicy command. An example run is shown in Example 39.
For more information about the specific policy rule syntax and the built-in features, see the
IBM Spectrum Scale Knowledge Center.
Re-formatting tapes
You may quickly run out of tape space because the capacity of each virtual tape is much less
than a physical tape. When tapes become full, take the following steps to re-format the tapes:
1. Delete some GPFS files that are migrated to the pool.
2. Run the eeadm reconcile command.
3. Run the eeadm reclaim command.
Furthermore, if all GPFS files can be deleted, the following steps would be faster:
1. Delete all GPFS files that are migrated to the pool.
2. Run the eeadm tape unassign <tapes> command with the -E option.
3. Run the eeadm tape assign <tapes> command with the -f option.
Detailed logs
The following files include detailed messages and log information:
/var/log/messages
/var/log/ltfsee_trc.log
This log file is encoded. Run the following command to view the decoded log:
cat /var/log/ltfsee_trc.log | /opt/ibm/ltfsee/bin/ltfsee_catcsvlog
In addition, adding -L 6 to the mmapplypolicy command gives more information about the
policy execution.
More information
For more information about IBM Spectrum Archive, see the following resources:
IBM Knowledge Center:
https://www.ibm.com/support/knowledgecenter/ja/ST9MBR_1.3.0/ltfs_ee_ichome.html
IBM Redbooks publication IBM Spectrum Archive Enterprise Edition V1.3.0: Installation
and Configuration Guide, SG24-8333.
http://www.redbooks.ibm.com/abstracts/sg248333.html
For more information about IBM Spectrum Scale, see the following resource:
IBM Knowledge Center:
https://www.ibm.com/support/knowledgecenter/ja/STXKQY_5.0.3/ibmspectrumscale503_welcome.html
Authors
This Redpaper was produced by a team of specialists from around the world working with the
International Technical Support Organization, Tucson Center.
Hiroyuki Miyoshi is an IBM software developer in Tokyo, who is currently engaged in IBM
Spectrum Archive development. He joined IBM in 2001 with a master’s degree in Electrical
Science at Waseda University and has been developing various storage products, such as
RAID subsystems, BladeCenter storage and switch, TS7700, SONAS, and IBM Spectrum
Scale. His areas of expertise include HSM (Hierarchical Storage Management) with tapes,
distributed file system, DR solution for scalable storage systems, and RAS features, such as
resource monitoring and callhome.
Hiroshi Araki is an IBM software developer in Tokyo, who is currently engaged in IBM
Spectrum Archive development. He joined IBM in 2008 with a master’s degree in Computer
Science at Keio University and worked in development of storage products, such as TS7700,
SONAS, V7000 Unified and IBM Spectrum Scale. In 2015, he joined IBM Spectrum Archive
development and has been working on various components, especially CLI command, REST
API, and Monitoring server. His areas of expertise include RAS and DR solutions for
enterprise storage systems.
Takeshi Ishimoto has been the lead developer of Linear Tape File System (LTFS) and IBM
Spectrum Archive products since 2009, and he oversees the software architecture, usability,
and performance of IBM Spectrum Archive software. He is a contributor to the SNIA LTFS
Format Specification, and is currently co-chair of the SNIA LTFS Technical Working Group.
He is located in Tokyo and helps tape clients in Asia Pacific and other regions. He was a
speaker at IBM Systems Technical University (TechU) in 2017 and 2018, and at the SNIA
Storage Developer Conference 2019. Before storage development, he developed
firmware and device drivers for mobile devices and their operating systems, such as
ThinkPad, WorkPad, Apple’s PowerBook, Windows, and Linux for IBM PowerPC®.
Larry Coyne
IBM Systems Worldwide Client Experience Centers
Yuka Sasaki
IBM Systems
REDP-5384-01
ISBN 0738457922