Veritas File System
Administrator's Guide
Solaris
N18516F
Legal Notice
Copyright 2006 Symantec Corporation.
All rights reserved.
Federal acquisitions: Commercial Software - Government Users Subject to Standard License
Terms and Conditions.
Symantec, the Symantec Logo, and Storage Foundation are trademarks or registered
trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other
names may be trademarks of their respective owners.
The product described in this document is distributed under licenses restricting its use,
copying, distribution, and decompilation/reverse engineering. No part of this document
may be reproduced in any form by any means without prior written authorization of
Symantec Corporation and its licensors, if any.
THE DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS,
REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO
BE LEGALLY INVALID. SYMANTEC CORPORATION SHALL NOT BE LIABLE FOR INCIDENTAL
OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH THE FURNISHING PERFORMANCE,
OR USE OF THIS DOCUMENTATION. THE INFORMATION CONTAINED IN THIS
DOCUMENTATION IS SUBJECT TO CHANGE WITHOUT NOTICE.
The Licensed Software and Documentation are deemed to be "commercial computer software"
and "commercial computer software documentation" as defined in FAR Sections 12.212 and
DFARS Section 227.7202.
Symantec Corporation 20330 Stevens Creek Blvd. Cupertino, CA 95014 USA
http://www.symantec.com
Printed in the United States of America.
10 9 8 7 6 5 4 3 2 1
Contents
Chapter 1
Chapter 2
Chapter 3
Extent attributes
About extent attributes .................................................................
Reservation: preallocating space to a file .....................................
Fixed extent size ....................................................................
Other controls .......................................................................
Commands related to extent attributes ............................................
Failure to preserve extent attributes ...........................................
Chapter 4
Chapter 5
Storage Checkpoints
About Storage Checkpoints ............................................................
How Storage Checkpoints differ from snapshots ...........................
How a Storage Checkpoint works .....................................................
Copy-on-write .......................................................................
Types of Storage Checkpoints .........................................................
Data Storage Checkpoints ........................................................
nodata Storage Checkpoints .....................................................
Removable Storage Checkpoints ................................................
Non-mountable Storage Checkpoints .........................................
Storage Checkpoint administration ..................................................
Creating a Storage Checkpoint ..................................................
Removing a Storage Checkpoint ................................................
Accessing a Storage Checkpoint ................................................
Converting a data Storage Checkpoint to a nodata Storage
Checkpoint .....................................................................
Space management considerations ..................................................
Restoring a file system from a Storage Checkpoint ..............................
Restoring a file from a Storage Checkpoint ..................................
Storage Checkpoint quotas .............................................................
Chapter 6
Chapter 7
Quotas
About quota limits ......................................................................
About quota files on Veritas File System .......................................
About quota commands ...............................................................
About quota checking with Veritas File System .................................
Using quotas .............................................................................
Turning on quotas ................................................................
Turning on quotas at mount time ............................................
Editing user and group quotas .................................................
Modifying time limits ............................................................
Viewing disk quotas and usage ................................................
Displaying blocks owned by users or groups ..............................
Turning off quotas ................................................................
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Appendix A
Quick Reference
Command summary ....................................................................
Online manual pages ...................................................................
Creating a VxFS file system ..........................................................
Example of creating a file system .............................................
Converting a file system to VxFS ....................................................
Example of converting a file system .........................................
Mounting a file system ................................................................
Mount options .....................................................................
Example of mounting a file system ...........................................
Editing the vfstab file ............................................................
Unmounting a file system ............................................................
Example of unmounting a file system .......................................
Displaying information on mounted file systems ..............................
Example of displaying information on mounted file systems .........
Appendix B
Diagnostic messages
File system response to problems ...................................................
Recovering a disabled file system .............................................
About kernel messages ................................................................
About global message IDs .......................................................
Kernel messages .........................................................................
About unique message identifiers ..................................................
Unique message identifiers ..........................................................
Appendix C
Index
Disk layout
About disk layouts ......................................................................
About disk space allocation ..........................................................
VxFS Version 4 disk layout ...........................................................
VxFS Version 5 disk layout ...........................................................
VxFS Version 6 disk layout ...........................................................
VxFS Version 7 disk layout ...........................................................
Using UNIX Commands on File Systems Larger than One TB ..............
Glossary
Chapter
Logging
Extents
A key aspect of any file system is how it recovers after a system crash. Earlier
methods required a time-consuming scan of the entire file system. A better solution
is to log (or journal) the metadata of files.
VxFS logs new attribute information into a reserved area of the file system
whenever file system changes occur. The file system writes the actual data to disk
only after the write of the metadata to the log is complete. If a system crash
occurs, the system recovery code analyzes the metadata log and cleans up only
those files. Without logging, a file system check (fsck) must examine all of the
metadata.
Intent logging minimizes system downtime after abnormal shutdowns by logging
file system transactions. When the system is halted unexpectedly, this log can be
replayed and outstanding transactions completed. The check and repair time for
file systems can be reduced to a few seconds, regardless of the file system size.
By default, VxFS file systems log file transactions before they are committed to
disk, reducing time spent checking and repairing file systems after the system is
halted unexpectedly.
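For example, full intent logging can be requested at mount time with the log
option (the device name and mount point below are placeholders; substitute your
own):
# mount -F vxfs -o log /dev/dsk/c1t3d0s1 /mnt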
Extents
An extent is a contiguous area of storage in a computer file system, reserved for
a file. When starting to write to a file, a whole extent is allocated. When writing
to the file again, the data continues where the previous write left off. This reduces
or eliminates file fragmentation.
Since VxFS is an extent-based file system, addressing is done through extents
(which can consist of multiple blocks) rather than in single-block segments.
Extents can therefore enhance file system throughput.
Extent-based allocation
Extents allow disk I/O to take place in units of multiple blocks if storage is
allocated in consecutive blocks.
Extent attributes
Extent attributes are the extent allocation policies associated with a file.
Storage Checkpoints
Backup and restore applications can leverage Storage Checkpoint, a disk- and
I/O-efficient copying technology for creating periodic frozen images of a file
system.
Online backup
VxFS provides online data backup using the snapshot feature.
Quotas
VxFS supports quotas, which allocate per-user and per-group quotas and limit
the use of two principal resources: files and data blocks.
Multi-volume support
The multi-volume support feature allows several volumes to be represented
by a single logical object.
Note: VxFS supports all UFS file system features and facilities except for linking,
removing, or renaming . and .. directory entries. These operations may
disrupt file system operations.
Extent-based allocation
Disk space is allocated in 512-byte sectors to form logical blocks. VxFS supports
logical block sizes of 1024, 2048, 4096, and 8192 bytes. The default block size
is 1K for file systems up to 4 TB, 2K for file systems up to 8 TB, 4K for file
systems up to 16 TB, and 8K for file systems beyond this size.
An extent is defined as one or more adjacent blocks of data within the file system.
An extent is presented as an address-length pair, which identifies the starting
block address and the length of the extent (in file system or logical blocks). VxFS
allocates storage in groups of extents rather than a block at a time.
Extents allow disk I/O to take place in units of multiple blocks if storage is allocated
in consecutive blocks. For sequential I/O, multiple block operations are
considerably faster than block-at-a-time operations; almost all disk drives accept
I/O operations of multiple blocks.
Extent allocation only slightly alters the interpretation of addressed blocks from
the inode structure compared to block based inodes. A VxFS inode references 10
direct extents, each of which are pairs of starting block addresses and lengths in
blocks.
The VxFS inode supports different types of extents, namely ext4 and typed. Inodes
with ext4 extents also point to two indirect address extents, which contain the
addresses of first and second extents:
first: Used for single indirection. Each entry in the extent indicates the
starting block number of an indirect data extent.
second: Used for double indirection. Each entry in the extent indicates the
starting block number of a single indirect address extent.
Each indirect address extent is 8K long and contains 2048 entries. All indirect
data extents for a file must be the same size; this size is set when the first indirect
data extent is allocated and stored in the inode. Directory inodes always use an
8K indirect data extent size. By default, regular file inodes also use an 8K indirect
data extent size that can be altered with vxtunefs; these inodes allocate the
indirect data extents in clusters to simulate larger extents.
Typed extents
VxFS has an inode block map organization for indirect extents known as typed
extents. Each entry in the block map has a typed descriptor record containing a
type, offset, starting block, and number of blocks.
Indirect and data extents use this format to identify logical file offsets and physical
disk locations of any given extent.
The extent descriptor fields are defined as follows:
type
offset
Represents the logical file offset in blocks for a given descriptor. Used
to optimize lookups and eliminate hole descriptor entries.
starting block
number of blocks
Indirect address blocks are fully typed and may have variable lengths up to a
maximum and optimum size of 8K. On a fragmented file system, indirect
extents may be smaller than 8K depending on space availability. VxFS always
tries to obtain 8K indirect extents but resorts to smaller indirects if necessary.
Indirect data extents are variable in size to allow files to allocate large,
contiguous extents and take full advantage of optimized I/O in VxFS.
Holes in sparse files require no storage and are eliminated by typed records.
A hole is determined by adding the offset and length of a descriptor and
comparing the result with the offset of the next record.
While there are no limits on the levels of indirection, lower levels are expected
in this format since data extents have variable lengths.
This format uses a type indicator that determines its record format and content
and accommodates new requirements and functionality for future types.
The current typed format is used on regular files and directories only when
indirection is needed. Typed records are longer than the previous format and
require fewer direct entries in the inode. Newly created files start out using the old
format, which allows for ten direct extents in the inode. The inode's block map is
converted to the typed format when indirection is needed to offer the advantages
of both formats.
Extent attributes
VxFS allocates disk space to files in groups of one or more extents. VxFS also
allows applications to control some aspects of the extent allocation. Extent
attributes are the extent allocation policies associated with a file.
The setext and getext commands allow the administrator to set or view extent
attributes associated with a file, as well as to preallocate space for a file.
See the setext(1) and getext(1) manual pages.
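As a sketch of typical usage (the file name and values are illustrative; check
the manual pages for the exact syntax on your release), to preallocate 1024
blocks for a file, set a 64-block fixed extent size, and then display the
resulting extent attributes:
# setext -r 1024 -e 64 datafile
# getext datafile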
The vxtunefs command allows the administrator to set or view the default indirect
data extent size.
See the vxtunefs(1M) manual page.
However, recent changes made to a system can be lost if a system failure occurs.
Specifically, attribute changes to files and recently created files may disappear.
The mount -o log intent logging option guarantees that all structural changes to
the file system are logged to disk before the system call returns to the application.
With this option, the rename(2) system call flushes the source file to disk to
guarantee the persistence of the file data before renaming it. The rename() call is
also guaranteed to be persistent when the system call returns. The changes to file
system data and metadata caused by the fsync(2) and fdatasync(2) system calls
are guaranteed to be persistent once the calls return.
Storage Checkpoints
To increase availability, recoverability, and performance, Veritas File System
offers on-disk and online backup and restore capabilities that facilitate frequent
and efficient backup strategies. Backup and restore applications can leverage a
Storage Checkpoint, a disk- and I/O-efficient copying technology for creating
periodic frozen images of a file system. Storage Checkpoints present a view of a
file system at a point in time, and subsequently identify and maintain copies
of the original file system blocks. Instead of using a disk-based mirroring method,
Storage Checkpoints save disk space and significantly reduce I/O overhead by
using the free space pool available to a file system.
Storage Checkpoint functionality is separately licensed.
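As an illustrative sketch (the checkpoint name, volume, and mount points below
are placeholders), a Storage Checkpoint is typically created, listed, and
mounted with the fsckptadm and mount commands:
# fsckptadm create thu_8pm /mnt0
# fsckptadm list /mnt0
# mount -F vxfs -o ckpt=thu_8pm /dev/vx/dsk/dg1/vol1:thu_8pm /mnt0_ckpt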
Online backup
VxFS provides online data backup using the snapshot feature. An image of a
mounted file system instantly becomes an exact read-only copy of the file system
at a specific point in time. The original file system is called the snapped file system,
the copy is called the snapshot.
When changes are made to the snapped file system, the old data is copied to the
snapshot. When the snapshot is read, data that has not changed is read from the
snapped file system, while changed data is read from the snapshot.
Backups require one of the following methods:
Copying selected files from the snapshot file system (using find and cpio)
Quotas
VxFS supports quotas, which allocate per-user and per-group quotas and limit
the use of two principal resources: files and data blocks. You can assign quotas
for each of these resources. Each quota consists of two limits for each resource:
hard limit and soft limit.
The hard limit represents an absolute limit on data blocks or files. A user can
never exceed the hard limit under any circumstances.
The soft limit is lower than the hard limit and can be exceeded for a limited amount
of time. This allows users to exceed limits temporarily as long as they fall under
those limits before the allotted time expires.
See About quota limits on page 101.
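VxFS supplies its own quota commands. A typical sequence (the mount point and
user name are placeholders) turns quotas on, edits a user's limits, and displays
current usage:
# vxquotaon /mnt
# vxedquota username
# vxquota -v username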
Commercial database servers such as Oracle can issue kernel-supported
asynchronous I/O calls on these pseudo devices but not on regular files.
read() and write() system calls issued by the database server can avoid the
acquisition and release of read/write locks inside the kernel that take place
on regular files.
VxFS can avoid double buffering of data already buffered by the database
server. This ability frees up resources for other purposes and results in better
performance.
Since I/O to these devices bypasses the system buffer cache, VxFS saves on
the cost of copying data between user space and kernel space when data is
read from or written to a regular file. This process significantly reduces CPU
time per I/O transaction compared to that of buffered I/O.
The File Change Log (FCL) tracks file system changes, so applications need not
scan an entire file system searching for modifications since a previous scan. FCL
functionality is a separately licensed feature.
See About the File Change Log file on page 109.
Multi-volume support
The multi-volume support (MVS) feature allows several volumes to be represented
by a single logical object. All I/O to and from an underlying logical volume is
directed by way of volume sets. This feature can be used only in conjunction with
VxVM. MVS functionality is a separately licensed feature.
See About multi-volume support on page 118.
Caching advisories
VxVM integration
VxFS interfaces with VxVM to determine the I/O characteristics of the underlying
volume and performs I/O accordingly. VxFS also uses this information when using
mkfs to perform proper allocation unit alignments for efficient I/O operations
from the kernel.
As part of VxFS/VxVM integration, VxVM exports a set of I/O parameters to
achieve better I/O performance. This interface can enhance performance for
different volume configurations such as RAID-5, striped, and mirrored volumes.
Full stripe writes are important in a RAID-5 volume for strong I/O performance.
VxFS uses these parameters to issue appropriate I/O requests to VxVM.
Application-specific parameters
You can also set application specific parameters on a per-file system basis to
improve I/O performance.
Defragmentation
About defragmentation
Free resources are initially aligned and allocated to files in an order that provides
optimal performance. On an active file system, the original order of free resources
is lost over time as files are created, removed, and resized. The file system is
spread farther along the disk, leaving unused gaps or fragments between areas
that are in use. This process is known as fragmentation and leads to degraded
performance because the file system has fewer options when assigning a free
extent to a file (a group of contiguous data blocks).
VxFS provides the online administration utility fsadm to resolve the problem of
fragmentation.
The fsadm utility defragments a mounted file system. It can run on demand and
should be scheduled regularly as a cron job.
Because these functions are provided using VxFS-specific ioctl system calls, most
existing UNIX system applications do not use them. The VxFS-specific cp, cpio,
and mv utilities use the functions to preserve extent attributes and allocate space
more efficiently. The current attributes of a file can be listed using the getext or
VxFS-specific ls command. The functions can also improve performance for
custom applications. For portability reasons, these applications must check which
file system type they are using before using these functions.
Chapter
VxFS performance:
creating, mounting, and
tuning File Systems
This chapter includes the following topics:
Tuning I/O
Block size
The unit of allocation in VxFS is a block. Unlike some other UNIX file systems,
VxFS does not make use of block fragments for allocation because storage is
allocated in extents that consist of one or more blocks.
You specify the block size when creating a file system by using the mkfs -o bsize
option. The block size cannot be altered after the file system is created. The
smallest available block size for VxFS is 1K, which is also the default block size.
Choose a block size based on the type of application being run. For example, if
there are many small files, a 1K block size may save space. For large file systems,
with relatively few files, a larger block size is more appropriate. Larger block sizes
use less disk space in file system overhead, but consume more space for files that
are not a multiple of the block size. The easiest way to judge which block sizes
provide the greatest system efficiency is to try representative system loads against
various sizes and pick the fastest. For most applications, it is best to use the default
values.
For 64-bit kernels, which support 32 terabyte file systems, the block size
determines the maximum size of the file system you can create. File systems up
to 4 TB require a 1K block size. For four to eight terabyte file systems, the block
size is 2K; for file systems between 8 and 16 TB, the block size is 4K; and for greater
than 16 TB, the block size is 8K. If you specify the file system size when creating
a file system, the block size defaults to these values.
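For example, to create a file system with a non-default block size, type the
following (the device name and size in sectors are placeholders):
# mkfs -F vxfs -o bsize=4096 /dev/vx/rdsk/dg1/vol1 41943040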
log
delaylog
tmplog
logsize
nodatainlog
blkclear
mincache
convosync
ioerror
largefiles|nolargefiles
cio
Caching behavior can be altered with the mincache option, and the behavior of
O_SYNC and O_DSYNC writes can be altered with the convosync option.
See the fcntl(2) manual page.
The delaylog and tmplog modes can significantly improve performance. The
improvement over log mode is typically about 15 to 20 percent with delaylog; with
tmplog, the improvement is even higher. Performance improvement varies,
depending on the operations being performed and the workload. Read/write
intensive loads should show less improvement, while file system structure
intensive loads (such as mkdir, create, and rename) may show over 100 percent
improvement. The best way to select a mode is to test representative system loads
against the logging modes and compare the performance results.
Most of the modes can be used in combination. For example, a desktop machine
might use both the blkclear and mincache=closesync modes.
See the mount_vxfs(1M) manual page.
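For example, such a desktop configuration might be mounted as follows (the
device and mount point are placeholders):
# mount -F vxfs -o blkclear,mincache=closesync /dev/dsk/c1t3d0s1 /mnt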
Persistence guarantees
In all logging modes, VxFS is fully POSIX compliant. The effects of the fsync(2)
and fdatasync(2) system calls are guaranteed to be persistent after the calls return.
The persistence guarantees for data or metadata modified by write(2), writev(2),
or pwrite(2) are not affected by the logging mount options. The effects of these
system calls are guaranteed to be persistent only if the O_SYNC, O_DSYNC,
VX_DSYNC, or VX_DIRECT flag, as modified by the convosync= mount option, has
been specified for the file descriptor.
The behavior of NFS servers on a VxFS file system is unaffected by the log and
tmplog mount options, but not delaylog. In all cases except for tmplog, VxFS
complies with the persistency requirements of the NFS v2 and NFS v3 standard.
Unless a UNIX application has been developed specifically for the VxFS file system
in log mode, it expects the persistence guarantees offered by most other file
systems and experiences improved robustness when used with a VxFS file system
mounted in delaylog mode. Applications that expect better persistence guarantees
than that offered by most other file systems can benefit from the log and
mincache=closesync mount options. However, most commercially available applications
work well with the default VxFS mount options, including the delaylog mode.
mincache=closesync
mincache=direct
mincache=dsync
mincache=unbuffered
mincache=tmpcache
If performance is more important than data integrity, you can use the
mincache=tmpcache mode. The mincache=tmpcache mode disables special delayed
extending write handling, trading off less integrity for better performance. Unlike
the other mincache modes, tmpcache does not flush the file to disk when the file
is closed. When the mincache=tmpcache option is used, bad data can appear in a
file that was being extended when a crash occurred.
convosync=closesync
Note: The convosync=closesync mode converts synchronous and data
synchronous writes to non-synchronous writes and flushes the changes to the
file to disk when the file is closed.
convosync=delay
convosync=direct
convosync=dsync
Note: The convosync=dsync option violates POSIX guarantees for synchronous
I/O.
convosync=unbuffered
As with closesync, the direct, unbuffered, and dsync modes flush changes to the
file to disk when it is closed. These modes can be used to speed up applications
that use synchronous I/O. Many applications that are concerned with data integrity
specify the O_SYNC fcntl in order to write the file data synchronously. However,
this has the undesirable side effect of updating inode times and therefore slowing
down performance. The convosync=dsync, convosync=unbuffered, and
convosync=direct modes alleviate this problem by allowing applications to take
advantage of synchronous writes without modifying inode times as well.
Before using convosync=dsync, convosync=unbuffered, or convosync=direct,
make sure that all applications that use the file system do not require synchronous
inode time updates for O_SYNC writes.
disable
nodisable
wdisable
mwdisable
mdisable
VxFS supports files larger than two terabytes. Files larger than 32 terabytes can
be created only on 64-bit kernel operating systems and on a Veritas Volume
Manager volume.
Note: Applications and utilities such as backup may experience problems if they
are not aware of large files. In such a case, create your file system without large
file capability.
Specifying largefiles sets the largefiles flag, which lets the file system hold
files that are two terabytes or larger. This is the default option.
To clear the flag and prevent large files from being created, type the following
command:
# mkfs -F vxfs -o nolargefiles special_device size
To determine the current status of the largefiles flag, type the following
command:
# mkfs -F vxfs -m special_device
This guarantees that when a file is closed, its data is synchronized to disk and
cannot be lost. Thus, after an application has exited and its files are closed, no
data is lost even if the system is immediately turned off.
To mount a temporary file system or to restore from backup, type the following:
# mount -F vxfs -o tmplog,convosync=delay,mincache=tmpcache \
/dev/dsk/c1t3d0s1 /mnt
This combination might be used for a temporary file system where performance
is more important than absolute data integrity. Any O_SYNC writes are performed
as delayed writes and delayed extending writes are not handled. This could result
in a file that contains corrupted data if the system crashes. Any file written 30
seconds or so before a crash may contain corrupted data or be missing if this
mount combination is in effect. However, such a file system performs significantly
fewer disk writes than a log file system, and should have significantly better
performance, depending on the application.
To mount a file system for synchronous writes, type the following:
# mount -F vxfs -o log,convosync=dsync /dev/dsk/c1t3d0s1 /mnt
vxfs_ninode
VxFS caches inodes in an inode table. The tunable for VxFS to determine the
number of entries in its inode table is vxfs_ninode.
VxFS uses the value of vxfs_ninode in /etc/system as the number of entries in the
VxFS inode table. By default, the file system uses a value of vxfs_ninode, which
is computed based on system memory size.
To increase the internal inode table size
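To do so, add an entry to the /etc/system configuration file and reboot. The
value below is illustrative only; size it for your workload and system memory:
set vxfs:vxfs_ninode = 131072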
It may be necessary to tune the dnlc (directory name lookup cache) size to keep
the value within an acceptable range relative to vxfs_ninode. It must be within
80% of vxfs_ninode to avoid spurious ENFILE errors or excessive CPU
consumption, but must be more than 50% of vxfs_ninode to maintain good
performance. The variable ncsize determines the size of dnlc. The default value
vx_maxlink
The VxFS vx_maxlink tunable determines the number of sub-directories that can
be created under a directory.
A VxFS file system obtains the value of vx_maxlink from the system configuration
file /etc/system. By default, vx_maxlink is 32K. To change the computed value of
vx_maxlink, you can add an entry to the system configuration file. For example:
set vxfs:vx_maxlink = 65534
vol_maxio
The vol_maxio parameter controls the maximum size of logical I/O operations
that can be performed without breaking up a request. Logical I/O requests larger
than this value are broken up and performed synchronously. Physical I/Os are
broken up based on the capabilities of the disk device and are unaffected by
changes to the vol_maxio logical request limit.
Raising the vol_maxio limit can cause problems if the size of an I/O requires more
memory or kernel mapping space than exists. The recommended maximum for
vol_maxio is 20% of the smaller of physical memory or kernel virtual memory. It
is not advisable to go over this limit. Within this limit, you can generally obtain
the best results by setting vol_maxio to the size of your largest stripe. This applies
to both RAID-0 striping and RAID-5 striping.
To increase the value of vol_maxio, add an entry to /etc/system (after the entry
forceload:drv/vxio) and reboot for the change to take effect. For example, the
following line sets the maximum I/O size to 16 MB:
set vxio:vol_maxio=32768
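The vol_maxio value is specified in 512-byte sectors, so the entry above (32768 sectors) corresponds to 16 MB. A quick sanity check of the conversion:

```shell
# vol_maxio is expressed in 512-byte sectors.
# Convert a desired 16 MB maximum I/O size into a sector count:
bytes=$((16 * 1024 * 1024))
sectors=$((bytes / 512))
echo "set vxio:vol_maxio=$sectors"
```

This prints the /etc/system entry set vxio:vol_maxio=32768.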
Monitoring fragmentation
Fragmentation reduces performance and availability. Regular use of fsadm's
fragmentation reporting and reorganization facilities is therefore advisable.
The easiest way to ensure that fragmentation does not become a problem is to
schedule regular defragmentation runs using the cron command.
Defragmentation scheduling should range from weekly (for frequently used file
systems) to monthly (for infrequently used file systems). Extent fragmentation
should be monitored with the fsadm command.
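To report fragmentation without reorganizing, run fsadm with only the reporting options, as in the reorganization script later in this section (the mount point is illustrative):

```
# fsadm -F vxfs -E /home     (extent fragmentation report)
# fsadm -F vxfs -D /home     (directory fragmentation report)
```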
An unfragmented file system has the following characteristics:
- Less than 1 percent of free space in extents of less than 8 blocks in length
- Less than 5 percent of free space in extents of less than 64 blocks in length
- More than 5 percent of the total file system size available as free extents in
  lengths of 64 or more blocks

A badly fragmented file system has one or more of the following characteristics:
- Greater than 5 percent of free space in extents of less than 8 blocks in length
- More than 50 percent of free space in extents of less than 64 blocks in length
- Less than 5 percent of the total file system size available as free extents in
  lengths of 64 or more blocks
The optimal period for scheduling of extent reorganization runs can be determined
by choosing a reasonable interval, scheduling fsadm runs at the initial interval,
and running the extent fragmentation report feature of fsadm before and after
the reorganization.
The before result is the degree of fragmentation prior to the reorganization. If
the degree of fragmentation is approaching the figures for bad fragmentation,
reduce the interval between fsadm runs. If the degree of fragmentation is low,
increase the interval between fsadm runs.
The after result is an indication of how well the reorganizer has performed. The
degree of fragmentation should be close to the characteristics of an unfragmented
file system. If not, it may be a good idea to resize the file system; full file systems
tend to fragment and are difficult to defragment. It is also possible that the
reorganization is not being performed at a time during which the file system in
question is relatively idle.
Directory reorganization is not nearly as critical as extent reorganization, but
regular directory reorganization improves performance. It is advisable to schedule
directory reorganization for file systems when the extent reorganization is
scheduled. The following is a sample script that is run periodically at 3:00 A.M.
from cron for a number of file systems:
outfile=/usr/spool/fsadm/out.`/bin/date +'%m%d'`
for i in /home /home2 /project /db
do
/bin/echo "Reorganizing $i"
/bin/timex fsadm -F vxfs -e -E -s $i
/bin/timex fsadm -F vxfs -s -d -D $i
done > $outfile 2>&1
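A crontab entry to run such a script weekly could look like the following (the script path is illustrative):

```
0 3 * * 6 /usr/local/bin/fsadm_reorg.sh
```

This runs the reorganization at 3:00 A.M. every Saturday.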
Tuning I/O
The performance of a file system can be enhanced by a suitable choice of I/O sizes
and proper alignment of the I/O requests based on the requirements of the
underlying special device. VxFS provides tools to tune the file systems.
Note: The following tunables and techniques work on a per-file-system basis. Use
them judiciously, based on the underlying device properties and the
characteristics of the applications that use the file system.
VxVM queries
VxVM receives the following queries during configuration:
The file system queries VxVM to determine the geometry of the underlying
volume and automatically sets the I/O parameters.
Note: When using file systems in multiple volume sets, VxFS sets the VxFS
tunables based on the geometry of the first component volume (volume 0) in
the volume set.
The mkfs command queries VxVM when the file system is created to
automatically align the file system to the volume geometry. If the default
alignment from mkfs is not acceptable, the -o align=n option can be used to
override alignment information obtained from VxVM.
The mount command queries VxVM when the file system is mounted and
downloads the I/O parameters.
If the default parameters are not acceptable or the file system is being used without
VxVM, then the /etc/vx/tunefstab file can be used to set values for I/O parameters.
The mount command reads the /etc/vx/tunefstab file and downloads any
parameters specified for a file system. The tunefstab file overrides any values
obtained from VxVM. While the file system is mounted, any I/O parameter can
be changed using the vxtunefs command, which can take tunables on the
command line or read them from the /etc/vx/tunefstab file. For more details,
see the vxtunefs(1M) and tunefstab(4) manual pages.
The vxtunefs command can be used to print the current values of the I/O
parameters.
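For example, parameters can be set persistently through /etc/vx/tunefstab and adjusted on a mounted file system with vxtunefs (device name, mount point, and values are illustrative):

```
# cat /etc/vx/tunefstab
/dev/vx/dsk/mydg/vol1 read_pref_io=65536,read_nstream=4

# vxtunefs /mnt0                            (print the current I/O parameters)
# vxtunefs -o read_pref_io=131072 /mnt0     (change a parameter on a mounted file system)
```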
Table 2-1 Tunable VxFS I/O parameters

read_pref_io
        The preferred read request size. The file system uses this in
        conjunction with the read_nstream value to determine how much data to
        read ahead. The default value is 64K.
write_pref_io
        The preferred write request size. The file system uses this in
        conjunction with the write_nstream value to determine how to do flush
        behind on writes. The default value is 64K.
read_nstream
        The number of parallel read requests of size read_pref_io that can be
        outstanding at one time. The default value is 1.
write_nstream
        The number of parallel write requests of size write_pref_io that can
        be outstanding at one time. The default value is 1.
default_indir_size
        The default size of indirect extents allocated once a file's direct
        extents are used up.
discovered_direct_iosz
        Any file I/O request larger than this size is handled as discovered
        direct I/O. The default value is 256K.
fcl_keeptime
        The number of seconds that the File Change Log (FCL) keeps records in
        the log file before purging them.
fcl_maxalloc
        The maximum amount of space, in bytes, that can be allocated to the
        File Change Log.
fcl_winterval
        The number of seconds that must elapse before the File Change Log
        records subsequent writes to the same file.
hsm_write_prealloc
        Preallocates disk blocks to reduce allocation overhead for sequential
        writes by HSM applications.
initial_extent_size
        The size of the initial extent allocated to a file.
inode_aging_count
        The maximum number of inodes to place on the inode aging list, used
        with Storage Checkpoints to defer freeing of removed files.
inode_aging_size
        The minimum size a file must be to qualify for inode aging.
max_direct_iosz
        The maximum size of a direct I/O request issued by the file system;
        larger requests are broken up.
max_diskq
        Limits the maximum disk queue generated by a single file during
        flush behind. The default value is 1 MB.
max_seqio_extent_size
        The maximum extent size that the file system allocates when a file is
        being extended by sequential writes.
qio_cache_enable
        Enables or disables caching on Quick I/O files. The default value of 0
        disables caching.
read_ahead
        Controls read ahead: 0 disables it, 1 (the default) enables detected
        sequential read ahead, and 2 enables enhanced read ahead.
write_throttle
        Limits the number of dirty pages per file that the file system
        generates before flushing them to disk. The default value of 0 places
        no limit.
Note: VxFS does not query VxVM with multiple volume sets. To improve I/O
performance when using multiple volume sets, use the vxtunefs command.
If the file system is being used with a hardware disk array or volume manager
other than VxVM, try to align the parameters to match the geometry of the logical
disk. With striping or RAID-5, it is common to set read_pref_io to the stripe unit
size and read_nstream to the number of columns in the stripe. For striped arrays,
use the same values for write_pref_io and write_nstream, but for RAID-5 arrays,
set write_pref_io to the full stripe size and write_nstream to 1.
For an application to do efficient disk I/O, it should issue read requests equal
to the product of read_nstream and read_pref_io. In general, any multiple or
factor of read_nstream multiplied by read_pref_io is a good request size for
performance.
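With a RAID-0 stripe of 4 columns and a 64K stripe unit (illustrative values), the tuned parameters and the resulting efficient read size work out as follows:

```shell
# read_pref_io is typically set to the stripe unit size,
# and read_nstream to the number of stripe columns.
read_pref_io=$((64 * 1024))   # 64K stripe unit
read_nstream=4                # 4 stripe columns
read_size=$((read_pref_io * read_nstream))
echo "$read_size"             # 262144 bytes, i.e. 256K per application read
```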
Chapter
Extent attributes
This chapter includes the following topics:
About extent attributes
Some of the extent attributes are persistent and become part of the on-disk
information about the file, while other attributes are temporary and are lost after
the file is closed or the system is rebooted. The persistent attributes are similar
to the file's permissions and are written in the inode for the file. When a file is
copied, moved, or archived, only the persistent attributes of the source file are
preserved in the new file.
See Other controls on page 57.
In general, the user will only set extent attributes for reservation. Many of the
attributes are designed for applications that are tuned to a particular pattern of
I/O or disk alignment.
See the mkfs_vxfs(1M) manual page.
See About VxFS I/O on page 61.
because the unused space fragments free space by breaking large extents into
smaller pieces. By erring on the side of minimizing fragmentation for the file
system, files may become so non-contiguous that their I/O characteristics would
degrade.
Fixed extent sizes are particularly appropriate in the following situations:
If a file is large and contiguous, a large fixed extent size can minimize the
number of extents in the file.
Custom applications may also use fixed extent sizes for specific reasons, such as
the need to align extents to cylinder or striping boundaries on disk.
Other controls
The auxiliary controls on extent attributes determine the following conditions:
When the space reserved for a file will actually become part of the file
Alignment
Specific alignment restrictions coordinate a file's allocations with a particular
I/O pattern or disk alignment. Alignment can only be specified if a fixed extent
size has also been set. Setting alignment restrictions on allocations is best left to
well-designed applications.
See the mkfs_vxfs(1M) manual page.
See About VxFS I/O on page 61.
Contiguity
A reservation request can specify that its allocation remain contiguous (all one
extent). Maximum contiguity of a file optimizes its I/O characteristics.
Note: Fixed extent sizes or alignment cause a file system to return an error message
reporting insufficient space if no suitably sized (or aligned) extent is available.
This can happen even if the file system has sufficient free space and the fixed
extent size is large.
Reservation trimming
A reservation request can specify that any unused reservation be released when
the file is closed. The file is not completely closed until all processes open against
the file have closed it.
Reservation persistence
A reservation request can ensure that the reservation does not become a persistent
attribute of the file. The unused reservation is discarded when the file is closed.
Commands related to extent attributes
source file system, or lacks free extents appropriate to satisfy the extent attribute
requirements.
The -e option takes any of the following keywords as an argument:

warn     Issues a warning message if extent attribute information cannot be
         maintained (the default)
force    Fails the copy if extent attribute information cannot be maintained
ignore   Ignores extent attribute information entirely
- The file system receiving a copied, moved, or restored file from an archive is
  not a VxFS type. Since other file system types do not support the extent
  attributes of the VxFS file system, the attributes of the source file are lost
  during the migration.
- The file system receiving a copied, moved, or restored file is a VxFS type but
  does not have enough free space to satisfy the extent attributes. For example,
  consider a 50K file and a reservation of 1 MB. If the target file system has
  500K free, it could easily hold the file but fail to satisfy the reservation.
- The file system receiving a copied, moved, or restored file from an archive is
  a VxFS type but the different block sizes of the source and target file system
  make extent attributes impossible to maintain. For example, consider a source
  file system of block size 1024, a target file system of block size 4096, and a
  file that has a fixed extent size of 3 blocks (3072 bytes). This fixed extent
  size adapts to the source file system but cannot translate onto the target
  file system.
  The same source and target file systems in the preceding example with a file
  carrying a fixed extent size of 4 could preserve the attribute; a 4 block
  (4096 byte) extent on the source file system would translate into a 1 block
  extent on the target.
- On a system with mixed block sizes, a copy, move, or restoration operation
  may or may not succeed in preserving attributes. It is recommended that the
  same block size be used for all file systems on a given system.
Chapter
Cache advisories
Sequential
For sequential I/O, VxFS employs a read-ahead policy by default when the
application is reading data. For writing, it allocates contiguous blocks if possible.
In most cases, VxFS handles I/O that is sequential through buffered I/O. VxFS
handles random or nonsequential I/O using direct I/O without buffering.
VxFS provides a set of I/O cache advisories for use when accessing files.
See the Veritas File System Programmer's Reference Guide.
See the vxfsio(7) manual page.
Direct I/O
Direct I/O is an unbuffered form of I/O. If the VX_DIRECT advisory is set, the user
is requesting direct data transfer between the disk and the user-supplied buffer
for reads and writes. This bypasses the kernel buffering of data, and reduces the
CPU overhead associated with I/O by eliminating the data copy between the kernel
buffer and the user's buffer. This also avoids taking up space in the buffer cache
that might be better used for something else. The direct I/O feature can provide
significant performance gains for some applications.
The direct I/O and VX_DIRECT advisories are maintained on a per-file-descriptor
basis.
The ending file offset must be aligned to a 512-byte boundary, or the length
must be a multiple of 512 bytes.
Unbuffered I/O
If the VX_UNBUFFERED advisory is set, I/O behavior is the same as direct I/O
with the VX_DIRECT advisory set, so the alignment constraints that apply to
direct I/O also apply to unbuffered I/O. For unbuffered I/O, however, if the file is
being extended, or storage is being allocated to the file, inode changes are not
updated synchronously before the write returns to the user. The VX_UNBUFFERED
advisory is maintained on a per-file-descriptor basis.
to the user. If the file is extended by the operation, the inode is written before the
write returns.
The direct I/O and VX_DSYNC advisories are maintained on a per-file-descriptor
basis.
Cache advisories
VxFS allows an application to set cache advisories for use when accessing files.
VxFS cache advisories enable applications to help monitor the buffer cache and
provide information on how to better tune the buffer cache to improve
performance.
The basic function of the cache advisory is to let you know whether you could
have avoided a later re-read of block X if the buffer cache had been a little larger.
Conversely, the cache advisory can also let you know that you could safely reduce
the buffer cache size without putting block X into jeopardy.
These advisories are in memory only and do not persist across reboots. Some
advisories are currently maintained on a per-file, not a per-file-descriptor, basis.
Only one set of advisories can be in effect for all accesses to the file. If two
conflicting applications set different advisories, both must use the advisories that
were last set.
All advisories are set using the VX_SETCACHE ioctl command. The current set of
advisories can be obtained with the VX_GETCACHE ioctl command.
See the vxfsio(7) manual page.
decisions about the I/O sizes issued to VxFS for a file or file device. For more
details on this ioctl, refer to the vxfsio(7) manual page.
For a discussion on various I/O parameters, refer to VxFS performance: creating,
mounting, and tuning File Systems on page 31 and the vxtunefs(1M) manual
page.
Chapter
Storage Checkpoints
This chapter includes the following topics:
The ability for data to be immediately writeable by preserving the file system
metadata, the directory hierarchy, and user data.
Storage Checkpoints are actually data objects that are managed and controlled
by the file system. You can create, remove, and rename Storage Checkpoints
because they are data objects with associated names.
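A minimal sketch of that life cycle with the fsckptadm utility (the Storage Checkpoint name and mount point are illustrative):

```
# fsckptadm create thu_8pm /mnt0      create a Storage Checkpoint
# fsckptadm list /mnt0                list the Storage Checkpoints
# fsckptadm remove thu_8pm /mnt0      remove a Storage Checkpoint
```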
About Storage Checkpoints
- Create a stable image of the file system that can be backed up to tape.
- Provide a mounted, on-disk backup of the file system so that end users can
  restore their own files in the event of accidental deletion. This is
  especially useful in a home directory, engineering, or email environment.
- Create an on-disk backup of the file system that can be used in addition to a
  traditional tape-based backup to provide faster backup and restore
  capabilities.
- Have multiple, read-only Storage Checkpoints that reduce I/O operations and
  required storage space because the most recent Storage Checkpoint is the only
  one that accumulates updates from the primary file system.
How a Storage Checkpoint works
Figure 5-1 Primary fileset and its Storage Checkpoint: the /database directory
with files emp.dbf and jun.dbf appears both in the primary fileset and in the
Storage Checkpoint.
In Figure 5-2, a square represents each block of the file system. This figure shows
a Storage Checkpoint containing pointers to the primary fileset at the time the
Storage Checkpoint is taken, as in Figure 5-1.
Figure 5-2 Storage Checkpoint pointers into the primary fileset.
The Storage Checkpoint presents the exact image of the file system by finding
the data from the primary fileset. As the primary fileset is updated, the original
data is copied to the Storage Checkpoint before the new data is written. When a
write operation changes a specific data block in the primary fileset, the old data
is first read and copied to the Storage Checkpoint before the primary fileset is
updated. Subsequent writes to the specified data block on the primary fileset do
not result in additional updates to the Storage Checkpoint because the old data
needs to be saved only once. As blocks in the primary fileset continue to change,
the Storage Checkpoint accumulates the original data blocks.
Copy-on-write
In Figure 5-3, the third block originally containing C is updated.
Before the block is updated with new data, the original data is copied to the Storage
Checkpoint. This is called the copy-on-write technique, which allows the Storage
Checkpoint to preserve the image of the primary fileset when the Storage
Checkpoint is taken.
Every update or write operation does not necessarily result in the process of
copying data to the Storage Checkpoint. In this example, subsequent updates to
this block, now containing C', are not copied to the Storage Checkpoint because
the original image of the block containing C is already saved.
Types of Storage Checkpoints
Figure 5-3 Updates to the primary fileset: the third block's original data is
copied to the Storage Checkpoint before being overwritten.
limit the life of data Storage Checkpoints to minimize the impact on system
resources.
See Showing the difference between a data and a nodata Storage Checkpoint
on page 79.
See Showing the difference between a data and a nodata Storage Checkpoint
on page 79.
Storage Checkpoint administration
ctime                = Thu 3 Mar 2005 7:00:17 PM PST
mtime                = Thu 3 Mar 2005 7:00:17 PM PST
flags                = largefiles, mounted
# of inodes          = 23872
# of blocks          = 27867
.
.
.
# of overlay bmaps   = 0
# mkfs -m /mnt0
# mkfs -F vxfs -o \
bsize=1024,version=7,inosize=256,logsize=65536,\
largefiles /mnt0
To mount a Storage Checkpoint of a file system, first mount the file system
itself.
Note: The vol1 file system must already be mounted before the Storage
Checkpoint can be mounted.
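Continuing the vol1 example, the file system is mounted first, and then the Storage Checkpoint (here named may_23) is mounted through its pseudo device:

```
# mount -F vxfs /dev/vx/dsk/fsvol/vol1 /fsvol
# mount -F vxfs -o ckpt=may_23 /dev/vx/dsk/fsvol/vol1:may_23 /fsvol_may_23
```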
To mount this Storage Checkpoint automatically when the system starts up,
put the following entries in the /etc/vfstab file:
#device to mount               device to fsck            mount point    FS    fsck  mount at  mount
#                                                                       type  pass  boot      options
/dev/vx/dsk/fsvol/vol1         /dev/vx/rdsk/fsvol/vol1   /fsvol         vxfs  1     yes       -
/dev/vx/dsk/fsvol/vol1:may_23  -                         /fsvol_may_23  vxfs  -     yes       ckpt=may_23
Note: You do not need to run the fsck utility on Storage Checkpoint pseudo devices
because pseudo devices are part of the actual file system.
Storage Checkpoints
Storage Checkpoint administration
Examine the content of the original file and the Storage Checkpoint file:
# cat /mnt0/file
hello, world
# cat /mnt0@5_30pm/file
hello, world
Examine the content of the original file and the Storage Checkpoint file. The
original file contains the latest data while the Storage Checkpoint file still
contains the data at the time of the Storage Checkpoint creation:
# cat /mnt0/file
goodbye
# cat /mnt0@5_30pm/file
hello, world
Storage Checkpoints
Storage Checkpoint administration
Examine the content of both files. The original file must contain the latest
data:
# cat /mnt0/file
goodbye
You can traverse and read the directories of the nodata Storage Checkpoint;
however, the files contain no data, only markers to indicate which block of
the file has been changed since the Storage Checkpoint was created:
# ls -l /mnt0@5_30pm/file
-rw-r--r--   1 root     other    13 Jul 13 17:13 /mnt0@5_30pm/file
# cat /mnt0@5_30pm/file
cat: /mnt0@5_30pm/file: I/O error
Create four data Storage Checkpoints on this file system, note the order of
creation, and list them:
# fsckptadm create oldest /mnt0
# fsckptadm create older /mnt0
# fsckptadm create old /mnt0
# fsckptadm create latest /mnt0
# fsckptadm list /mnt0
/mnt0
latest:
        ctime   = Mon 26 Jul 11:56:55 2004
        mtime   = Mon 26 Jul 11:56:55 2004
        flags   = largefiles
old:
        ctime   = Mon 26 Jul 11:56:51 2004
        mtime   = Mon 26 Jul 11:56:51 2004
        flags   = largefiles
older:
        ctime   = Mon 26 Jul 11:56:46 2004
        mtime   = Mon 26 Jul 11:56:46 2004
        flags   = largefiles
oldest:
        ctime   = Mon 26 Jul 11:56:41 2004
        mtime   = Mon 26 Jul 11:56:41 2004
        flags   = largefiles
You can instead convert the latest Storage Checkpoint to a nodata Storage
Checkpoint in a delayed or asynchronous manner.
# fsckptadm set nodata latest /mnt0
List the Storage Checkpoints, as in the following example. You will see that
the latest Storage Checkpoint is marked for conversion in the future.
# fsckptadm list /mnt0
/mnt0
latest:
        ctime   = Mon 26 Jul 11:56:55 2004
        mtime   = Mon 26 Jul 11:56:55 2004
        flags   = nodata, largefiles, delayed
old:
        ctime   = Mon 26 Jul 11:56:51 2004
        mtime   = Mon 26 Jul 11:56:51 2004
        flags   = largefiles
older:
        ctime   = Mon 26 Jul 11:56:46 2004
        mtime   = Mon 26 Jul 11:56:46 2004
        flags   = largefiles
oldest:
        ctime   = Mon 26 Jul 11:56:41 2004
        mtime   = Mon 26 Jul 11:56:41 2004
        flags   = largefiles
Space management considerations
Note: After you remove the older and old Storage Checkpoints, the latest
Storage Checkpoint is automatically converted to a nodata Storage Checkpoint
because the only remaining older Storage Checkpoint (oldest) is already a
nodata Storage Checkpoint:
Restoring a file system from a Storage Checkpoint
Figure: a file system and its Storage Checkpoints CKPT1 (oldest) through CKPT6
(most recent), with versions of MyFile.txt from 17:09 and Mar 4 18:21.
Storage Checkpoints
Restoring a file system from a Storage Checkpoint
In this example, select the Storage Checkpoint CKPT3 as the new root fileset:

Select Storage Checkpoint for restore operation
or <Control/D> (EOF) to exit
or <Return> to list Storage Checkpoints: CKPT3
CKPT3:
        ctime   = Thu 08 May 2004 06:28:31 PM PST
        mtime   = Thu 08 May 2004 06:28:36 PM PST
        flags   = largefiles
UX:vxfs fsckpt_restore: WARNING: V-3-24640: Any file system
changes or Storage Checkpoints made after
Thu 08 May 2004 06:28:31 PM PST will be lost.
If the filesets are listed at this point, it shows that the former UNNAMED root
fileset and CKPT6, CKPT5, and CKPT4 were removed, and that CKPT3 is now
the primary fileset. CKPT3 is now the fileset that will be mounted by default.
Storage Checkpoint quotas
soft limit
Must be lower than the hard limit. If a soft limit is exceeded, no new
Storage Checkpoints can be created. The number of blocks used must
return below the soft limit before more Storage Checkpoints can be
created. An alert and console message are generated.
Chapter
Backup examples
In the following examples, the vxdump utility is used to ascertain whether
/dev/vx/dsk/fsvol/vol1 is a snapshot mounted as /backup/home and does the
appropriate work to get the snapshot data through the mount point.
These are typical examples of making a backup of a 300,000 block file system
named /home using a snapshot file system on /dev/vx/dsk/fsvol/vol1 with a
snapshot mount point of /backup/home.
To create a backup using a snapshot file system
Reads from the snapshot file system are impacted if the snapped file system is
busy because the snapshot reads are slowed by the disk I/O associated with the
snapped file system.
The overall impact of the snapshot is dependent on the read to write ratio of an
application and the mixing of the I/O operations. For example, a database
application running an online transaction processing (OLTP) workload on a
snapped file system was measured at about 15 to 20 percent slower than a file
system that was not snapped.
Snapshots:
- Are read-only
- Are transient
- Track changed blocks on the file system level

Storage Checkpoints:
- Are persistent
- Track changed blocks on each file in the file system
Storage Checkpoints also serve as the enabling technology for two other Veritas
features: Block-Level Incremental Backups and Storage Rollback, which are used
extensively for backing up databases.
See About Storage Checkpoints on page 67.
- A super-block
- A bitmap
- A blockmap

The following figure shows the disk structure of a snapshot file system.

Figure 6-1 Snapshot disk structure: a super-block, followed by the bitmap, the
blockmap, and the data blocks.
The super-block is similar to the super-block of a standard VxFS file system, but
the magic number is different and many of the fields are not applicable.
The bitmap contains one bit for every block on the snapped file system. Initially,
all bitmap entries are zero. A set bit indicates that the appropriate block was
copied from the snapped file system to the snapshot. In this case, the appropriate
position in the blockmap references the copied block.
The blockmap contains one entry for each block on the snapped file system.
Initially, all entries are zero. When a block is copied from the snapped file system
to the snapshot, the appropriate entry in the blockmap is changed to contain the
block number on the snapshot file system that holds the data from the snapped
file system.
The data blocks are filled by data copied from the snapped file system, starting
from the beginning of the data block area.
Initially, the snapshot file system satisfies read requests by finding the data on
the snapped file system and returning it to the requesting process. When an inode
update or a write changes the data in block n of the snapped file system, the old
data is first read and copied to the snapshot before the snapped file system is
updated. The bitmap entry for block n is changed from 0 to 1, indicating that the
data for block n can be found on the snapshot file system. The blockmap entry
for block n is changed from 0 to the block number on the snapshot file system
containing the old data.
A subsequent read request for block n on the snapshot file system will be satisfied
by checking the bitmap entry for block n and reading the data from the indicated
block on the snapshot file system, instead of from block n on the snapped file
system. This technique is called copy-on-write. Subsequent writes to block n on
the snapped file system do not result in additional copies to the snapshot file
system, since the old data only needs to be saved once.
All updates to the snapped file system for inodes, directories, data in files, extent
maps, and so forth, are handled in this fashion so that the snapshot can present
a consistent view of all file system structures on the snapped file system for the
time when the snapshot was created. As data blocks are changed on the snapped
file system, the snapshot gradually fills with data copied from the snapped file
system.
The amount of disk space required for the snapshot depends on the rate of change
of the snapped file system and the amount of time the snapshot is maintained. In
the worst case, the snapped file system is completely full and every file is removed
and rewritten. The snapshot file system would need enough blocks to hold a copy
of every block on the snapped file system, plus additional blocks for the data
structures that make up the snapshot file system. This is approximately 101
percent of the size of the snapped file system. Normally, most file systems do not
undergo changes at this extreme rate. During periods of low activity, the snapshot
should only require two to six percent of the blocks of the snapped file system.
During periods of high activity, the snapshot might require 15 percent of the
blocks of the snapped file system. These percentages tend to be lower for larger
file systems and higher for smaller ones.
Warning: If a snapshot file system runs out of space for changed data blocks, it
is disabled and all further attempts to access it fail. This does not affect the
snapped file system.
Chapter
Quotas
This chapter includes the following topics:
Using quotas
soft limit
Must be lower than the hard limit, and can be exceeded, but only for
a limited time. The time limit can be configured on a per-file system
basis only. The VxFS default limit is seven days.
Soft limits are typically used when a user must run an application that could
generate large temporary files. In this case, you can allow the user to exceed the
quota limit for a limited time. No allocations are allowed after the expiration
of the time limit.
About quota files on Veritas File System
also requires a separate group quotas file. The VxFS group quota file is named
quotas.grp. The VxFS user quotas file is named quotas. This name was used to
distinguish it from the quotas.user file used by other file systems under
Solaris.

About quota commands

vxedquota
        Edits quota limits for users and groups. The limit changes made by
        vxedquota are reflected both in the internal quotas file and the
        external quotas file.
vxrepquota
        Summarizes quotas and disk usage for a file system.
vxquot
        Displays the number of files and the space owned by each user or group.
vxquota
        Displays a user's or group's quotas and disk usage.
vxquotaon
        Turns quotas on for a mounted VxFS file system.
vxquotaoff
        Turns quotas off for a mounted VxFS file system.
Besides these commands, the VxFS mount command supports a special mount
option (-o quota), which can be used to turn on quotas at mount time.
For additional information on the quota commands, see the corresponding manual
pages.
About quota checking with Veritas File System
Note: When VxFS file systems are exported via NFS, the VxFS quota commands
on the NFS client cannot query or edit quotas. You can use the VxFS quota
commands on the server to query or edit quotas.
Using quotas
The VxFS quota commands are used to manipulate quotas.
Turning on quotas
To use the quota functionality on a file system, quotas must be turned on. You
can turn quotas on at mount time or after a file system is mounted.
Note: Before turning on quotas, the root directory of the file system must
contain a file for user quotas named quotas and a file for group quotas named
quotas.grp, both owned by root.
To turn on quotas

To turn on user and group quotas for a VxFS file system, enter:

# vxquotaon /mount_point
To turn on user or group quotas for a file system at mount time, enter:

# mount -F vxfs -o quota special /mount_point
Editing user and group quotas
You can set up user and group quotas using the vxedquota command. You must
have superuser privileges to edit quotas.
vxedquota creates a temporary file for the given user; this file contains on-disk
quotas for each mounted file system that has a quotas file. It is not necessary that
quotas be turned on for vxedquota to work. However, the quota limits are
applicable only after quotas are turned on for a given file system.
To edit quotas
Specify the -u option to edit the quotas of one or more users specified by
username:
# vxedquota [-u] username
Editing the quotas of one or more users is the default behavior if the -u option
is not specified.
Specify the -g option to edit the quotas of one or more groups specified by
groupname:
# vxedquota -g groupname
Specify the -g and -t options to modify time limits for any group:
# vxedquota -g -t
To display a user's quotas and disk usage on all mounted VxFS file systems
where the quotas file exists, enter:
# vxquota -v [-u] username
To display a group's quotas and disk usage on all mounted VxFS file systems
where the quotas.grp file exists, enter:
# vxquota -v -g groupname
To display the number of files and the space owned by each user, enter:
# vxquot [-u] -f filesystem
To display the number of files and the space owned by each group, enter:
# vxquot -g -f filesystem
To turn off only user quotas for a VxFS file system, enter:
# vxquotaoff -u /mount_point
To turn off only group quotas for a VxFS file system, enter:
# vxquotaoff -g /mount_point
The FCL log file contains both the information about the FCL, which is stored in
the FCL superblock, and the changes to files and directories in the file system,
which are stored as FCL records.
See File Change Log programmatic interface on page 112.
In 4.1, the structure of the File Change Log file was exposed through the
/opt/VRTS/include/sys/fs/fcl.h header file. In this release, the internal
structure of the FCL file is opaque. The recommended mechanism to access the
FCL is through the API described by the /opt/VRTSfssdk/5.0/include/vxfsutil.h
header file.
The /opt/VRTS/include/sys/fs/fcl.h header file is included in this release to
ensure that applications accessing the FCL with the 4.1 header file do not break.
New applications should use the new FCL API described in
/opt/VRTSfssdk/5.0/include/vxfsutil.h. Existing applications should also be
modified to use the new FCL API.
With the addition of new record types, the FCL version in this release has been
updated to 4. To provide backward compatibility for the existing applications,
this release supports multiple FCL versions. Users now have the flexibility of
specifying the FCL version for new FCLs. The default FCL version is 4.
See the fcladm(1M) man page.
set
    Enables the recording of the audit, open, close, and stats events
    in the File Change Log file. Setting the audit option enables all
    events to be recorded in the FCL file when the command is issued.
    Setting the audit option also lists the struct fcl_accessinfo
    identifier, which shows the user ID. The open option enables all
    files opened when the command is issued, along with the command
    names, to be recorded in the FCL file. The close option allows the
    recording of all files that are closed in the FCL file. The stats
    option enables the statistics of all events to be recorded to the
    FCL.
clear
    Disables the recording of the audit, open, close, and stats events
    after it has been set.
fcl_keeptime
    Specifies the duration in seconds that FCL records stay in the FCL
    file before they can be purged. The first records to be purged are
    the oldest ones, which are located at the beginning of the file.
    Additionally, records at the beginning of the file can be purged if
    allocation to the FCL file exceeds fcl_maxalloc bytes. The
    default value is 0. Note that fcl_keeptime takes precedence
    over fcl_maxalloc. No hole is punched if the FCL file exceeds
    fcl_maxalloc bytes but the life of the oldest record has not
    reached fcl_keeptime seconds.
fcl_maxalloc
fcl_winterval
    Specifies the time in seconds that must elapse before the FCL
    records an overwrite, extending write, or a truncate. This helps
    to reduce the number of repetitive records in the FCL. The
    fcl_winterval timeout is per inode. If an inode happens to go
    out of cache and returns, its write interval is reset. As a result,
    there could be more than one write record for that file in the same
    write interval. The default value is 3600 seconds.
fcl_ointerval
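The precedence between fcl_keeptime and fcl_maxalloc described above can be
sketched as a simple predicate. This is an illustrative model only, not VxFS
source code; the function and parameter names are invented:

```python
def hole_punched(alloc_bytes, oldest_age_s, fcl_maxalloc, fcl_keeptime):
    """Model of the documented purge rule: a hole is punched at the
    head of the FCL file when allocation exceeds fcl_maxalloc, but
    fcl_keeptime takes precedence, so the oldest record must also
    have lived at least fcl_keeptime seconds."""
    return alloc_bytes > fcl_maxalloc and oldest_age_s >= fcl_keeptime

# Over the allocation cap and the oldest record is old enough: purge.
print(hole_punched(6_000_000, 7200, 5_000_000, 3600))   # True
# Over the cap but the oldest record is too young: no hole is punched.
print(hole_punched(6_000_000, 100, 5_000_000, 3600))    # False
```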
Either or both fcl_maxalloc and fcl_keeptime must be set to activate the FCL
feature. The following are examples of using the FCL administration command.
To activate FCL for a mounted file system, enter:
# fcladm on mount_point
To remove the FCL file for a mounted file system (FCL must first be turned off),
enter:
# fcladm rm mount_point
To obtain the current FCL state for a mounted file system, enter:
# fcladm state mount_point
Print the on-disk FCL super-block in text format to obtain information about the
FCL file by using offset 0. Because the FCL on-disk super-block occupies the first
block of the FCL file, the first and last valid offsets into the FCL file can be
determined by reading the FCL super-block and checking the fc_foff field. Enter:
# fcladm print 0 mount_point
To print the contents of the FCL in text format, enter the following command.
The offset used must be 32-byte aligned:
# fcladm print offset mount_point
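Because offsets into the FCL must be 32-byte aligned, a caller can round an
arbitrary byte position down to a valid print offset. This is an illustrative
helper, not part of any VxFS interface:

```python
def fcl_align(offset):
    # Round down to the previous 32-byte boundary, since offsets
    # passed to fcladm print must be 32-byte aligned.
    return offset & ~31

print(fcl_align(100))  # 96
```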
The following sample code fragment reads the FCL superblock, checks that the
state of the FCL is VX_FCLS_ON, issues a call to vxfs_fcl_sync to obtain a finishing
offset to read to, determines the first valid offset in the FCL file, then reads the
entries in 8K chunks from this offset. The section labeled "process fcl entries"
is what an application developer must supply to process the entries in the FCL.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/fcntl.h>
#include <errno.h>
#include <fcl.h>
#include <vxfsutil.h>
#define FCL_READSZ 8192
char* fclname = "/mnt/lost+found/changelog";
int read_fcl(char *fclname)
{
struct fcl_sb fclsb;
uint64_t off, lastoff;
ssize_t size;
char buf[FCL_READSZ], *bufp = buf;
int fd;
int err = 0;
if ((fd = open(fclname, O_RDONLY)) < 0) {
return ENOENT;
}
if ((off = lseek(fd, 0, SEEK_SET)) != 0) {
close(fd);
return EIO;
}
size = read(fd, &fclsb, sizeof (struct fcl_sb));
if (size < 0) {
close(fd);
return EIO;
}
if (fclsb.fc_state == VX_FCLS_OFF) {
close(fd);
return 0;
}
if ((err = vxfs_fcl_sync(fclname, &lastoff)) != 0) {
close(fd);
return err;
}
if ((off = lseek(fd, (off_t)fclsb.fc_foff, SEEK_SET)) != (off_t)fclsb.fc_foff) {
close(fd);
return EIO;
}
while (off < lastoff) {
if ((size = read(fd, bufp, FCL_READSZ)) <= 0) {
close(fd);
return errno;
}
/* process fcl entries */
off += size;
}
close(fd);
return 0;
}
The location where files are stored can be selected at multiple levels so that
specific files or file hierarchies can be assigned to different volumes.
Placing the VxFS intent log on its own volume to minimize disk head movement
and thereby increase performance. This functionality can be used to migrate
from the Veritas QuickLog feature.
To use the multi-volume file system features, Veritas Volume Manager must be
installed and the volume set feature must be accessible.
Volume availability
MVS guarantees the availability of some volumes even when others are unavailable.
This allows you to mount a multi-volume file system even if one or more
component dataonly volumes are missing.
The volumes are separated by whether metadata is allowed on the volume. An
I/O error on a dataonly volume does not affect access to any other volumes. All
VxFS operations that do not access the missing dataonly volume function normally,
including:
Mounting the multi-volume file system, regardless of whether the file system
is read-only or read/write.
Kernel operations.
Performing an fsck replay. Logged writes are converted to normal writes if the
corresponding volume is dataonly.
Using all other commands that do not access data on a missing volume.
Operations that fail if a dataonly volume is missing include the following:
Reading or writing file data if the file's data extents were allocated from the
missing dataonly volume.
Volume availability is supported only on a file system with disk layout Version 7
or later.
Note: Do not mount a multi-volume system with the ioerror=disable or
ioerror=wdisable mount options if the volumes have different availability
properties. Symantec recommends the ioerror=mdisable mount option for cluster
mounts and ioerror=mwdisable for local mounts.
Create two new volumes and add them to the volume set:
# vxassist make vol2 50m
# vxassist make vol3 50m
# vxvset addvol myvset vol2
# vxvset addvol myvset vol3
# vxvset list myvset
INDEX    LENGTH    STATE     CONTEXT
0        20480     ACTIVE    -
1        102400    ACTIVE    -
2        102400    ACTIVE    -
Use the ls command to see that when a volume set is created, the volumes
contained by the volume set are removed from the namespace and are instead
accessed through the volume set name:
# ls -l /dev/vx/rdsk/rootdg/myvset
crw------- 1 root root 108,70009 May 21 15:37 /dev/vx/rdsk/rootdg/myvset
Create a volume, add it to the volume set, and use the ls command to see that
when a volume is added to the volume set, it is no longer visible in the
namespace:
# vxassist make vol4 50m
# ls -l /dev/vx/rdsk/rootdg/vol4
crw------- 1 root root 108,70012 May 21 15:43
/dev/vx/rdsk/rootdg/vol4
# vxvset addvol myvset vol4
# ls -l /dev/vx/rdsk/rootdg/vol4
/dev/vx/rdsk/rootdg/vol4: No such file or directory
After a volume set is created, create a VxFS file system by specifying the
volume set name as an argument to mkfs:
# mkfs -F vxfs /dev/vx/rdsk/rootdg/myvset
version 7 layout
327680 sectors, 163840 blocks of size 1024,
log size 1024 blocks largefiles supported
After the file system is created, VxFS allocates space from the different
volumes within the volume set.
List the component volumes of the volume set using the fsvoladm command:
# mount -F vxfs /dev/vx/dsk/rootdg/myvset /mnt1
# fsvoladm list /mnt1
devid    size     used    avail    name
0        10240    1280    8960     vol1
1        51200    16      51184    vol2
2        51200    16      51184    vol3
3        51200    16      51184    vol4
Add a new volume by adding the volume to the volume set, then adding the
volume to the file system:
# vxassist make vol5 50m
# vxvset addvol myvset vol5
# fsvoladm add /mnt1 vol5 50m
# fsvoladm list /mnt1
devid    size     used    avail    name
0        10240    1300    8940     vol1
1        51200    16      51184    vol2
2        51200    16      51184    vol3
3        51200    16      51184    vol4
4        51200    16      51184    vol5
Determine the flags on the volumes using the fsvoladm command:
# fsvoladm queryflags /mnt1
volname    flags
vol1       metadataok
vol2       dataonly
vol3       dataonly
vol4       dataonly
vol5       dataonly
Increase the metadata space in the file system using the fsvoladm command:
# fsvoladm clearflags dataonly /mnt1 vol2
# fsvoladm queryflags /mnt1
volname    flags
vol1       metadataok
vol2       metadataok
vol3       dataonly
vol4       dataonly
vol5       dataonly
Edit the /etc/vfstab file to replace the volume device name, vol1, with the
volume set name, vset1.
To remove a volume that is damaged or cannot be emptied, use the fsck -o
zapvol=volname command. The zapvol option performs a full file system check
and zaps all inodes that refer to the specified volume. The fsck command prints
the inode numbers of all files that the command destroys; the file names are
not printed. The zapvol option affects only regular files if used on a dataonly
volume. However, it could destroy structural files if used on a metadataok
volume, which can make the file system unrecoverable. Therefore, the zapvol
option should be used with caution on metadataok volumes.
Create a file system on the myvset volume set and mount it:
# mkfs -F vxfs /dev/vx/rdsk/rootdg/myvset
version 7 layout
204800 sectors, 102400 blocks of size 1024,
log size 1024 blocks
largefiles supported
# mount -F vxfs /dev/vx/dsk/rootdg/myvset /mnt1
Assign the policies at the file system level. The data policy must be specified
before the metadata policy:
# fsapadm assignfs /mnt1 datapolicy metadatapolicy
# fsvoladm list /mnt1
devid    size     used    avail    name
0        51200    1250    49950    vol1
1        51200    16      51184    vol2
The assignment of the policies on a file system-wide basis ensures that any
metadata allocated is stored on the device with the policy metadatapolicy
(vol2) and all user data is stored on vol1 with the associated datapolicy
policy.
Define two allocation policies called mp3data and mp3meta to refer to the vol1
and vol2 volumes:
# fsapadm define /mnt1 mp3data vol1
# fsapadm define /mnt1 mp3meta vol2
The following example shows how to assign pattern tables to a file system in a
volume set that contains two volumes from different classes of storage. The
pattern table is contained within the pattern file mypatternfile.
To assign pattern tables to directories
Define two allocation policies called mydata and mymeta to refer to the vol1
and vol2 volumes:
# fsapadm define /mnt1 mydata vol1
# fsapadm define /mnt1 mymeta vol2
Allocating data
The following script creates a large number of files to demonstrate the benefit of
allocating data:
i=1
while [ $i -lt 1000 ]
do
dd if=/dev/zero of=/mnt1/$i bs=65536 count=1
i=`expr $i + 1`
done
Before the script completes, vol1 runs out of space even though space is still
available on the vol2 volume:
# fsvoladm list /mnt1
devid    size     used     avail    name
0        51200    51200    0        vol1
1        51200    221      50979    vol2
The solution is to assign an allocation policy that allocates user data from the
vol1 volume to vol2 if space runs out.
You must have system administrator privileges to create, remove, change policies,
or set file system or Storage Checkpoint level policies. Users can assign a
pre-existing policy to their files if the policy allows that.
Policies can be inherited for new files. A file will inherit the allocation policy of
the directory in which it resides if you run the fsapadm assignfile -f inherit
command on the directory.
Assign an allocation policy that allocates user data from vol1 to vol2 if space
runs out on vol1:
# fsapadm define /mnt1 datapolicy vol1 vol2
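The effect of listing vol1 before vol2 in the policy can be modeled as a
first-fit search over the ordered volume list. This is a simplified sketch with
invented names; actual VxFS allocation is considerably more involved:

```python
def allocate(free_blocks, policy_order, nblocks):
    """Pick the first volume in policy order with enough free space;
    later volumes are used only when earlier ones run out."""
    for vol in policy_order:
        if free_blocks.get(vol, 0) >= nblocks:
            free_blocks[vol] -= nblocks
            return vol
    raise OSError("ENOSPC: no volume in the policy can hold the extent")

free = {"vol1": 64, "vol2": 51200}
print(allocate(free, ["vol1", "vol2"], 64))   # vol1
print(allocate(free, ["vol1", "vol2"], 64))   # vol2 (vol1 is now full)
```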
Volume encapsulation
Multi-volume support enables the ability to encapsulate an existing raw volume
and make the volume contents appear as a file in the file system.
Encapsulating a volume involves the following actions:
Encapsulating a volume
The following example illustrates how to encapsulate a volume.
To encapsulate a volume
# vxvset list myvset
INDEX    LENGTH    STATE     CONTEXT
0        102400    ACTIVE    -
1        102400    ACTIVE    -
Create a third volume and copy the passwd file to the third volume:
# vxassist make dbvol 100m
# dd if=/etc/passwd of=/dev/vx/rdsk/rootdg/dbvol count=1
1+0 records in
1+0 records out
The third volume will be used to demonstrate how the volume can be accessed
as a file, as shown later.
Encapsulate dbvol:
# fsvoladm encapsulate /mnt1/dbfile dbvol 100m
# ls -l /mnt1/dbfile
-rw------- 1 root other 104857600 May 22 11:30 /mnt1/dbfile
The passwd file that was written to the raw volume is now visible in the new
file.
Note: If the encapsulated file is changed in any way, such as if the file is
extended, truncated, or moved with an allocation policy or resized volume,
or the volume is encapsulated with a bias, the file cannot be de-encapsulated.
Deencapsulating a volume
The following example illustrates how to deencapsulate a volume.
To deencapsulate a volume
# vxvset list myvset
INDEX    LENGTH    STATE     CONTEXT
0        102400    ACTIVE    -
1        102400    ACTIVE    -
2        102400    ACTIVE    -
Deencapsulate dbvol:
# fsvoladm deencapsulate /mnt1/dbfile
Use the find command to descend directories recursively and run fsmap on
the list of files:
# find . | fsmap -
Volume    Extent Type    File
vol2      Data           ./file1
vol1      Data           ./file2
Report the extents of files that have either data or metadata on a single volume
in all Storage Checkpoints, and indicate if the volume has file system metadata:
# fsvmap -mvC /dev/vx/rdsk/fstest/testvset vol1
Meta    Structural    vol1    //volume has filesystem metadata//
Data    UNNAMED       vol1    /.
Data    UNNAMED       vol1    /ns2
Data    UNNAMED       vol1    /ns3
Data    UNNAMED       vol1    /file1
Meta    UNNAMED       vol1    /file1
Load balancing
An allocation policy with the balance allocation order can be defined and assigned
to files that must have their allocations distributed at random between a set of
specified volumes. Each extent associated with these files is limited to a maximum
size that is defined as the required chunk size in the allocation policy. The
distribution of the extents is mostly equal if none of the volumes are full or
disabled.
Load balancing allocation policies can be assigned to individual files or to all
files in the file system. Although intended for balancing data extents across
volumes, a load balancing policy can be assigned as a metadata policy if desired,
without any restrictions.
Note: If a file has both a fixed extent size set and an allocation policy for load
balancing, certain behavior can be expected. If the chunk size in the allocation
policy is greater than the fixed extent size, all extents for the file are limited by
the chunk size. For example, if the chunk size is 16 MB and the fixed extent size
is 3 MB, then the largest extent that satisfies both the conditions is 15 MB. If the
fixed extent size is larger than the chunk size, all extents are limited to the fixed
extent size. For example, if the chunk size is 2 MB and the fixed extent size is 3
MB, then all extents for the file are limited to 3 MB.
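The interaction in the note above reduces to taking the largest multiple of the
fixed extent size that fits in the chunk, with the fixed extent size winning when
it is larger. The following is illustrative arithmetic only:

```python
def max_extent(chunk_size, fixed_extent_size):
    # If the fixed extent size exceeds the chunk size, extents are
    # limited to the fixed extent size; otherwise the largest extent
    # is the biggest multiple of the fixed extent size in the chunk.
    if fixed_extent_size >= chunk_size:
        return fixed_extent_size
    return (chunk_size // fixed_extent_size) * fixed_extent_size

print(max_extent(16, 3))  # 15 (MB): largest multiple of 3 within 16
print(max_extent(2, 3))   # 3 (MB): the fixed extent size wins
```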
Rebalancing extents
Extents can be rebalanced by strictly enforcing the allocation policy. Rebalancing
is generally required when volumes are added or removed from the policy or when
the chunk size is modified. When volumes are removed from the volume set, any
extents on the volumes being removed are automatically relocated to other volumes
within the policy.
The following example redefines a policy that has four volumes by adding two
new volumes, removing an existing volume, and enforcing the policy for
rebalancing.
To rebalance extents
The single volume must be the first volume in the volume set
The first volume must have sufficient space to hold all of the data and file
system metadata
The volume cannot have any allocation policies that restrict the movement of
data
Determine if the first volume in the volume set, which is identified as device
number 0, has the capacity to receive the data from the other volumes that
will be removed:
# df /mnt1
/mnt1 (/dev/vx/dsk/dg1/vol1):16777216 blocks
3443528 files
If the first volume does not have sufficient capacity, grow the volume to a
sufficient size:
# fsvoladm resize /mnt1 vol1 150g
Remove all volumes except the first volume in the volume set:
#
#
#
#
Before removing a volume, the file system attempts to relocate the files on
that volume. Successful relocation requires space on another volume, and no
allocation policies can be enforced that pin files to that volume. The time for
the command to complete is proportional to the amount of data that must be
relocated.
Edit the /etc/vfstab file to replace the volume set name, vset1, with the
volume device name, vol1.
Note: Some of the commands have been changed or removed between the 4.1 release
and the 5.0 release to make placement policy management more user-friendly.
The following commands have been removed: fsrpadm, fsmove, and
fssweep. The queryfile, queryfs, and list options of the fsapadm
command now print the allocation order by name instead of by number.
DST allows administrators of multi-volume VxFS file systems to manage the
placement of files on individual volumes in a volume set by defining placement
policies that control both initial file location and the circumstances under which
existing files are relocated. These placement policies cause the files to which they
apply to be created and extended on specific subsets of a file system's volume set,
known as placement classes. The files are relocated to volumes in other placement
classes when they meet the specified naming, timing, access rate, and storage
capacity-related conditions.
You make a VxVM volume part of a placement class by associating a volume tag
with it. For file placement purposes, VxFS treats all of the volumes in a placement
class as equivalent, and balances space allocation across them. A volume may
have more than one tag associated with it. If a volume has multiple tags, the
volume belongs to multiple placement classes and is subject to allocation and
relocation policies that relate to any of the placement classes. Multiple tagging
should be used carefully.
See Placement classes on page 139.
VxFS imposes no capacity, performance, availability, or other constraints on
placement classes. Any volume may be added to any placement class, no matter
what type the volume is or what types the other volumes in the class are. However,
a good practice is to place volumes of similar I/O performance and availability in
the same placement class.
Note: Dynamic Storage Tiering is a licensed feature. You must purchase a separate
license key for DST to operate. See the Veritas Storage Foundation Release Notes.
The Using Dynamic Storage Tiering Symantec Yellow Book provides additional
information regarding the Dynamic Storage Tiering feature, including the value
of DST and best practices for using DST. You can download Using Dynamic Storage
Tiering from the following webpage:
http://www.symantec.com/enterprise/yellowbooks/index.jsp
Placement classes
A placement class is a Dynamic Storage Tiering attribute of a given volume in a
volume set of a multi-volume file system. This attribute is a character string, and
is known as a volume tag. A volume may have different tags, one of which could
be the placement class. The placement class tag makes a volume distinguishable
by DST.
Volume tags are organized as hierarchical name spaces in which the levels of the
hierarchy are separated by periods. By convention, the uppermost level in the
volume tag hierarchy denotes the Storage Foundation component or application
that uses a tag, and the second level denotes the tag's purpose. DST recognizes
volume tags of the form vxfs.placement_class.class_name. The prefix vxfs
identifies a tag as being associated with VxFS. placement_class identifies the
tag as a file placement class used by DST. class_name represents the name of the
file placement class to which the tagged volume belongs. For example, a volume
with the tag vxfs.placement_class.tier1 belongs to placement class tier1.
Administrators use the vxvoladm command to associate tags with volumes.
See the vxvoladm(1M) manual page.
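The vxfs.placement_class.class_name convention can be parsed mechanically. The
sketch below is a hypothetical helper for illustration, not a VxFS API:

```python
def placement_class(tag):
    """Return the DST placement class encoded in a volume tag, or
    None when the tag belongs to another component or purpose."""
    parts = tag.split(".")
    if len(parts) == 3 and parts[:2] == ["vxfs", "placement_class"]:
        return parts[2]
    return None

print(placement_class("vxfs.placement_class.tier1"))  # tier1
print(placement_class("vxvm.other.tag"))              # None
```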
VxFS policy rules specify file placement in terms of placement classes rather than
in terms of individual volumes. All volumes that belong to a particular placement
class are interchangeable with respect to file creation and relocation operations.
Specifying file placement in terms of placement classes rather than in terms of
specific volumes simplifies the administration of multi-tier storage in the following
ways:
Adding or removing volumes does not require a file placement policy change.
If a volume with a tag value of vxfs.placement_class.tier2 is added to a file
system's volume set, all policies that refer to tier2 immediately apply to the
newly added volume with no administrative action. Similarly, volumes can be
evacuated, that is, have data removed from them, and be removed from a file
system without a policy change. The active policy continues to apply to the
file system's remaining volumes.
File placement policies are not specific to individual file systems. A file
placement policy can be assigned to any file system whose volume set includes
volumes tagged with the tag values (placement classes) named in the policy.
This property makes it possible for data centers with large numbers of servers
to define standard placement policies and apply them uniformly to all servers
with a single administrative action.
# vxvoladm -g cfsdg settag vsavola vxfs.placement_class.tier1
# vxvoladm -g cfsdg settag vsavolb vxfs.placement_class.tier2
# vxvoladm -g cfsdg settag vsavolc vxfs.placement_class.tier3
# vxvoladm -g cfsdg settag vsavold vxfs.placement_class.tier4
for which each document is the current active policy. When a policy document is
updated, SFMS can assign the updated document to all file systems whose current
active policies are based on that document. By default, SFMS does not update file
system active policies that have been created or modified locally, that is by the
hosts that control the placement policies' file systems. If a SFMS administrator
forces assignment of a placement policy to a file system, the file system's active
placement policy is overwritten and any local changes that had been made to the
placement policy are lost.
Tier Name    Size (KB)    File
tier4        524288       /mnt1/mds1/d1/file1
tier3        524288       /mnt1/mds1/d1/file2
tier2        524288       /mnt1/mds1/d1/d2/file3
tier1        524288       /mnt1/mds1/d1/d2/file4
The rules for allocating and relocating files are expressed in the file system's
file placement policy.
A VxFS file placement policy defines the desired placement of sets of files on the
volumes of a VxFS multi-volume file system. A file placement policy specifies the
placement classes of volumes on which files should be created, and where and
under what conditions the files should be relocated to volumes in alternate
placement classes or deleted. You can create file placement policy documents,
which are XML text files, using either an XML or text editor, or a VxFS graphical
interface wizard.
The following output shows the overall structure of a placement policy:
<?xml version="1.0"?>
<!DOCTYPE PLACEMENT_POLICY [
<!-- The placement policy document definition file -->
<!-- Specification for PLACEMENT_POLICY element.
It can contain the following:
1. 0 or 1 COMMENT element
2. 1 or more RULE elements
-->
<!ELEMENT PLACEMENT_POLICY (COMMENT?, RULE+)>
<!-- The attributes of PLACEMENT_POLICY element -->
<!-- XML requires all attributes must be enclosed in double quotes -->
<!ATTLIST PLACEMENT_POLICY
Name CDATA #REQUIRED
Version (5.0) #REQUIRED
>
-->
<!-- XML requires all attributes must be enclosed in double quotes -->
<!ATTLIST PATTERN
Flags (recursive | nonrecursive) "nonrecursive"
>
>
2. KB
3. MB
4. GB
-->
<!-- XML requires all attributes must be enclosed in double quotes -->
<!ATTLIST BALANCE_SIZE
Units (bytes|KB|MB|GB) #REQUIRED
>
<!-- Specification for DELETE element. This describes deletion criteria.
It can contain the following:
1. 0 or 1 COMMENT element
2. 0 or 1 FROM element
3. 0 or 1 WHEN element
-->
<!ELEMENT DELETE (COMMENT?, FROM?, WHEN?)>
<!-- The attributes of DELETE element -->
<!-- XML requires all attributes must be enclosed in double quotes -->
<!ATTLIST DELETE
Name CDATA #IMPLIED
Flags (none) #IMPLIED
>
<!-- Specification for RELOCATE element. This describes relocation criteria.
It can contain the following:
1. 0 or 1 COMMENT element
2. 0 or 1 FROM element
3. 1 TO element
4. 0 or 1 WHEN element
The order of TO elements is significant. Earlier CLASSes would be
used before the latter ones.
-->
<!ELEMENT RELOCATE (COMMENT?, FROM?, TO, WHEN?)>
<!-- The attributes of RELOCATE element -->
<!-- XML requires all attributes must be enclosed in double quotes -->
<!ATTLIST RELOCATE
Name CDATA #IMPLIED
Flags (none) #IMPLIED
>
3. 0 or 1 MODAGE element
4. 0 or 1 IOTEMP element
5. 0 or 1 ACCESSTEMP element
1. 0 or 1 MIN element
2. 0 or 1 MAX element
-->
<!ELEMENT ACCAGE (MIN?, MAX?)>
<!-- The attributes of ACCAGE element -->
<!-- The possible and accepted values for Prefer are:
(THIS IS NOT IMPLEMENTED)
1. low
2. high
The possible and accepted values for Units are:
1. hours
2. days
-->
<!-- XML requires all attributes must be enclosed in double quotes -->
<!ATTLIST ACCAGE
Prefer (low|high) #IMPLIED
Units (hours|days) #REQUIRED
>
>
-->
<!ELEMENT ACCESSTEMP (MIN?, MAX?, PERIOD)>
<!-- The attributes of ACCESSTEMP element -->
<!-- The possible and accepted values for Prefer are:
(THIS IS NOT IMPLEMENTED)
1. low
2. high
-->
<!-- The possible and accepted values for Type are:
1. nreads
2. nwrites
3. nrws
-->
<!-- XML requires all attributes must be enclosed in double quotes -->
<!ATTLIST ACCESSTEMP
Prefer (low|high) #IMPLIED
Type (nreads|nwrites|nrws) #REQUIRED
>
<!ATTLIST MAX
Flags (lt|lteq) #REQUIRED
>
SELECT statement
The VxFS placement policy rule SELECT statement designates the collection of
files to which a rule applies.
The following XML snippet illustrates the general form of the SELECT statement:
<SELECT>
<DIRECTORY Flags="directory_flag_value">...value...
</DIRECTORY>
<PATTERN Flags="pattern_flag_value">...value...</PATTERN>
<USER>...value...</USER>
<GROUP>...value...</GROUP>
</SELECT>
A SELECT statement may designate files by using the following selection criteria:
<DIRECTORY>
A full path name relative to the file system mount point. The
Flags=directory_flag_value XML attribute must have a value
of nonrecursive, denoting that only files in the specified directory
are designated, or a value of recursive, denoting that files in all
subdirectories of the specified directory are designated. The Flags
attribute is mandatory.
The <DIRECTORY> criterion is optional, and may be specified more
than once.
<PATTERN>
An exact file name or a file name pattern using a single wildcard
character (*). The <PATTERN> criterion is optional, and may be
specified more than once.
<USER>
User name of the file's owner. The user number cannot be specified
in place of the name.
The <USER> criterion is optional, and may be specified more than
once.
<GROUP>
Group name of the file's owner. The group number cannot be specified
in place of the group name.
The <GROUP> criterion is optional, and may be specified more than
once.
One or more instances of any or all of the file selection criteria may be specified
within a single SELECT statement. If two or more selection criteria of different
types are specified in a single statement, a file must satisfy one criterion of each
type to be selected.
In the following example, only files that reside in either the ora/db or the
crash/dump directory, and whose owner is either user1 or user2 are selected for
possible action:
<SELECT>
<DIRECTORY Flags="nonrecursive">ora/db</DIRECTORY>
<DIRECTORY Flags="nonrecursive">crash/dump</DIRECTORY>
<USER>user1</USER>
<USER>user2</USER>
</SELECT>
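The combination rule illustrated above, OR within a criterion type and AND across
types, can be modeled as follows. This simplified sketch covers only directories
and users, and the function name is invented:

```python
def selected(file_dir, file_owner, dirs, users):
    """A file must satisfy one criterion of each type present in the
    SELECT statement; an absent type imposes no constraint."""
    dir_ok = not dirs or file_dir in dirs
    user_ok = not users or file_owner in users
    return dir_ok and user_ok

dirs = {"ora/db", "crash/dump"}
users = {"user1", "user2"}
print(selected("ora/db", "user1", dirs, users))   # True
print(selected("ora/db", "user3", dirs, users))   # False: wrong owner
```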
A rule may include multiple SELECT statements. If a file satisfies the selection
criteria of one of the SELECT statements, it is eligible for action.
In the following example, any files owned by either user1 or user2, no matter in
which directories they reside, as well as all files in the ora/db or crash/dump
directories, no matter which users own them, are eligible for action:
<SELECT>
<DIRECTORY Flags="nonrecursive">ora/db</DIRECTORY>
<DIRECTORY Flags="nonrecursive">crash/dump</DIRECTORY>
</SELECT>
<SELECT>
<USER>user1</USER>
<USER>user2</USER>
</SELECT>
When VxFS creates new files, VxFS applies active placement policy rules in the
order of appearance in the active placement policy's XML source file. The first
rule in which a SELECT statement designates the file to be created determines
the file's placement; no later rules apply. Similarly, VxFS scans the active policy
rules on behalf of each file when relocating files, stopping the rules scan when it
reaches the first rule containing a SELECT statement that designates the file. This
behavior holds true even if the applicable rule results in no action. Take for
example a policy rule that indicates that .dat files inactive for 30 days should be
relocated, and a later rule indicates that .dat files larger than 10 megabytes should
be relocated. A 20 megabyte .dat file that has been inactive for 10 days will not
be relocated because the earlier rule applied. The later rule is never scanned.
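The first-matching-rule behavior in the example can be sketched with a simple
pattern scan. The rule names here are hypothetical, and real rules also carry
WHEN conditions, which do not affect which rule is chosen:

```python
import fnmatch

def governing_rule(rules, filename):
    """Return the first rule whose SELECT pattern designates the file.
    Later rules are never consulted, even if this rule takes no action."""
    for name, pattern in rules:
        if fnmatch.fnmatch(filename, pattern):
            return name
    return None

rules = [("relocate-inactive-dat", "*.dat"),
         ("relocate-large-dat", "*.dat")]
# A 20 MB .dat file inactive for only 10 days still matches the first
# rule, so the size-based rule is never reached.
print(governing_rule(rules, "big.dat"))  # relocate-inactive-dat
```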
A placement policy rule's action statements apply to all files designated by any
of the rule's SELECT statements. If an existing file is not designated by a SELECT
statement in any rule of a file system's active placement policy, then DST does
not relocate or delete the file. If an application creates a file that is not designated
by a SELECT statement in a rule of the file system's active policy, then VxFS places
the file according to its own internal algorithms. If this behavior is inappropriate,
the last rule in the policy document on which the file system's active placement
policy is based should specify <PATTERN>*</PATTERN> as the only selection
criterion in its SELECT statement, and a CREATE statement naming the desired
placement class for files not selected by other rules.
CREATE statement
A CREATE statement in a file placement policy rule specifies one or more
placement classes of volumes on which VxFS should allocate space for new files
to which the rule applies at the time the files are created. You can specify only
placement classes, not individual volume names, in a CREATE statement.
A file placement policy rule may contain at most one CREATE statement. If a rule
does not contain a CREATE statement, VxFS places files designated by the rule's
SELECT statements according to its internal algorithms. However, rules without
CREATE statements can be used to relocate or delete existing files that the rules'
SELECT statements designate.
The following XML snippet illustrates the general form of the CREATE statement:
<CREATE>
<ON Flags="...flag_value...">
<DESTINATION>
<CLASS>...placement_class_name...</CLASS>
<BALANCE_SIZE Units="units_specifier">...chunk_size...
</BALANCE_SIZE>
</DESTINATION>
<DESTINATION>...additional placement class specifications...
</DESTINATION>
</ON>
</CREATE>
If no volume in any of the specified placement classes has sufficient free space,
VxFS resorts to its internal space allocation algorithms, so file allocation does
not fail unless there is no available space anywhere in the file system's volume set.
The Flags=any attribute differs from the catchall rule in that this attribute
applies only to files designated by the SELECT statement in the rule, which may
be less inclusive than the <PATTERN>*</PATTERN> file selection specification
of the catchall rule.
In addition to the placement class name specified in the <CLASS> sub-element,
a <DESTINATION> XML element may contain a <BALANCE_SIZE> sub-element.
Presence of a <BALANCE_SIZE> element indicates that space allocation should
be distributed across the volumes of the placement class in chunks of the indicated
size. For example, if a balance size of one megabyte is specified for a placement
class containing three volumes, VxFS allocates the first megabyte of space for a
new or extending file on the first (lowest indexed) volume in the class, the second
megabyte on the second volume, the third megabyte on the third volume, the
fourth megabyte on the first volume, and so forth. Using the Units attribute in
the <BALANCE_SIZE> XML tag, the balance size value may be specified in the
following units:
bytes  Bytes
KB     Kilobytes
MB     Megabytes
GB     Gigabytes
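The chunk-by-chunk distribution that <BALANCE_SIZE> describes can be sketched as a simple round-robin mapping. The volume names below are illustrative; the real allocator works on extents within the file system's volume set:

```python
def chunk_volume(chunk_index, volumes):
    """Map the Nth balance-size chunk of a file to a volume, round-robin
    across the placement class's volumes in index order (simplified model)."""
    return volumes[chunk_index % len(volumes)]

# Three volumes and a 1 MB balance size: the first four 1 MB chunks of a file
# land on vol1, vol2, vol3, then wrap back to vol1, as described in the text.
vols = ["vol1", "vol2", "vol3"]
placement = [chunk_volume(i, vols) for i in range(4)]
# placement == ["vol1", "vol2", "vol3", "vol1"]
```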
RELOCATE statement
The RELOCATE action statement of file placement policy rules specifies an action
that VxFS takes on designated files during periodic scans of the file system, and
the circumstances under which the actions should be taken. The fsppadm enforce
command is used to scan all or part of a file system for files that should be relocated
based on rules in the active placement policy at the time of the scan.
See the fsppadm(1M) manual page.
The fsppadm enforce command scans file systems in path name order. For each file, VxFS
identifies the first applicable rule in the active placement policy, as determined
by the rules' SELECT statements. If the file resides on a volume specified in the
<FROM> clause of one of the rule's RELOCATE statements, and if the file meets
the criteria for relocation specified in the statement's <WHEN> clause, the file is
scheduled for relocation to a volume in the first placement class listed in the <TO>
clause that has space available for the file. The scan that results from issuing the
fsppadm enforce command runs to completion before any files are relocated.
The following XML snippet illustrates the general form of the RELOCATE
statement:
<RELOCATE>
<FROM>
<SOURCE>
<CLASS>...placement_class_name...</CLASS>
</SOURCE>
<SOURCE>...additional placement class specifications...
</SOURCE>
</FROM>
<TO>
<DESTINATION>
<CLASS>...placement_class_name...</CLASS>
<BALANCE_SIZE Units="units_specifier">
...chunk_size...
</BALANCE_SIZE>
</DESTINATION>
<DESTINATION>
...additional placement class specifications...
</DESTINATION>
</TO>
<WHEN>...relocation conditions...</WHEN>
</RELOCATE>
The <TO> clause indicates the placement classes to which qualifying files should
be relocated. The optional <WHEN> clause indicates the conditions under which
files should be relocated.
The following criteria can be specified in the <WHEN> clause:
<ACCAGE>      This criterion is met when files are inactive for a designated period
              or during a designated period relative to the time at which the
              fsppadm enforce command was issued.
<MODAGE>      This criterion is met when files are unmodified for a designated
              period or during a designated period relative to the time at which
              the fsppadm enforce command was issued.
<SIZE>        This criterion is met when files exceed or drop below a designated
              size, or fall within a designated size range.
<IOTEMP>      This criterion is met when files exceed or drop below a designated
              I/O temperature, or fall within a designated I/O temperature range.
<ACCESSTEMP>  This criterion is met when files exceed or drop below a designated
              access temperature, or fall within a designated access temperature
              range.
The following XML snippet illustrates the general form of the <WHEN> clause in
a RELOCATE statement:
<WHEN>
<ACCAGE Units="...units_value...">
<MIN Flags="...comparison_operator...">
...min_access_age...</MIN>
<MAX Flags="...comparison_operator...">
...max_access_age...</MAX>
</ACCAGE>
<MODAGE Units="...units_value...">
<MIN Flags="...comparison_operator...">
...min_modification_age...</MIN>
<MAX Flags="...comparison_operator...">
...max_modification_age...</MAX>
</MODAGE>
<SIZE Units="...units_value...">
<MIN Flags="...comparison_operator...">
...min_size...</MIN>
<MAX Flags="...comparison_operator...">
...max_size...</MAX>
</SIZE>
<IOTEMP Type="...read_write_preference...">
<MIN Flags="...comparison_operator...">
...min_I/O_temperature...</MIN>
<MAX Flags="...comparison_operator...">
...max_I/O_temperature...</MAX>
<PERIOD>...days_of_interest...</PERIOD>
</IOTEMP>
<ACCESSTEMP Type="...read_write_preference...">
<MIN Flags="...comparison_operator...">
...min_access_temperature...</MIN>
<MAX Flags="...comparison_operator...">
...max_access_temperature...</MAX>
<PERIOD>...days_of_interest...</PERIOD>
</ACCESSTEMP>
</WHEN>
The access age (<ACCAGE>) element refers to the amount of time since a file was
last accessed. VxFS computes access age by subtracting a file's time of last access,
atime, from the time when the fsppadm enforce command was issued. The <MIN>
and <MAX> XML elements in an <ACCAGE> clause denote the minimum and
maximum access age thresholds for relocation, respectively. These elements are
optional, but at least one must be included. Using the Units XML attribute, the
<MIN> and <MAX> elements may be specified in the following units:
hours  Hours
days   Days
Both the <MIN> and <MAX> elements require Flags attributes to direct their
operation.
For <MIN>, the following Flags attribute values may be specified:
gt    The time of last access must be greater than the specified interval.
eq    The time of last access must be equal to the specified interval.
gteq  The time of last access must be greater than or equal to the specified
      interval.
For <MAX>, the following Flags attribute values may be specified:
lt    The time of last access must be less than the specified interval.
lteq  The time of last access must be less than or equal to the specified
      interval.
bytes  Bytes
KB     Kilobytes
MB     Megabytes
GB     Gigabytes
As with the other file relocation criteria, <IOTEMP> may be specified with a lower
threshold by using the <MIN> element, an upper threshold by using the <MAX>
element, or as a range by using both. However, I/O temperature is dimensionless
and therefore has no specification for units.
VxFS computes a file's I/O temperature over the period that ends at the time the
fsppadm enforce command was issued and begins the specified number of days
earlier, as given in the <PERIOD> element, where a day is a 24-hour period. For example,
if the fsppadm enforce command was issued at 2 PM on Wednesday, and a
<PERIOD> value of 2 was specified, VxFS looks at file I/O activity for the period
between 2 PM on Monday and 2 PM on Wednesday. The number of days specified
in the <PERIOD> element should not exceed one or two weeks due to the disk
space used by the File Change Log (FCL) file.
See "About the File Change Log file" on page 109.
I/O temperature is a softer measure of I/O activity than access age. With access
age, a single access to a file resets the file's atime to the current time. In contrast,
a file's I/O temperature decreases gradually as time passes without the file being
accessed, and increases gradually as the file is accessed periodically. For example,
if a new 10 megabyte file is read completely five times on Monday and fsppadm
enforce runs at midnight, the file's two-day I/O temperature will be five and its
access age in days will be zero. If the file is read once on Tuesday, the file's access
age in days at midnight will be zero, and its two-day I/O temperature will have
dropped to three. If the file is read once on Wednesday, the file's access age at
midnight will still be zero, but its two-day I/O temperature will have dropped to
one, as the influence of Monday's I/O will have disappeared.
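The numbers in this worked example can be reproduced with a small model. One assumption is made to match the text: the temperature is the bytes transferred in the window, divided by the file size and by the number of days of history the window actually covers (only one day for the brand-new file on Monday). This normalization is an inference from the example, not a published VxFS formula:

```python
def io_temperature(daily_mb, size_mb, scan_day, period_days, created_day=0):
    """Bytes transferred over the period, divided by file size and by the
    number of days of history the period actually covers. This model is
    chosen to reproduce the chapter's worked example; it is not claimed to
    be the exact VxFS computation."""
    start = max(scan_day - period_days, created_day)
    covered = scan_day - start
    moved = sum(mb for day, mb in daily_mb.items() if start <= day < scan_day)
    return moved / size_mb / covered

# A new 10 MB file: read fully five times on Monday (day 0, 50 MB moved),
# once on Tuesday (10 MB), once on Wednesday (10 MB).
activity = {0: 50, 1: 10, 2: 10}
mon = io_temperature(activity, 10, scan_day=1, period_days=2)  # 5.0
tue = io_temperature(activity, 10, scan_day=2, period_days=2)  # 3.0
wed = io_temperature(activity, 10, scan_day=3, period_days=2)  # 1.0
```

The gradual decay the text describes falls out of the model: Monday's 50 MB of reads leaves the two-day window by Wednesday night, so the temperature drops from 5 to 3 to 1 even though the file is touched every day.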
If the intention of a file placement policy is to keep files in place, such as on top-tier
storage devices, as long as the files are being accessed at all, then access age is
the more appropriate relocation criterion. However, if the intention is to relocate
files as the I/O load on them decreases, then I/O temperature is more appropriate.
The case for upward relocation is similar. If files that have been relocated to
lower-tier storage devices due to infrequent access experience renewed application
activity, then it may be appropriate to relocate those files to top-tier devices. A
policy rule that uses access age with a low <MAX> value, that is, the interval
between fsppadm enforce runs, as a relocation criterion will cause files to be
relocated that have been accessed even once during the interval. Conversely, a
policy that uses I/O temperature with a <MIN> value will only relocate files that
have experienced a sustained level of activity over the period of interest.
<RELOCATE>
<FROM>
<SOURCE>
<CLASS>tier1</CLASS>
</SOURCE>
</FROM>
<TO>
<DESTINATION>
<CLASS>tier2</CLASS>
</DESTINATION>
</TO>
</RELOCATE>
The files designated by the rule's SELECT statement that reside on volumes in
placement class tier1 at the time the fsppadm enforce command executes would
be unconditionally relocated to volumes in placement class tier2 as long as space
permitted. This type of rule might be used, for example, with applications that
create and access new files but seldom access existing files once they have been
processed. A CREATE statement would specify creation on tier1 volumes, which
are presumably high performance or high availability, or both. Each invocation
of fsppadm enforce would relocate files created since the last run to tier2 volumes.
The following example illustrates a more comprehensive form of the RELOCATE
statement that uses access age as the criterion for relocating files from tier1
volumes to tier2 volumes. This rule is designed to maintain free space on tier1
volumes by relocating inactive files to tier2 volumes:
<RELOCATE>
<FROM>
<SOURCE>
<CLASS>tier1</CLASS>
</SOURCE>
</FROM>
<TO>
<DESTINATION>
<CLASS>tier2</CLASS>
</DESTINATION>
</TO>
<WHEN>
<SIZE Units="MB">
<MIN Flags="gt">1</MIN>
<MAX Flags="lt">1000</MAX>
</SIZE>
<ACCAGE Units="days">
<MIN Flags="gt">30</MIN>
</ACCAGE>
</WHEN>
</RELOCATE>
Files designated by the rule's SELECT statement are relocated from tier1 volumes
to tier2 volumes if they are between 1 MB and 1000 MB in size and have not been
accessed for 30 days. VxFS relocates qualifying files in the order in which it
encounters them as it scans the file system's directory tree. VxFS stops scheduling
qualifying files for relocation when it calculates that already-scheduled
relocations would result in tier2 volumes being fully occupied.
The following example illustrates a possible companion rule that relocates files
from tier2 volumes to tier1 ones based on their I/O temperatures. This rule might
be used to return files that had been relocated to tier2 volumes due to inactivity
to tier1 volumes when application activity against them increases. Using I/O
temperature rather than access age as the relocation criterion reduces the chance
of relocating files that are not actually being used frequently by applications. This
rule does not cause files to be relocated unless there is sustained activity against
them over the most recent two-day period.
<RELOCATE>
<FROM>
<SOURCE>
<CLASS>tier2</CLASS>
</SOURCE>
</FROM>
<TO>
<DESTINATION>
<CLASS>tier1</CLASS>
</DESTINATION>
</TO>
<WHEN>
<IOTEMP Type="nrbytes">
<MIN Flags="gt">5</MIN>
<PERIOD>2</PERIOD>
</IOTEMP>
</WHEN>
</RELOCATE>
This rule relocates files that reside on tier2 volumes to tier1 volumes if their I/O
temperatures are above 5 for the two-day period immediately preceding the issuing
of the fsppadm enforce command. VxFS relocates qualifying files in the order in
which it encounters them during its file system directory tree scan. When tier1
volumes are fully occupied, VxFS stops scheduling qualifying files for relocation.
VxFS file placement policies are able to control file placement across any number
of placement classes. The following example illustrates a rule for relocating files
with low I/O temperatures from tier1 volumes to tier2 volumes, and to tier3
volumes when tier2 volumes are fully occupied:
<RELOCATE>
<FROM>
<SOURCE>
<CLASS>tier1</CLASS>
</SOURCE>
</FROM>
<TO>
<DESTINATION>
<CLASS>tier2</CLASS>
</DESTINATION>
<DESTINATION>
<CLASS>tier3</CLASS>
</DESTINATION>
</TO>
<WHEN>
<IOTEMP Type="nrbytes">
<MAX Flags="lt">4</MAX>
<PERIOD>3</PERIOD>
</IOTEMP>
</WHEN>
</RELOCATE>
This rule relocates files whose 3-day I/O temperatures are less than 4 and which
reside on tier1 volumes. When VxFS calculates that already-relocated files would
result in tier2 volumes being fully occupied, VxFS relocates qualifying files to
tier3 volumes instead. VxFS relocates qualifying files as it encounters them in its
scan of the file system directory tree.
The <FROM> clause in the RELOCATE statement is optional. If the clause is not
present, VxFS evaluates files designated by the rule's SELECT statement for
relocation no matter which volumes they reside on when the fsppadm enforce
command is issued. The following example illustrates a fragment of a policy rule
that relocates files according to their sizes, no matter where they reside when the
fsppadm enforce command is issued:
<RELOCATE>
<TO>
<DESTINATION>
<CLASS>tier1</CLASS>
</DESTINATION>
</TO>
<WHEN>
<SIZE Units="MB">
<MAX Flags="lt">10</MAX>
</SIZE>
</WHEN>
</RELOCATE>
<RELOCATE>
<TO>
<DESTINATION>
<CLASS>tier2</CLASS>
</DESTINATION>
</TO>
<WHEN>
<SIZE Units="MB">
<MIN Flags="gteq">10</MIN>
<MAX Flags="lt">100</MAX>
</SIZE>
</WHEN>
</RELOCATE>
<RELOCATE>
<TO>
<DESTINATION>
<CLASS>tier3</CLASS>
</DESTINATION>
</TO>
<WHEN>
<SIZE Units="MB">
<MIN Flags="gteq">100</MIN>
</SIZE>
</WHEN>
</RELOCATE>
This rule relocates files smaller than 10 megabytes to tier1 volumes, files between
10 and 100 megabytes to tier2 volumes, and files larger than 100 megabytes to
tier3 volumes. VxFS relocates all qualifying files that do not already reside on
volumes in their DESTINATION placement classes when the fsppadm enforce
command is issued.
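The three statements above partition files by size, and the destination decision can be summarized in a short sketch. The tier names come from the example; the function itself is illustrative, not part of VxFS:

```python
def destination_tier(size_mb):
    """Destination placement class under the three size-based RELOCATE
    statements above: < 10 MB on tier1, 10-100 MB on tier2, >= 100 MB on
    tier3 (sketch mirroring the gteq/lt flag boundaries)."""
    if size_mb < 10:
        return "tier1"
    if size_mb < 100:
        return "tier2"
    return "tier3"
```

Note how the gteq/lt flag pairs make the boundaries unambiguous: a file of exactly 10 MB matches the second statement, and one of exactly 100 MB matches the third.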
DELETE statement
The DELETE file placement policy rule statement is very similar to the RELOCATE
statement in both form and function, lacking only the <TO> clause. File placement
policy-based deletion may be thought of as relocation with a fixed destination.
Note: Use DELETE statements with caution.
The following XML snippet illustrates the general form of the DELETE statement:
<DELETE>
<FROM>
<SOURCE>
<CLASS>...placement_class_name...</CLASS>
</SOURCE>
<SOURCE>
...additional placement class specifications...
</SOURCE>
</FROM>
<WHEN>...relocation conditions...</WHEN>
</DELETE>
The optional <WHEN> clause indicates the conditions under which files designated
by the rule should be deleted. The following example illustrates two DELETE
statements:
<DELETE>
<FROM>
<SOURCE>
<CLASS>tier3</CLASS>
</SOURCE>
</FROM>
</DELETE>
<DELETE>
<FROM>
<SOURCE>
<CLASS>tier2</CLASS>
</SOURCE>
</FROM>
<WHEN>
<ACCAGE Units="days">
<MIN Flags="gt">120</MIN>
</ACCAGE>
</WHEN>
</DELETE>
The first DELETE statement unconditionally deletes files designated by the rule's
SELECT statement that reside on tier3 volumes when the fsppadm enforce
command is issued. The absence of a <WHEN> clause in the DELETE statement
indicates that deletion of designated files is unconditional.
The second DELETE statement deletes files to which the rule applies that reside
on tier2 volumes when the fsppadm enforce command is issued and that have not
been accessed for the past 120 days.
Access age is a binary measure. The time since last access of a file is computed
by subtracting the POSIX atime in the file's metadata from the time at which
the fsppadm enforce command is issued. If a file is opened the day before
the fsppadm enforce command is issued, its time since last access is one day,
even though it may have been inactive for the month preceding. If the intent of
a policy rule is to relocate inactive files to lower-tier volumes, the rule will
perform badly against files that happen to be accessed, however casually, within
the interval defined by the value of the <ACCAGE> parameter.
As its name implies, the File Change Log records information about changes made
to files in a VxFS file system. In addition to recording creations, deletions,
and extensions, the FCL periodically captures the cumulative amount of I/O activity
(number of bytes read and written) on a file-by-file basis. File I/O activity is
recorded in the FCL each time a file is opened or closed, as well as at timed intervals
to capture information about files that remain open for long periods.
If a file system's active file placement policy contains <IOTEMP> clauses, execution
of the fsppadm enforce command begins with a scan of the FCL to extract I/O
activity information over the period of interest for the policy. The period of interest
is the interval between the time at which the fsppadm enforce command was
issued and that time minus the largest interval value specified in any <PERIOD>
element in the active policy.
For files with I/O activity during the largest interval, VxFS computes an
approximation of the amount of read, write, and total data transfer (the sum of
the two) activity by subtracting the I/O levels in the oldest FCL record that pertains
to the file from those in the newest. It then computes each file's I/O temperature
by dividing its I/O activity by its size at the time of the scan. Dividing by file size is an implicit
acknowledgement that relocating larger files consumes more I/O resources than
relocating smaller ones. Using this algorithm requires that larger files must have
more activity against them in order to reach a given I/O temperature, and thereby
justify the resource cost of relocation.
While this computation is an approximation in several ways, it represents an
easy-to-compute and, more importantly, unbiased estimate of relative recent I/O activity
upon which reasonable relocation decisions can be based.
File relocation and deletion decisions can be based on read, write, or total I/O
activity.
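The newest-minus-oldest computation over FCL records can be sketched as follows. The record layout here (dictionaries of cumulative read/write byte counters) is illustrative only; the actual FCL record format is not shown in this chapter:

```python
def io_temperature(fcl_records, size_bytes):
    """Approximate a file's I/O temperature from FCL-style cumulative
    counters: subtract the oldest record in the period of interest from the
    newest, then divide by file size. The record layout is a hypothetical
    stand-in for the real FCL format."""
    oldest, newest = fcl_records[0], fcl_records[-1]
    nrbytes = newest["rbytes"] - oldest["rbytes"]   # bytes read in the period
    nwbytes = newest["wbytes"] - oldest["wbytes"]   # bytes written in the period
    return {
        "read": nrbytes / size_bytes,
        "write": nwbytes / size_bytes,
        "total": (nrbytes + nwbytes) / size_bytes,
    }

# A 50 MB file with 120 MB read and 30 MB written during the period:
records = [{"rbytes": 0, "wbytes": 0},
           {"rbytes": 120 * 2**20, "wbytes": 30 * 2**20}]
temps = io_temperature(records, 50 * 2**20)
```

Because the activity is divided by file size, a 500 MB file would need ten times the data transfer of this 50 MB file to reach the same temperature, which is the bias toward relocating larger files only when they are genuinely busy.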
The following XML snippet illustrates the use of IOTEMP in a policy rule to specify
relocation of low activity files from tier1 volumes to tier2 volumes:
<RELOCATE>
<FROM>
<SOURCE>
<CLASS>tier1</CLASS>
</SOURCE>
</FROM>
<TO>
<DESTINATION>
<CLASS>tier2</CLASS>
</DESTINATION>
</TO>
<WHEN>
<IOTEMP Type="nrbytes">
<MAX Flags="lt">3</MAX>
<PERIOD Units="days">4</PERIOD>
</IOTEMP>
</WHEN>
</RELOCATE>
This snippet specifies that files to which the rule applies should be relocated from
tier1 volumes to tier2 volumes if their I/O temperatures fall below 3 over a period
of 4 days. The Type=nrbytes XML attribute specifies that total data transfer
activity, which is the sum of bytes read and bytes written, should be used in
the computation. For example, a 50 megabyte file that experienced less than 150
megabytes of data transfer over the 4-day period immediately preceding the
fsppadm enforce scan would be a candidate for relocation. VxFS considers files
that experience no activity over the period of interest to have an I/O temperature
of zero. VxFS relocates qualifying files in the order in which it encounters the
files in its scan of the file system directory tree.
Using I/O temperature or access temperature rather than a binary indication of
activity, such as the POSIX atime or mtime, minimizes the chance of not relocating
files that were only accessed occasionally during the period of interest. A large
file that has had only a few bytes transferred to or from it would have a low I/O
temperature, and would therefore be a candidate for relocation to tier2 volumes,
even if the activity was very recent.
But the greater value of I/O temperature or access temperature as a file relocation
criterion lies in upward relocation: detecting increasing levels of I/O activity
against files that had previously been relocated to lower tiers in a storage hierarchy
due to inactivity or low temperatures, and relocating them to higher tiers in the
storage hierarchy.
The following XML snippet illustrates relocating files from tier2 volumes to tier1
when the activity level against them increases.
<RELOCATE>
<FROM>
<SOURCE>
<CLASS>tier2</CLASS>
</SOURCE>
</FROM>
<TO>
<DESTINATION>
<CLASS>tier1</CLASS>
</DESTINATION>
</TO>
<WHEN>
<IOTEMP Type="nrbytes">
<MIN Flags="gt">5</MIN>
<PERIOD Units="days">2</PERIOD>
</IOTEMP>
</WHEN>
</RELOCATE>
The <RELOCATE> statement specifies that files on tier2 volumes whose I/O
temperature as calculated using the number of bytes read is above 5 over a 2-day
period are to be relocated to tier1 volumes. Bytes written to the file during the
period of interest are not part of this calculation.
Using I/O temperature rather than a binary indicator of activity as a criterion for
file relocation gives administrators a granular level of control over automated
file relocation that can be used to attune policies to application requirements. For
example, specifying a large value in the <PERIOD> element of an upward relocation
statement prevents files from being relocated unless I/O activity against them is
sustained. Alternatively, specifying a high temperature and a short period tends
to relocate files based on short-term intensity of I/O activity against them.
I/O temperature and access temperature utilize the sqlite3 database for building
a temporary table indexed on an inode. This temporary table is used to filter files
based on I/O temperature and access temperature. The temporary table is stored
in the database file .__fsppadm_fcliotemp.db, which resides in the lost+found
directory of the mount point.
<SELECT>
<DIRECTORY Flags="nonrecursive">db/datafiles</DIRECTORY>
<DIRECTORY Flags="nonrecursive">db/indexes</DIRECTORY>
<DIRECTORY Flags="nonrecursive">db/logs</DIRECTORY>
</SELECT>
If a rule includes multiple SELECT statements, a file need only satisfy one of them
to be selected for action. This property can be used to specify alternative conditions
for file selection.
In the following example, a file need only reside in one of db/datafiles,
db/indexes, or db/logs or be owned by one of DBA_Manager, MFG_DBA, or
HR_DBA to be designated for possible action:
<SELECT>
<DIRECTORY Flags="nonrecursive">db/datafiles</DIRECTORY>
<DIRECTORY Flags="nonrecursive">db/indexes</DIRECTORY>
<DIRECTORY Flags="nonrecursive">db/logs</DIRECTORY>
</SELECT>
<SELECT>
<USER>DBA_Manager</USER>
<USER>MFG_DBA</USER>
<USER>HR_DBA</USER>
</SELECT>
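The OR semantics across multiple SELECT statements can be sketched as follows. The attribute names and predicates are illustrative stand-ins for the DIRECTORY and USER criteria above, not a VxFS structure:

```python
def designated(file_attrs, selects):
    """A file is designated by a rule if it satisfies ANY one of the rule's
    SELECT statements (sketch; the attribute names are hypothetical)."""
    return any(sel(file_attrs) for sel in selects)

# The example's two SELECT statements: one on directory, one on owner.
selects = [
    lambda f: f["dir"] in ("db/datafiles", "db/indexes", "db/logs"),
    lambda f: f["owner"] in ("DBA_Manager", "MFG_DBA", "HR_DBA"),
]

# A file owned by HR_DBA outside the db directories is still designated;
# a file matching neither SELECT statement is not.
hit = designated({"dir": "home/hr", "owner": "HR_DBA"}, selects)
miss = designated({"dir": "home/hr", "owner": "guest"}, selects)
```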
In this statement, VxFS would allocate space for newly created files designated
by the rule's SELECT statement on tier1 volumes if space was available. If no tier1
volume had sufficient free space, VxFS would attempt to allocate space on a tier2
volume. If no tier2 volume had sufficient free space, VxFS would attempt allocation
on a tier3 volume. If sufficient space could not be allocated on a volume in any of
the three specified placement classes, allocation would fail with an ENOSPC error,
even if the file system's volume set included volumes in other placement classes
that did have sufficient space.
The <TO> clause in the RELOCATE statement behaves similarly. VxFS relocates
qualifying files to volumes in the first placement class specified if possible, to
volumes in the second specified class if not, and so forth. If none of the destination
criteria can be met, such as if all specified classes are fully occupied, qualifying
files are not relocated, but no error is signaled in this case.
You cannot write rules to relocate or delete a single designated set of files if the
files meet one of two or more relocation or deletion criteria.
The rules that comprise a placement policy may occur in any order, but during
both file allocation and fsppadm enforce relocation scans, the first rule in which
a file is designated by a SELECT statement is the only rule against which that file
is evaluated. Thus, rules whose purpose is to supersede a generally applicable
behavior for a special class of files should precede the general rules in a file
placement policy document.
The following XML snippet illustrates faulty rule placement with potentially
unintended consequences:
<?xml version="1.0"?>
<!DOCTYPE FILE_PLACEMENT_POLICY SYSTEM "placement.dtd">
<FILE_PLACEMENT_POLICY Version="5.0">
<RULE Name="GeneralRule">
<SELECT>
<PATTERN>*</PATTERN>
</SELECT>
<CREATE>
<ON>
<DESTINATION>
<CLASS>tier2</CLASS>
</DESTINATION>
</ON>
</CREATE>
...other statements...
</RULE>
<RULE Name="DatabaseRule">
<SELECT>
<PATTERN>*.db</PATTERN>
</SELECT>
<CREATE>
<ON>
<DESTINATION>
<CLASS>tier1</CLASS>
</DESTINATION>
</ON>
</CREATE>
...other statements...
</RULE>
</FILE_PLACEMENT_POLICY>
The GeneralRule rule specifies that all files created in the file system, designated
by <PATTERN>*</PATTERN>, should be created on tier2 volumes. The
DatabaseRule rule specifies that files whose names include an extension of .db
should be created on tier1 volumes. The GeneralRule rule applies to any file created
in the file system, including those with a naming pattern of *.db, so the
DatabaseRule rule will never apply to any file. This fault can be remedied by
exchanging the order of the two rules. If the DatabaseRule rule occurs first in the
policy document, VxFS encounters it first when determining where to place new
files whose names follow the pattern *.db, and correctly allocates space for them
on tier1 volumes. For files to which the DatabaseRule rule does not apply, VxFS
continues scanning the policy and allocates space according to the specification
in the CREATE statement of the GeneralRule rule.
A similar consideration applies to statements within a placement policy rule. VxFS
processes these statements in order, and stops processing on behalf of a file when
it encounters a statement that pertains to the file. This can result in unintended
behavior.
The following XML snippet illustrates a RELOCATE statement and a DELETE
statement in a rule that is intended to relocate files if they have not been accessed
in 30 days, and to delete them if they have not been accessed in 90 days:
<RELOCATE>
<TO>
<DESTINATION>
<CLASS>tier2</CLASS>
</DESTINATION>
</TO>
<WHEN>
<ACCAGE Units="days">
<MIN Flags="gt">30</MIN>
</ACCAGE>
</WHEN>
</RELOCATE>
<DELETE>
<WHEN>
<ACCAGE Units="days">
<MIN Flags="gt">90</MIN>
</ACCAGE>
</WHEN>
</DELETE>
As written, with the RELOCATE statement preceding the DELETE statement, files
will never be deleted, because the <WHEN> clause in the RELOCATE statement
applies to all selected files that have not been accessed for at least 30 days,
including those that have not been accessed for 90 days. VxFS stops processing
a file against a placement policy when it identifies a statement that applies to
that file, so the DELETE statement is never reached. This example illustrates the
general point that RELOCATE and DELETE statements that specify less inclusive
criteria should precede statements that specify more inclusive criteria in a file
placement policy document. The GUI automatically produces the correct statement
order for the policies it creates.
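The first-applicable-statement processing that makes this ordering faulty can be sketched as follows (a model of the rule above, not VxFS code):

```python
def first_applicable_action(days_inactive):
    """Statements are evaluated in document order, and processing stops at
    the first statement whose WHEN clause the file satisfies. This sketch
    models the faulty RELOCATE-before-DELETE ordering from the text."""
    statements = [("RELOCATE to tier2", 30), ("DELETE", 90)]
    for action, min_accage in statements:
        if days_inactive > min_accage:
            return action
    return None  # no statement applies; the file is left alone

# A file inactive for 120 days satisfies the RELOCATE clause first, so the
# DELETE statement is never reached.
result = first_applicable_action(120)
```

Reversing the list, so that the 90-day DELETE precedes the 30-day RELOCATE, gives the intended behavior, which is exactly the less-inclusive-criteria-first ordering the text recommends.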
Chapter 11
Quick I/O is part of the VRTSvxfs package, but is available for use only with other
Symantec products.
See the Veritas Storage Foundation Release Notes.
The Quick I/O interface treats the same file as if it were a raw character device,
having performance similar to a raw device.
Quick I/O allows a database server to use the Quick I/O interface while a backup
server uses the VxFS interface.
Note: When Quick I/O is enabled, you cannot create a regular VxFS file with a
name that uses the ::cdev:vxfs: extension. If an application tries to create a
regular file named xxx::cdev:vxfs:, the create fails. If Quick I/O is not available,
it is possible to create a regular file with the ::cdev:vxfs: extension, but this could
cause problems if Quick I/O is later enabled. Symantec advises you to reserve the
extension only for Quick I/O files.
The regular file xxx must be physically present on the VxFS file system.
If the file xxx is being used for memory mapped I/O, it cannot be accessed as
a Quick I/O file.
An I/O operation fails if the file xxx has a logical hole and the I/O is done to
that hole on xxx::cdev:vxfs:.
The size of the file cannot be extended by writes through the Quick I/O
interface.
-a  Creates a symbolic link with an absolute path name for a specified file.
    The default is to create a symbolic link with a relative path name.
-e  (For Oracle database files to allow tablespace resizing.) Extends the file
    size by the specified amount.
-h  (For Oracle database files.) Creates a file with additional space allocated
    for the Oracle header.
-r  (For Oracle database files to allow tablespace resizing.) Increases the file
    to the specified size.
-s  Preallocates space for a file of the specified size.
You can specify file size in terms of bytes (the default), or in kilobytes, megabytes,
gigabytes, or sectors (512 bytes) by adding a k, K, m, M, g, G, s, or S suffix. If the
size of the file including the header is not a multiple of the file system block size,
it is rounded to a multiple of the file system block size before preallocation.
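The rounding behavior can be sketched as follows. The manual says only that the size is rounded to a block-size multiple; rounding up is an assumption made here so that the requested space always fits:

```python
def preallocation_bytes(requested, fs_block_size):
    """Round the requested size up to a multiple of the file system block
    size. The upward direction is an assumption (the manual does not state
    it); ceiling division guarantees at least the requested space."""
    blocks = -(-requested // fs_block_size)   # ceiling division
    return blocks * fs_block_size

# On an 8 KB-block file system, an odd 10000-byte request becomes two blocks.
rounded = preallocation_bytes(10000, 8192)    # 16384
```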
The qiomkfile command creates two files: a regular file with preallocated,
contiguous space; and a symbolic link pointing to the Quick I/O name extension.
The first file created is a regular file named /database/.dbfile, which has
the real space allocated. The second file is a symbolic link named
/database/dbfile. This is a relative link to /database/.dbfile via the Quick
I/O interface. That is, to .dbfile::cdev:vxfs:. This allows .dbfile to be
accessed by any database or application as a raw character device.
Listing the files shows the following:

$ ls -lL
crw-r-----  1 oracle  dba  43,0      Oct 22 15:04  dbfile
-rw-r--r--  1 oracle  dba  10485760  Oct 22 15:04  .dbfile
If you specified the -a option with qiomkfile, the results are as follows:

$ ls -al
-rw-r--r--   1 oracle  dba  .dbfile
lrwxrwxrwx   1 oracle  dba  dbfile -> /database/.dbfile::cdev:vxfs:
You can keep the symbolic links in a directory separate from the data directories. For example, you can create a directory named
/database and put in all the symbolic links, with the symbolic links pointing to
absolute path names.
Create the file dbfile and preallocate 100 MB for the file:

$ qiomkfile -h headersize -s 100m /database/dbfile

The following example creates a 100 megabyte master device masterdev on the
file system /sybmaster and adds a 500 megabyte database device datadev to the
file system /sybdata.

To create a new Sybase database device

1  Create the masterdev file and preallocate 100 MB for the file:

   $ qiomkfile -s 100m masterdev

   You can use this master device while running the sybsetup program or
   sybinit script.

2  Add a new 500 megabyte database device datadev to the file system /sybdata
   on your dataserver:

   $ cd /sybdata
   $ qiomkfile -s 500m datadev
   ...
   disk init
   name = "logical_name",
   physname = "/sybdata/datadev",
   vdevno = "device_number",
   size = 256000
   go
Enable the Cached Quick I/O feature for specific files using the qioadmin
command.
If desired, make this setting persistent across mounts by adding a file system
entry in the file /etc/vx/tunefstab:
/dev/vx/dsk/datadg/database01 qio_cache_enable=1
/dev/vx/dsk/datadg/database02 qio_cache_enable=1
Check the setting of the flag qio_cache_enable using the vxtunefs command,
and the individual cache advisories for each file, to verify caching.
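The /etc/vx/tunefstab entries shown above can be staged from the shell. In this sketch a scratch file stands in for /etc/vx/tunefstab so the commands are safe to run on any system; on a live system you would edit the real file as root:

```shell
# Append the qio_cache_enable entries shown above. TUNEFSTAB points at a
# scratch file here; on a live system it would be /etc/vx/tunefstab.
TUNEFSTAB=$(mktemp)
cat >> "$TUNEFSTAB" <<'EOF'
/dev/vx/dsk/datadg/database01 qio_cache_enable=1
/dev/vx/dsk/datadg/database02 qio_cache_enable=1
EOF
# Confirm both entries were written.
grep -c 'qio_cache_enable=1' "$TUNEFSTAB"
```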
You can add the following line to the file /etc/system to load Quick I/O
whenever the system reboots.
forceload: drv/fdd
Create a regular VxFS file and preallocate it to the required size, or use the
qiomkfile command. The size of this preallocation depends on the size
requirement of the database server.
Create and access the database using the file name xxx::cdev:vxfs:.
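The ::cdev:vxfs: extension is just a naming convention on the file name; qiomkfile hides it behind a symbolic link. The following sketch demonstrates only the link layout (the names .dbfile and dbfile are hypothetical, and the link is created in a scratch directory, not on a VxFS file system):

```shell
# Demonstrate the Quick I/O naming convention: a symbolic link whose
# target is the preallocated file name with ::cdev:vxfs: appended.
# This runs in a scratch directory and does not perform Quick I/O.
cd "$(mktemp -d)"
ln -s .dbfile::cdev:vxfs: dbfile
ls -l dbfile
```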
Appendix
Quick Reference
This appendix includes the following topics:
Command summary
Using quotas
Command summary
Symbolic links to all VxFS command executables are installed in the /opt/VRTS/bin
directory. Add this directory to the end of your PATH environment variable to
access the commands.
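For example, in a Bourne or Korn shell startup file you can append the directory as follows (this sketch only manipulates the PATH variable, so it is safe to run anywhere; csh users would use setenv PATH instead):

```shell
# Append the VxFS command directory to the end of PATH, as recommended
# above, then re-export it so child processes inherit the change.
PATH=$PATH:/opt/VRTS/bin
export PATH
echo "$PATH"
```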
Table A-1 describes the VxFS-specific commands.
Table A-1  VxFS commands

cfscluster  CFS cluster configuration command. This functionality is available only with the Veritas Cluster File System product.
cfsdgadm  Adds or deletes shared disk groups to or from a cluster configuration. This functionality is available only with the Veritas Cluster File System product.
cfsmntadm  Adds, deletes, modifies, and sets policy on cluster mounted file systems. This functionality is available only with the Veritas Cluster File System product.
cfsmount, cfsumount  Mounts or unmounts a cluster file system. This functionality is available only with the Veritas Cluster File System product.
cp  Copies files and preserves extent attributes.
df  Reports the number of free disk blocks and inodes for a VxFS file system.
fcladm  Administers the VxFS File Change Log.
ff  Lists file names and inode information for a VxFS file system.
fiostat  Administers file I/O statistics for VxFS file systems.
fsadm  Resizes or reorganizes a VxFS file system.
fsapadm  Administers VxFS allocation policies.
fscat  Cats a VxFS file system.
fscdsadm  Performs online CDS (Cross-Platform Data Sharing) operations.
fscdsconv  Performs offline CDS migration tasks on VxFS file systems.
fscdstask  Performs various CDS operations.
fsck  Checks and repairs a VxFS file system.
fsckpt_restore  Restores file systems from VxFS Storage Checkpoints.
fsckptadm  Administers VxFS Storage Checkpoints.
fsclustadm  Manages cluster-mounted VxFS file systems. This functionality is available only with the Veritas Cluster File System product.
fsdb  Debugs VxFS file systems.
fsdbencap  Encapsulates databases.
fsmap  Displays VxFS file system extent information.
fsppadm  Administers VxFS placement policies.
fstyp  Returns the type of a file system.
fsvmap  Maps volumes of VxFS file systems to files.
fsvoladm  Administers VxFS volumes.
glmconfig  Configures Group Lock Managers (GLM). This functionality is available only with the Veritas Cluster File System product.
ls  Lists files and directories on a VxFS file system.
mkfs  Constructs a VxFS file system.
mount  Mounts a VxFS file system.
mv  Moves files and directories on a VxFS file system.
ncheck  Generates path names from inode numbers for a VxFS file system.
qioadmin  Administers VxFS Quick I/O for Databases cache. This functionality is available only with the Veritas Quick I/O for Databases feature.
qiomkfile  Creates a VxFS Quick I/O device file. This functionality is available only with the Veritas Quick I/O for Databases feature.
qiostat  Displays statistics for VxFS Quick I/O for Databases. This functionality is available only with the Veritas Quick I/O for Databases feature.
qlogadm  Administers low level IOCTL for the QuickLog driver. This functionality is available only with the Veritas QuickLog feature.
qlogattach  Attaches a QuickLog volume to a QuickLog device. This functionality is available only with the Veritas QuickLog feature.
qlogck  Recovers QuickLog devices during the boot process. This functionality is available only with the Veritas QuickLog feature.
qlogclustadm  Administers Cluster QuickLog devices. This functionality is available only with the Veritas QuickLog feature.
qlogdb  Debugs QuickLog. This functionality is available only with the Veritas QuickLog feature.
qlogdetach  Detaches a QuickLog volume from a QuickLog device. This functionality is available only with the Veritas QuickLog feature.
qlogdisable  Remounts a VxFS file system with QuickLog logging disabled. This functionality is available only with the Veritas QuickLog feature.
qlogenable  Remounts a VxFS file system with QuickLog logging enabled. This functionality is available only with the Veritas QuickLog feature.
qlogmk  Creates and attaches a QuickLog volume to a QuickLog device. This functionality is available only with the Veritas QuickLog feature.
qlogprint  Displays records from the QuickLog configuration. This functionality is available only with the Veritas QuickLog feature.
qlogrec  Recovers the QuickLog configuration file during a system failover. This functionality is available only with the Veritas QuickLog feature.
qlogrm  Removes a QuickLog volume from the configuration file. This functionality is available only with the Veritas QuickLog feature.
qlogstat  Prints statistics for running QuickLog devices, QuickLog volumes, and VxFS file systems. This functionality is available only with the Veritas QuickLog feature.
qlogtrace  Prints QuickLog tracing. This functionality is available only with the Veritas QuickLog feature.
setext  Sets extent attributes on a file in a VxFS file system.
umount_vxfs  Unmounts a VxFS file system.
vxdump  Incrementally dumps file systems.
vxedquota  Edits user quotas for a VxFS file system.
vxenablef  Enables specific VxFS features.
vxfsconvert  Converts an unmounted file system to VxFS or upgrades a VxFS disk layout version.
vxfsstat  Displays file system statistics.
vxlsino  Looks up VxFS reverse path names.
vxquot  Displays file system ownership summaries for a VxFS file system.
vxquota  Displays user disk quotas and usage on a VxFS file system.
vxquotaoff, vxquotaon  Turns quotas on and off for a VxFS file system.
vxrepquota  Summarizes quotas for a VxFS file system.
vxrestore  Restores a file system incrementally dumped with vxdump.
vxtunefs  Tunes a VxFS file system.
vxupgrade  Upgrades the disk layout of a mounted VxFS file system.

Online manual pages
Table A-2  Section 1 manual pages

cp_vxfs  Copies files and preserves extent attributes.
cpio_vxfs  Copies files and preserves extent attributes.
fiostat  Administers file I/O statistics for VxFS file systems.
getext  Gets extent attributes for a VxFS file system file.
ls_vxfs  Lists files and directories on a VxFS file system.
mv_vxfs  Moves files and preserves extent attributes.
qioadmin  Administers VxFS Quick I/O for Databases cache. This functionality is available only with the Veritas Quick I/O for Databases feature.
qiomkfile  Creates a VxFS Quick I/O device file. This functionality is available only with the Veritas Quick I/O for Databases feature.
qiostat  Displays statistics for VxFS Quick I/O for Databases. This functionality is available only with the Veritas Quick I/O for Databases feature.
setext  Sets extent attributes on a file in a VxFS file system.
Table A-3  Section 1M manual pages

cfscluster  Configures CFS clusters. This functionality is available only with the Veritas Cluster File System product.
cfsdgadm  Adds or deletes shared disk groups to or from a cluster configuration. This functionality is available only with the Veritas Cluster File System product.
cfsmntadm  Adds, deletes, modifies, and sets policy on cluster mounted file systems. This functionality is available only with the Veritas Cluster File System product.
cfsmount, cfsumount  Mounts or unmounts a cluster file system. This functionality is available only with the Veritas Cluster File System product.
df_vxfs  Reports the number of free disk blocks and inodes for a VxFS file system.
fcladm  Administers the VxFS File Change Log.
ff_vxfs  Lists file names and inode information for a VxFS file system.
fsadm_vxfs  Resizes or reorganizes a VxFS file system.
fsapadm  Administers VxFS allocation policies.
fscat_vxfs  Cats a VxFS file system.
fscdsadm  Performs online CDS (Cross-Platform Data Sharing) operations.
fscdsconv  Performs offline CDS migration tasks on VxFS file systems.
fscdstask  Performs various CDS operations.
fsck_vxfs  Checks and repairs a VxFS file system.
fsckptadm  Administers VxFS Storage Checkpoints.
fsckpt_restore  Restores file systems from VxFS Storage Checkpoints.
fsclustadm  Manages cluster-mounted VxFS file systems. This functionality is available only with the Veritas Cluster File System product.
fsdbencap  Encapsulates databases.
fsdb_vxfs  Debugs VxFS file systems.
fsmap  Displays VxFS file system extent information.
fsppadm  Administers VxFS placement policies.
fstyp_vxfs  Returns the type of a file system.
fsvmap  Maps volumes of VxFS file systems to files.
fsvoladm  Administers VxFS volumes.
glmconfig  Configures Group Lock Managers (GLM). This functionality is available only with the Veritas Cluster File System product.
mkfs_vxfs  Constructs a VxFS file system.
mount_vxfs  Mounts a VxFS file system.
ncheck_vxfs  Generates path names from inode numbers for a VxFS file system.
qlogadm  Administers low level IOCTL for the QuickLog driver. This functionality is available only with the Veritas QuickLog feature.
qlogck  Recovers QuickLog devices during the boot process. This functionality is available only with the Veritas QuickLog feature.
qlogdetach  Detaches a QuickLog volume from a QuickLog device. This functionality is available only with the Veritas QuickLog feature.
qlogmk  Creates and attaches a QuickLog volume to a QuickLog device. This functionality is available only with the Veritas QuickLog feature.
qlogstat  Prints statistics for running QuickLog devices, QuickLog volumes, and VxFS file systems. This functionality is available only with the Veritas QuickLog feature.
quot  Summarizes file system ownership.
quotacheck_vxfs  Checks VxFS file system quota consistency.
umount_vxfs  Unmounts a VxFS file system.
vxdiskusg  Generates VxFS disk accounting data by user ID.
vxdump  Incrementally dumps file systems.
vxedquota  Edits user quotas for a VxFS file system.
vxenablef  Enables specific VxFS features.
vxfsconvert  Converts an unmounted file system to VxFS or upgrades a VxFS disk layout version.
vxfsstat  Displays file system statistics.
vxlsino  Looks up VxFS reverse path names.
vxquot  Displays file system ownership summaries for a VxFS file system.
vxquota  Displays user disk quotas and usage on a VxFS file system.
vxquotaoff, vxquotaon  Turns quotas on and off for a VxFS file system.
vxrepquota  Summarizes quotas for a VxFS file system.
vxrestore  Restores a file system incrementally dumped with vxdump.
vxtunefs  Tunes a VxFS file system.
vxupgrade  Upgrades the disk layout of a mounted VxFS file system.
Table A-4  Section 3 manual pages

vxfs_ap_alloc2  Allocates an fsap_info2 structure.
vxfs_ap_assign_ckpt  Assigns an allocation policy to file data and metadata in a Storage Checkpoint.
vxfs_ap_assign_file  Assigns an allocation policy for file data and metadata.
vxfs_ap_assign_file_pat  Assigns a pattern-based allocation policy for a directory.
vxfs_ap_assign_fs  Assigns an allocation policy for all file data and metadata within a specified file system.
vxfs_ap_assign_fs_pat  Assigns a pattern-based allocation policy for a file system.
vxfs_ap_define  Defines a new allocation policy.
vxfs_ap_define2  Defines a new allocation policy.
vxfs_ap_enforce_file  Ensures that all blocks in a specified file match the file allocation policy.
vxfs_ap_enforce_file2  Reallocates blocks in a file to match allocation policies.
vxfs_ap_enumerate  Returns information about all allocation policies.
vxfs_ap_enumerate2  Returns information about all allocation policies.
vxfs_ap_free2  Frees one or more fsap_info2 structures.
vxfs_ap_query  Returns information about a specific allocation policy.
vxfs_ap_query2  Returns information about a specific allocation policy.
vxfs_ap_query_ckpt  Returns information about allocation policies for each Storage Checkpoint.
vxfs_ap_query_file  Returns information about allocation policies assigned to a specified file.
vxfs_ap_query_file_pat  Returns information about the pattern-based allocation policy assigned to a directory.
vxfs_ap_query_fs  Retrieves allocation policies assigned to a specified file system.
vxfs_ap_query_fs_pat  Returns information about the pattern-based allocation policy assigned to a file system.
vxfs_ap_remove  Deletes a specified allocation policy.
vxfs_fcl_sync  Sets a synchronization point in the VxFS File Change Log.
vxfs_fiostats_dump  Returns file and file range I/O statistics.
vxfs_fiostats_getconfig  Gets file range I/O statistics configuration values.
vxfs_fiostats_set  Turns on and off sub-file I/O statistics and resets statistics counters.
vxfs_get_ioffsets  Obtains VxFS inode field offsets.
vxfs_inotopath  Returns path names for a given inode number.
vxfs_nattr_check  Checks for the existence of named data streams.
vxfs_nattr_fcheck  Checks for the existence of named data streams.
vxfs_nattr_link  Links to a named data stream.
vxfs_nattr_open  Opens a named data stream.
vxfs_nattr_rename  Renames a named data stream.
vxfs_nattr_unlink  Removes a named data stream.
vxfs_nattr_utimes  Sets access and modification times for named data streams.
vxfs_vol_add  Adds a volume to a multi-volume file system.
vxfs_vol_clearflags  Clears specified flags on volumes in a multi-volume file system.
vxfs_vol_deencapsulate  De-encapsulates a volume from a multi-volume file system.
vxfs_vol_encapsulate  Encapsulates a volume within a multi-volume file system.
vxfs_vol_encapsulate_bias  Encapsulates a volume within a multi-volume file system.
vxfs_vol_enumerate  Returns the volumes used by a multi-volume file system.
vxfs_vol_queryflags  Queries flags on volumes in a multi-volume file system.
vxfs_vol_remove  Removes a volume from a multi-volume file system.
vxfs_vol_resize  Resizes a specific volume within a multi-volume file system.
vxfs_vol_setflags  Sets specified flags on volumes in a multi-volume file system.
vxfs_vol_stat  Returns free space information about a component volume within a multi-volume file system.
Section 4 manual pages

fs_vxfs  Provides the format of a VxFS file system volume.
inode_vxfs  Provides the format of a VxFS file system inode.
qlog_config  Describes the QuickLog configuration file. This functionality is available only with the Veritas QuickLog feature.
tunefstab  Describes the VxFS file system tuning parameters table.

Section 7 manual pages

qlog  Describes the Veritas QuickLog device driver. This functionality is available only with the Veritas QuickLog feature.
vxfsio  Describes the VxFS file system control functions (ioctls).
Creating a VxFS file system
Use the mkfs command to create a VxFS file system:
mkfs [-F vxfs] [generic_options] [-o specific_options] special [size]
generic_options  Options common to most other file system types.
specific_options  Options specific to the VxFS file system type.
-o N  Displays the geometry of the file system and does not write to the device.
-o largefiles  Allows users to create files larger than two gigabytes. The default option is largefiles.
special  Specifies the special device file location or character device node of a particular storage device.
size  Specifies the number of 512-byte sectors in the file system. If size is not specified, mkfs determines the size of the special device.
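As a sketch, the pieces of a typical invocation can be assembled as follows. The VxVM volume path and the 16384-sector size are hypothetical, and the command is echoed rather than executed so this can be run without a storage device:

```shell
# Assemble a typical VxFS mkfs invocation. RAWDEV and SIZE are example
# values; substitute your own character device and sector count.
RAWDEV=/dev/vx/rdsk/fsvol/vol1
SIZE=16384
CMD="mkfs -F vxfs $RAWDEV $SIZE"
# Echo the command instead of running it, so no device is required.
echo "$CMD"
```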
Converting a file system to VxFS
Use the vxfsconvert command to convert a file system to VxFS:
vxfsconvert [-l logsize] [-s size] [-efnNvyY] special
-e  Estimates the amount of space required to complete the conversion.
-f  Displays the list of supported file system types.
-l logsize  Specifies the size of the file system intent log.
-n|N  Assumes a no response to all questions asked by vxfsconvert.
-s size  Directs vxfsconvert to use free disk space past the current end of the file system to store VxFS metadata.
-v  Specifies verbose mode.
-y|Y  Assumes a yes response to all questions asked by vxfsconvert.
special  Specifies the name of the character (raw) device that contains the file system to convert.
Mounting a file system
Use the mount command to mount a VxFS file system:
mount [-F vxfs] [generic_options] [-r] [-o specific_options] special mount_point
vxfs  File system type.
generic_options  Options common to most other file system types.
specific_options  Options specific to the VxFS file system type.
-o ckpt=ckpt_name  Mounts a Storage Checkpoint.
-o cluster  Mounts a file system in shared mode. Available only with the VxFS cluster file system feature.
special  A block special device.
mount_point  Directory on which to mount the file system.
-r  Mounts the file system as read-only.
Mount options
The mount command has numerous options to tailor a file system for various functions and environments.
The following table lists some of the specific_options:
Security feature  If you specify the blkclear option, deleted files are overwritten with zeroes.
Support for large files  If you specify the largefiles option, you can create files larger than two gigabytes on the file system. The default option is largefiles.
Support for cluster file systems  If you specify the cluster option, the file system is mounted in shared mode. Cluster file systems depend on several other Veritas products that must be correctly configured before a complete clustering environment is enabled.
Using Storage Checkpoints  The ckpt=checkpoint_name option mounts a Storage Checkpoint of a mounted file system.
Using databases  If you are using databases with VxFS and if you have installed a license key for the Veritas Quick I/O for Databases feature, the mount command enables Quick I/O by default (the same as specifying the qio option). The noqio option disables Quick I/O. If you do not have Quick I/O, mount ignores the qio option. Alternatively, you can increase database performance using the mount option convosync=direct, which utilizes direct I/O. See About Quick I/O on page 185.
Temporary file systems  If you specify the tmplog option, intent logging is delayed for most operations, which can improve performance on file systems used for temporary data.
#device                  device                   mount    FS     fsck  mount    mount
#to mount                to fsck                  point    type   pass  at boot  options
#
#/dev/dsk/c1d0s2         /dev/rdsk/c1d0s2         /usr     ufs    1     yes      -
/proc                    -                        /proc    proc   -     no       -
fd                       -                        /dev/fd  fd     -     no       -
swap                     -                        /tmp     tmpfs  -     yes      -
/dev/dsk/c0t3d0s0        /dev/rdsk/c0t3d0s0       /        ufs    1     no       -
/dev/dsk/c0t3d0s1        -                        -        swap   -     no       -
/dev/vx/dsk/fsvol/vol1   /dev/vx/rdsk/fsvol/vol1  /ext     vxfs   1     yes      -
Unmounting a file system
Use the umount command to unmount all mounted file systems:
umount -a
This unmounts all file systems except /, /usr, /usr/kvm, /var, /proc, /dev/fd,
and /tmp.
Identifying file system types
To view the status of mounted file systems
Use the mount command to view the status of mounted file systems:
mount -v
This shows the file system type and mount options for all mounted file systems.
The -v option specifies verbose mode.
special  The character (raw) device.
-v  Specifies verbose mode.
Use the fstyp command to determine the file system type of the device
/dev/vx/dsk/fsvol/vol1:
# fstyp -v /dev/vx/dsk/fsvol/vol1
The output indicates that the file system type is vxfs, and displays file system
information similar to the following:
vxfs
magic a501fcf5 version 7 ctime Tue Jun 25 18:29:39 2003
logstart 17 logend 1040
bsize 1024 size 1048576 dsize 1047255 ninode 0 nau 8
defiextsize 64 ilbsize 0 immedlen 96 ndaddr 10
aufirst 1049 emap 2 imap 0 iextop 0 istart 0
bstart 34 femap 1051 fimap 0 fiextop 0 fistart 0 fbstart
1083
nindir 2048 aulen 131106 auimlen 0 auemlen 32
auilen 0 aupad 0 aublocks 131072 maxtier 17
inopb 4 inopau 0 ndiripau 0 iaddrlen 8
bshift 10
inoshift 2 bmask fffffc00 boffmask 3ff checksum d7938aa1
oltext1 9 oltext2 1041 oltsize 8 checksum2 52a
free 382614 ifree 0
efree 676 413 426 466 612 462 226 112 85 35 14 3 6 5 4 4 0 0
Resizing a file system
Use the fsadm command to increase the size of a VxFS file system:
fsadm [-F vxfs] [-b newsize] [-r rawdev] mount_point
newsize  The size (in sectors) to which the file system will increase.
mount_point  The file system's mount point.
-r rawdev  Specifies the path name of the raw device if there is no entry in /etc/vfstab and fsadm cannot determine the raw device.
For example, to increase the size of the file system mounted on /ext to 22528 sectors:
fsadm -F vxfs -b 22528 /ext
Use the fsadm command to decrease the size of a VxFS file system:
fsadm [-F vxfs] [-b newsize] [-r rawdev] mount_point
newsize  The size (in sectors) to which the file system will shrink.
mount_point  The file system's mount point.
-r rawdev  Specifies the path name of the raw device if there is no entry in /etc/vfstab and fsadm cannot determine the raw device.
For example, to decrease the size of the file system mounted on /ext to 20480 sectors:
fsadm -F vxfs -b 20480 /ext
Warning: After this operation, there is unused space at the end of the device.
You can then resize the device, but be careful not to make the device smaller
than the new size of the file system.
Use the fsadm command to reorganize a VxFS file system:
fsadm [-F vxfs] [-d] [-D] [-e] [-E] [-r rawdev] mount_point
-d  Reorders directory entries to put subdirectory entries first, then all other entries in decreasing order of time of last access, and compacts directories to remove free space.
-D  Reports on directory fragmentation.
-e  Minimizes file system fragmentation by reorganizing files to have the minimum number of extents.
-E  Reports on extent fragmentation.
mount_point  The file system's mount point.
-r rawdev  Specifies the path name of the raw device if there is no entry in /etc/vfstab and fsadm cannot determine the raw device.
Backing up and restoring a file system
For example, to report on and reorganize the file system mounted on /ext:
fsadm -F vxfs -EeDd /ext
Use the mount command to create and mount a snapshot of a VxFS file system:
mount [-F vxfs] -o snapof=source,[snapsize=size] \
destination snap_mount_point
source  The special device name or mount point of the file system to copy.
destination  The name of the special device on which to create the snapshot.
size  The size of the snapshot file system in sectors.
snap_mount_point  Location where to mount the snapshot. The snap_mount_point directory must exist before you enter this command.
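As a sketch, a concrete invocation can be assembled from those operands. The device names, snapshot size, and mount point below are hypothetical, and the command is echoed rather than executed so this can be run without VxFS:

```shell
# Assemble a snapshot mount command from the operands described above.
# All values are examples; substitute your own devices and mount point.
SOURCE=/dev/vx/dsk/fsvol/vol1
DEST=/dev/vx/dsk/fsvol/snapvol
SIZE=32768
CMD="mount -F vxfs -o snapof=$SOURCE,snapsize=$SIZE $DEST /snapmount"
# Echo the command instead of running it, so no devices are required.
echo "$CMD"
```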
-c  Specifies using a cartridge tape device.
backupdev  The device on which to back up the file system.
snap_mount_point  The snapshot file system's mount point.
Back up the VxFS snapshot file system mounted at /snapmount to the tape
drive with device name /dev/rmt/00m:
# vxdump -cf /dev/rmt/00m /snapmount
-v  Specifies verbose mode.
-x  Extracts the named files from the tape.
filename  The file or directory to restore. If filename is omitted, the root directory, and hence the entire tape, is extracted.
Restore a VxFS snapshot file system from the tape /dev/st1 into the mount
point /restore:
# cd /restore
# vxrestore -v -x -f /dev/st1
Using quotas
You can use quotas to allocate per-user quotas on VxFS file systems.
See Using quotas on page 104.
See the vxquota(1M), vxquotaon(1M), vxquotaoff(1M), and vxedquota(1M) manual
pages.
Turning on quotas
You can enable quotas at mount time or after a file system is mounted. The root
directory of the file system must contain a file named quotas that is owned by
root.
To turn on quotas
If the root directory does not contain a quotas file, the mount command
succeeds, but quotas are not turned on.
Create a quotas file if it does not already exist and turn on quotas for a VxFS
file system mounted at /mnt:
# touch /mnt/quotas
# vxquotaon /mnt
vxedquota creates a temporary file for a specified user. This file contains on-disk
quotas for each mounted VxFS file system that has a quotas file. The temporary
file has one or more lines similar to the following:
fs /mnt blocks (soft = 0, hard = 0) inodes (soft=0, hard=0)
fs /mnt1 blocks (soft = 100, hard = 200) inodes (soft=10, hard=20)
Quotas do not need to be turned on for vxedquota to work. However, the quota
limits apply only after quotas are turned on for a given file system.
vxedquota has an option to modify time limits. Modified time limits apply to the
entire file system; you cannot set time limits for an individual user.
To set up user quotas
Viewing quotas
The superuser or individual user can view disk quotas and usage on VxFS file
systems using the vxquota command. This command displays the user's quotas
and disk usage on all mounted VxFS file systems where the quotas file exists. You
will see all established quotas regardless of whether or not the quotas are actually
turned on.
To view quotas for a specific user
Appendix
Diagnostic messages
This appendix includes the following topics:
Kernel messages
Disabling transactions  If the file system detects an error while writing the intent log, it disables transactions. After transactions are disabled, the files in the file system can still be read or written, but no block or inode frees or allocations, structural changes, directory entry changes, or other changes to metadata are allowed.
About kernel messages
Disabling a file system  If an error occurs that compromises the integrity of the file system, VxFS disables itself. If the intent log fails or an inode-list error occurs, the super-block is ordinarily updated (setting the VX_FULLFSCK flag) so that the next fsck does a full structural check. If this super-block update fails, any further changes to the file system can cause inconsistencies that are undetectable by the intent log replay. To avoid this situation, the file system disables itself.
A message counter (msgcnt) is incremented with each
instance of the message to guarantee that the sequence of events is known when
analyzing file system problems.
Each message is also written to an internal kernel buffer that you can view in the
file /var/adm/messages.
In some cases, additional data is written to the kernel buffer. For example, if an
inode is marked bad, the contents of the bad inode are written. When an error
message is displayed on the console, you can use the unique message ID to find
the message in /var/adm/messages and obtain the additional information.
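For example, you can search the log for a particular message ID with grep. This sketch uses a fabricated one-line excerpt (message 033 and the /db01 mount point are only examples) so that it can be run on any system; on a live system you would search /var/adm/messages itself:

```shell
# Search a log for all occurrences of a given VxFS message ID.
# LOG is a fabricated sample excerpt; on a live system use
# /var/adm/messages instead.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
Jun 25 18:29:39 host vxfs: WARNING: msgcnt 5: mesg 033: V-2-33: vx_check_badblock - /db01 file system had an I/O error, setting VX_FULLFSCK
EOF
grep 'mesg 033' "$LOG"
```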
Kernel messages
Some commonly encountered kernel messages are described in the following
table:
Table B-1
Kernel messages
002
WARNING: msgcnt x: mesg 002: V-2-2: vx_snap_strategy - mount_point file system write attempt to read-only file system
WARNING: msgcnt x: mesg 002: V-2-2: vx_snap_copyblk - mount_point file system write attempt to read-only file system
Description
The kernel tried to write to a read-only file system. This is an
unlikely problem, but if it occurs, the file system is disabled.
Action
The file system was not written, so no action is required. Report
this as a bug to your customer support organization.
008, 009
WARNING: msgcnt x: mesg 008: V-2-8: vx_direrr: function mount_point file system dir inode dir_inumber dev/block
device_ID/block dirent inode dirent_inumber error errno
WARNING: msgcnt x: mesg 009: V-2-9: vx_direrr: function mount_point file system dir inode dir_inumber dirent inode
dirent_inumber immediate directory error errno
Description
A directory operation failed in an unexpected manner. The mount
point, inode, and block number identify the failing directory. If the
inode is an immediate directory, the directory entries are stored
in the inode, so no block number is reported. If the error is ENOENT
or ENOTDIR, an inconsistency was detected in the directory block.
This inconsistency could be a bad free count, a corrupted hash
chain, or any similar directory structure error. If the error is EIO
or ENXIO, an I/O failure occurred while reading or writing the disk
block.
The VX_FULLFSCK flag is set in the super-block so that fsck will
do a full structural check the next time it is run.
Action
Check the console log for I/O errors. If the problem was caused by
a disk failure, replace the disk before the file system is mounted
for write access. Unmount the file system and use fsck to run a full
structural check.
011
012
014
016
Description
When inode information is no longer dependable, the kernel marks
it bad in memory. This is followed by a message to mark it bad on
disk as well unless the mount command ioerror option is set to
disable, or there is subsequent I/O failure when updating the inode
on disk. No further operations can be performed on the inode.
The most common reason for marking an inode bad is a disk I/O
failure. If there is an I/O failure in the inode list, on a directory
block, or an indirect address extent, the integrity of the data in the
inode, or the data the kernel tried to write to the inode list, is
questionable. In these cases, the disk driver prints an error message
and one or more inodes are marked bad.
The kernel also marks an inode bad if it finds a bad extent address,
invalid inode fields, or corruption in directory data blocks during
a validation check. A validation check failure indicates the file
system has been corrupted. This usually occurs because a user or
process has written directly to the device or used fsdb to change
the file system.
The VX_FULLFSCK flag is set in the super-block so fsck will do a
full structural check the next time it is run.
Action
Check the console log for I/O errors. If the problem is a disk failure,
replace the disk. If the problem is not related to an I/O failure, find
out how the disk became corrupted. If no user or process is writing
to the device, report the problem to your customer support
organization. In either case, unmount the file system. The file
system can be remounted without a full fsck unless the
VX_FULLFSCK flag is set for the file system.
019
021
023
025
026
WARNING: msgcnt x: mesg 026: V-2-26: vx_snap_copyblk mount_point primary file system read error
Description
Snapshot file system error.
When the primary file system is written, copies of the original data
must be written to the snapshot file system. If a read error occurs
on a primary file system during the copy, any snapshot file system
that doesn't already have a copy of the data is out of date and must
be disabled.
Action
An error message for the primary file system prints. Resolve the
error on the primary file system and rerun any backups or other
applications that were using the snapshot that failed when the
error occurred.
WARNING: msgcnt x: mesg 027: V-2-27: vx_snap_bpcopy mount_point snapshot file system write error
Description
A write to the snapshot file system failed.
As the primary file system is updated, copies of the original data
are read from the primary file system and written to the snapshot
file system. If one of these writes fails, the snapshot file system is
disabled.
Action
Check the console log for I/O errors. If the disk has failed, replace
it. Resolve the error on the disk and rerun any backups or other
applications that were using the snapshot that failed when the
error occurred.
028
031
032
WARNING: msgcnt x: mesg 033: V-2-33: vx_check_badblock mount_point file system had an I/O error, setting VX_FULLFSCK
Description
When the disk driver encounters an I/O error, it sets a flag in the
super-block structure. If the flag is set, the kernel will set the
VX_FULLFSCK flag as a precautionary measure. Since no other
error has set the VX_FULLFSCK flag, the failure probably occurred
on a data block.
Action
Unmount the file system and use fsck to run a full structural check.
Check the console log for I/O errors. If the problem is a disk failure,
replace the disk before the file system is mounted for write access.
034
035
037
WARNING: msgcnt x: mesg 037: V-2-37: vx_metaioerr - function volume_name file system meta data [read|write] error in dev/block
device_ID/block
Description
A read or a write error occurred while accessing file system
metadata. The full fsck flag on the file system was set. The message
specifies whether the disk I/O that failed was a read or a write.
File system metadata includes inodes, directory blocks, and the
file system log. If the error was a write error, it is likely that some
data was lost. This message should be accompanied by another file
system message describing the particular file system metadata
affected, as well as a message from the disk driver containing
information about the disk I/O error.
Action
Resolve the condition causing the disk error. If the error was the
result of a temporary condition (such as accidentally turning off
a disk or a loose cable), correct the condition. Check for loose cables,
etc. Unmount the file system and use fsck to run a full structural
check (possibly with loss of data).
In case of an actual disk error, if it was a read error and the disk
driver remaps bad sectors on write, it may be fixed when fsck is
run since fsck is likely to rewrite the sector with the read error. In
other cases, you replace or reformat the disk drive and restore the
file system from backups. Consult the documentation specific to
your system for information on how to recover from disk errors.
The disk driver should have printed a message that may provide
more information.
040
042
WARNING: msgcnt x: mesg 042: V-2-42: vx_bsdquotaupdate mount_point file system user|group_id disk limit reached
Description
The hard limit on blocks was reached. Further attempts to allocate
blocks for files owned by the user will fail.
Action
Remove some files to free up space.
043
WARNING: msgcnt x: mesg 043: V-2-43: vx_bsdquotaupdate mount_point file system user|group_id disk quota exceeded too long
Description
The soft limit on blocks was exceeded continuously for longer than
the soft quota time limit. Further attempts to allocate blocks for
files will fail.
Action
Remove some files to free up space.
044
WARNING: msgcnt x: mesg 044: V-2-44: vx_bsdquotaupdate mount_point file system user|group_id disk quota exceeded
Description
The soft limit on blocks is exceeded. Users can exceed the soft limit
for a limited amount of time before allocations begin to fail. After
the soft quota time limit has expired, subsequent attempts to
allocate blocks for files fail.
Action
Remove some files to free up space.
045
WARNING: msgcnt x: mesg 045: V-2-45: vx_bsdiquotaupdate mount_point file system user|group_id inode limit reached
Description
The hard limit on inodes was exceeded. Further attempts to create
files owned by the user will fail.
Action
Remove some files to free inodes.
046
WARNING: msgcnt x: mesg 046: V-2-46: vx_bsdiquotaupdate mount_point file system user|group_id inode quota exceeded too long
Description
The soft limit on inodes has been exceeded continuously for longer
than the soft quota time limit. Further attempts to create files
owned by the user will fail.
Action
Remove some files to free inodes.
047
057
059
WARNING: msgcnt x: mesg 059: V-2-59: vx_snap_getbitbp mount_point snapshot file system bitmap write error
Description
An I/O error occurred while writing to the snapshot file system
bitmap. There is no problem with the snapped file system, but the
snapshot file system is disabled.
Action
Check the console log for I/O errors. If the problem is a disk failure,
replace the disk. If the problem is not related to an I/O failure, find
out how the disk became corrupted. If no user or process was
writing to the device, report the problem to your customer support
organization. Restart the snapshot on an error free disk partition.
Rerun any backups that failed when the error occurred.
060
WARNING: msgcnt x: mesg 060: V-2-60: vx_snap_getbitbp mount_point snapshot file system bitmap read error
Description
An I/O error occurred while reading the snapshot file system
bitmap. There is no problem with the snapped file system, but the
snapshot file system is disabled.
Action
Check the console log for I/O errors. If the problem is a disk failure,
replace the disk. If the problem is not related to an I/O failure, find
out how the disk became corrupted. If no user or process was
writing to the device, report the problem to your customer support
organization. Restart the snapshot on an error free disk partition.
Rerun any backups that failed when the error occurred.
061
062
063
WARNING: msgcnt x: mesg 063: V-2-63: vx_fset_markbad mount_point file system mount_point fileset (index number) marked
bad
Description
An error occurred while reading or writing a fileset structure.
The VX_FULLFSCK flag is set. If the VX_FULLFSCK flag cannot be set,
the file system is disabled.
Action
Unmount the file system and use fsck to run a full structural check.
064
066
068
071
NOTICE: msgcnt x: mesg 071: V-2-71: cleared data I/O error flag in
mount_point file system
Description
The user data I/O error flag was reset when the file system was
mounted. This message indicates that a read or write error occurred
while the file system was previously mounted.
See Message Number 038.
Action
Informational only, no action required.
072
074
076
Description
Action
The operation may take a considerable length of time. You can do
a forced unmount, or simply wait for the operation to complete so
the file system can be unmounted cleanly.
See the umount_vxfs(1M) manual page.
The operation may take a considerable length of time. Wait for the
operation to complete so the file system can be unmounted cleanly.
078
Description
When inode information is no longer dependable, the kernel marks
it bad on disk. The most common reason for marking an inode bad
is a disk I/O failure. If there is an I/O failure in the inode list, on a
directory block, or an indirect address extent, the integrity of the
data in the inode, or the data the kernel tried to write to the inode
list, is questionable. In these cases, the disk driver prints an error
message and one or more inodes are marked bad.
The kernel also marks an inode bad if it finds a bad extent address,
invalid inode fields, or corruption in directory data blocks during
a validation check. A validation check failure indicates the file
system has been corrupted. This usually occurs because a user or
process has written directly to the device or used fsdb to change
the file system.
The VX_FULLFSCK flag is set in the super-block so fsck will do a
full structural check the next time it is run.
Action
Check the console log for I/O errors. If the problem is a disk failure,
replace the disk. If the problem is not related to an I/O failure, find
out how the disk became corrupted. If no user or process is writing
to the device, report the problem to your customer support
organization. In either case, unmount the file system and use fsck
to run a full structural check.
081
083
084
086
087
088
090
091
092
Action
Upgrade your disk layout to Version 7 for shared mounts. Use the
vxupgrade command to begin upgrading file systems using older
disk layouts to Version 7.
096
097
Description
101
Description
The size of the FCL file is approaching the maximum file size supported.
This size is platform specific. When the FCL file reaches the
maximum file size, the FCL will be deactivated and reactivated. All
logging information gathered so far will be lost.
Action
Take any corrective action possible to restrict the loss due to the
FCL being deactivated and reactivated.
102
Description
The size of the FCL file reached the maximum supported file size and the
FCL has been reactivated. All records stored in the FCL file, starting
from the current fc_loff up to the maximum file size, have been purged.
New records will be recorded in the FCL file starting from offset
fs_bsize. The activation time in the FCL is reset to the time of
reactivation. The impact is equivalent to File Change Log being
deactivated and activated.
Action
Informational only; no action required.
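The purge-and-restart behavior described for message 102 can be sketched with a toy model. The constants below (maximum FCL size, record size) are illustrative stand-ins, not real VxFS values; only the wrap semantics follow the message text: on reaching the maximum size, stored records are purged, new records start again at offset fs_bsize, and the activation time is reset.

```python
# Toy model of FCL reactivation (message 102). Sizes are invented for
# illustration; only the purge-and-restart behavior mirrors the text.

FS_BSIZE = 1024          # first record offset after the FCL superblock
FCL_MAX_SIZE = 8192      # stand-in for the platform-specific maximum

class FileChangeLog:
    def __init__(self):
        self.records = {}            # offset -> record
        self.next_off = FS_BSIZE
        self.activation_time = 0

    def append(self, record, now):
        if self.next_off >= FCL_MAX_SIZE:
            # Reactivation: purge all stored records, restart at fs_bsize,
            # and reset the activation time to the time of reactivation.
            self.records.clear()
            self.next_off = FS_BSIZE
            self.activation_time = now
        self.records[self.next_off] = record
        self.next_off += 256         # fixed record size for the sketch

fcl = FileChangeLog()
for t in range(40):                  # enough appends to force one wrap
    fcl.append(f"change-{t}", now=t)
```

After the wrap, only the records appended since reactivation remain, and they start again at offset fs_bsize, which is the impact described above: equivalent to the File Change Log being deactivated and activated.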
About unique message identifiers
20003
Diagnostic messages
Unique message identifiers
Table B-2
20012
20076
21256
Description
The attempt to mount a VxFS file system has failed because either
the volume being mounted or the directory which is to be the mount
point is busy.
A VxVM volume could be busy if the volume is in a shared disk
group and the volume is currently being accessed by a VxFS
command, such as fsck, on a node in the cluster.
The mount point could be busy if a process has the directory open
or has the directory as its current directory. The mount point could
also be busy if the directory is NFS-exported.
Action
For a busy mount point, if a process has the directory open or has
the directory as its current directory, use the fuser command to
locate the processes and either get them to release their references
to the directory or kill the processes. Afterward, attempt to mount
the file system again.
If the directory is NFS-exported, unexport the directory, such as
by using unshare mntpt on the Solaris operating system.
Afterward, attempt to mount the file system again.
21268
23729
24996
Appendix
Disk layout
This appendix includes the following topics:
Version 1
Not Supported
Version 2
Not Supported
Version 3
Not Supported
Version 4
Supported
Version 5
Supported
Version 6
Supported
Version 7
Supported
Some of the disk layout versions were not supported on all UNIX operating systems.
Currently, only the Version 4, 5, 6, and 7 disk layouts can be created and mounted.
Version 1 and 2 file systems cannot be created or mounted. Version 7 is the
default disk layout version.
The vxupgrade command is provided to upgrade an existing VxFS file system to
the Version 5, 6, or 7 layout while the file system remains online. You must upgrade
in steps from older to newer layouts.
See the vxupgrade(1M) manual page.
The vxfsconvert command is provided to upgrade Version 1 and 2 disk layouts
to the Version 7 disk layout while the file system is not mounted.
See the vxfsconvert(1M) manual page.
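Because vxupgrade moves a file system only one disk layout version at a time, reaching Version 7 from an older online-upgradable layout takes one invocation per intermediate version. The sketch below builds that command sequence; the mount point /mnt1 is a hypothetical example, and only the vxupgrade -n version mount_point syntax comes from the manual page.

```python
# Builds the stepwise vxupgrade command sequence described above.
# /mnt1 is a hypothetical mount point used for illustration.

def upgrade_plan(current, target=7, mount_point="/mnt1"):
    """Return the vxupgrade invocations needed to go from the current
    disk layout version to the target, one version per step."""
    if not 4 <= current <= target <= 7:
        raise ValueError("online upgrades apply to Version 4 through 7 layouts")
    return [f"vxupgrade -n {v} {mount_point}" for v in range(current + 1, target + 1)]

# A Version 4 file system needs three separate upgrade steps:
for cmd in upgrade_plan(4):
    print(cmd)
# vxupgrade -n 5 /mnt1
# vxupgrade -n 6 /mnt1
# vxupgrade -n 7 /mnt1
```

A file system already at Version 6 needs a single step, and Version 1 or 2 layouts are out of range here because they must go through vxfsconvert while unmounted.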
Disk layout
VxFS Version 4 disk layout
on the same system. VxFS allocates disk space to files in extents. An extent is a
set of contiguous blocks.
object location table
file
Contains the object location table (OLT). The OLT, which is referenced
from the super-block, is used to locate the other structural files.
label file
device file
inode list file
Both the primary fileset and the structural fileset have their own set
of inodes stored in an inode list file. Only the inodes in the primary
fileset are visible to users. When the number of inodes is increased,
the kernel increases the size of the inode list file.
inode allocation
unit file
Holds the free inode map, extended operations map, and a summary
of inode resources.
log file
extent allocation
unit state file
extent allocation
unit summary file
Contains the AU summary for each allocation unit, which contains
the number of free extents of each size. The summary for an extent
is created only when an allocation unit is expanded for use.
free extent map
file
Contains the free extent maps for each of the allocation units.
quotas files
The Version 4 disk layout supports Access Control Lists and Block-Level
Incremental (BLI) Backup. BLI Backup is a backup method that stores and retrieves
only the data blocks changed since the previous backup, not entire files. This
saves time, storage space, and the computing resources required to back up large
databases.
Figure C-1 shows how the kernel and utilities build information about the structure
of the file system.
The super-block is in a known location from which the OLT can be located.
From the OLT, the initial extents of the structural inode list can be located along
with the inode number of the fileset header file. The initial inode list extents
contain the inode for the fileset header file from which the extents associated
with the fileset header file are obtained.
As an example, when mounting the file system, the kernel needs to access the
primary fileset in order to access its inode list, inode allocation unit, quotas file,
and so on. The required information is obtained by accessing the fileset header
file, from which the kernel can locate the appropriate entry in the file and access
the required information.
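The chain of references just described, from super-block to OLT to structural inode list to fileset header file, can be sketched as a toy model. The dictionaries below are simplified stand-ins for on-disk extents, and every name and inode number in them is invented for illustration; only the order of the lookups follows the text.

```python
# Toy model of the structural-file lookup chain. None of these values
# reflect real on-disk formats; they only mirror the order of lookups.

super_block = {"olt_extent": "olt"}   # the super-block is at a known location

disk = {
    # The OLT locates the structural inode list extents and gives the
    # inode number of the fileset header file.
    "olt": {"inode_list_extents": ["ilist0"], "fileset_header_inode": 97},
    # The initial inode list extents contain the fileset header file's inode.
    "ilist0": {97: {"extents": ["fsh0"]}},
    # The fileset header file locates the primary fileset's resources.
    "fsh0": {"primary": {"inode_list": "pilist0",
                         "inode_alloc_unit": "iau0",
                         "quotas": "quot0"}},
}

def locate_primary_fileset():
    olt = disk[super_block["olt_extent"]]
    inode_list = disk[olt["inode_list_extents"][0]]
    header_inode = inode_list[olt["fileset_header_inode"]]
    header_file = disk[header_inode["extents"][0]]
    return header_file["primary"]
```

Calling locate_primary_fileset() walks the same path the kernel follows at mount time before it can reach the primary fileset's inode list, inode allocation unit, and quotas file.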
Figure C-1
VxFS Version 4 disk layout
(Figure C-1 shows the OLT, with its replica, locating the structural fileset
header and the fileset header file inode; the fileset header file leads to the
primary fileset header, which records the fileset index and name, max_inodes,
and features.)
Disk layout
VxFS Version 6 disk layout
kernel operating system. The maximum file system size on a 32-bit kernel is still
one terabyte. Files cannot exceed two terabytes in size. For 64-bit kernels, the
maximum size of the file system you can create depends on the block size:
1024 bytes
2048 bytes
4096 bytes
8192 bytes
If you specify the file system size when creating a file system, the block size
defaults to the appropriate value as shown above.
See the mkfs(1M) manual page.
The Version 5 disk layout also supports group quotas. Quota limits cannot exceed
one terabyte.
See About quota files on Veritas File System on page 102.
Some UNIX commands may not work correctly on file systems larger than one
terabyte.
See Using UNIX Commands on File Systems Larger than One TB on page 283.
Disk layout
VxFS Version 7 disk layout
Disk layout
Using UNIX Commands on File Systems Larger than One TB
Glossary
access control list (ACL) The information that identifies specific users or groups and their access privileges.
agent
A process that manages predefined Veritas Cluster Server (VCS) resource types.
Agents bring resources online, take resources offline, and monitor resources to
report any state changes to VCS. When an agent is started, it obtains configuration
information from VCS and periodically monitors the resources and updates VCS
with the resource status.
allocation unit
API
asynchronous writes
A delayed write in which the data is written to a page in the system's page cache,
but is not written to disk before the write returns to the caller. This improves
performance, but carries the risk of data loss if the system crashes before the data
is flushed to disk.
atomic operation
Block-Level Incremental
Backup (BLI Backup)
A Symantec backup capability that does not store and retrieve entire files. Instead,
only the data blocks that have changed since the previous backup are backed up.
buffered I/O
During a read or write operation, data usually goes through an intermediate kernel
buffer before being copied between the user buffer and disk. If the same data is
repeatedly read or written, this kernel buffer acts as a cache, which can improve
performance. See unbuffered I/O and direct I/O.
contiguous file
A file in which data blocks are physically adjacent on the underlying media.
data block
A block that contains the actual data belonging to files and directories.
data synchronous
writes
A form of synchronous I/O that writes the file data to disk before the write returns,
but only marks the inode for later update. If the file size changes, the inode will
be written before the write returns. In this mode, the file data is guaranteed to be
on the disk before the write returns, but the inode modification times may be lost
if the system crashes.
defragmentation
The process of reorganizing data on disk by making file data blocks physically
adjacent to reduce access times.
direct extent
direct I/O
An unbuffered form of I/O that bypasses the kernel's buffering of data. With direct
I/O, the file system transfers data directly between the disk and the user-supplied
buffer. See buffered I/O and unbuffered I/O.
discovered direct I/O
Discovered Direct I/O behavior is similar to direct I/O and has the same alignment
constraints, except writes that allocate storage or extend the file size do not require
writing the inode changes before returning to the application.
encapsulation
extent
A group of contiguous file system data blocks treated as a single unit. An extent
is defined by the address of the starting block and a length.
extent attribute
external quotas file
A quotas file (named quotas) must exist in the root directory of a file system for
quota-related commands to work. See quotas file and internal quotas file.
fileset
fixed extent size
An extent attribute used to override the default allocation policy of the file system
and set all allocations for a file to a specific fixed size.
fragmentation
The on-going process on an active file system in which the file system is spread
further and further along the disk, leaving unused gaps or fragments between
areas that are in use. This leads to degraded performance because the file system
has fewer options when assigning a file to an extent.
GB
hard limit
The hard limit is an absolute limit on system resources for individual users for
file and data block usage on a file system. See quota.
indirect address extent
An extent that contains references to other extents, as opposed to file data itself.
A single indirect address extent references indirect data extents. A double indirect
address extent references single indirect address extents.
indirect data extent
An extent that contains file data and is referenced via an indirect address extent.
inode
A unique identifier for each file within a file system that contains the data and
metadata associated with that file.
intent logging
A method of recording pending changes to the file system structure. These changes
are recorded in a circular intent log file.
internal quotas file
VxFS maintains an internal quotas file for its internal usage. The internal quotas
file maintains counts of blocks and inodes used by each user. See quotas and
external quotas file.
large file
A file larger than two gigabytes. VxFS supports files up to 8 exabytes in size.
large file system
A file system larger than one terabyte. VxFS supports file systems up to 8 exabytes
in size.
latency
For file systems, this typically refers to the amount of time it takes a given file
system operation to return to the user.
metadata
MB
mirror
A duplicate copy of a volume and the data therein (in the form of an ordered
collection of subdisks). Each mirror is one copy of the volume with which the
mirror is associated.
multi-volume file
system
A single file system that has been created over multiple volumes, with each volume
having its own properties.
MVS
Multi-volume support.
object location table
(OLT)
The information needed to locate important file system structural elements. The
OLT is written to a fixed location on the underlying media (or disk).
object location table
replica
A copy of the OLT in case of data corruption. The OLT replica is written to a fixed
location on the underlying media (or disk).
page file
A fixed-size block of virtual address space that can be mapped onto any of the
physical addresses available on a system.
preallocation
primary fileset
quotas
Quota limits on system resources for individual users for file and data block usage
on a file system. See hard limit and soft limit.
quotas file
The quotas commands read and write the external quotas file to get or change
usage limits. When quotas are turned on, the quota limits are copied from the
external quotas file to the internal quotas file. See quotas, internal quotas file,
and external quotas file.
reservation
An extent attribute used to preallocate space for a file.
root disk group
A special private disk group that always exists on the system. The root disk group
is named rootdg.
shared disk group
A disk group in which the disks are shared by multiple hosts (also referred to as
a cluster-shareable disk group).
shared volume
A volume that belongs to a shared disk group and is open on more than one node
at the same time.
snapped file system
A file system whose exact image has been used to create a snapshot file system.
soft limit
The soft limit is lower than a hard limit. The soft limit can be exceeded for a limited
time. There are separate time limits for files and blocks. See hard limit and quotas.
Storage Checkpoint
A facility that provides a consistent and stable view of a file system or database
image and keeps track of modified data blocks since the last Storage Checkpoint.
structural fileset
The files that define the structure of the file system. These files are not visible or
accessible to the user.
super-block
A block containing critical information about the file system such as the file
system type, layout, and size. The VxFS super-block is always located 8192 bytes
from the beginning of the file system and is 8192 bytes long.
synchronous writes
A form of synchronous I/O that writes the file data to disk, updates the inode
times, and writes the updated inode to disk. When the write returns to the caller,
both the data and the inode have been written to disk.
TB
transaction
Updates to the file system structure that are grouped together to ensure they are
all completed.
throughput
For file systems, this typically refers to the number of I/O operations in a given
unit of time.
unbuffered I/O
I/O that bypasses the kernel cache to increase I/O performance. This is similar to
direct I/O, except when a file is extended: for direct I/O, the inode is written to
disk synchronously; for unbuffered I/O, the inode update is delayed. See buffered
I/O and direct I/O.
volume
volume set
A container for multiple different volumes. Each volume can have its own
geometry.
vxfs
VxFS
VxVM
Index
A
access control lists 20
alias for Quick I/O files 187
allocation policies 56
default 56
extent 15
extent based 14
multi-volume support 125
B
bad block revectoring 33
blkclear 18
blkclear mount option 33
block based architecture 23
block size 14, 279
blockmap for a snapshot file system 99
buffered file systems 18
buffered I/O 63
C
cache advisories 64
Cached Quick I/O 197
Cached Quick I/O read-ahead 197
cio
Concurrent I/O 39
closesync 18
cluster mount 22
commands
cron 26
fsadm 26
getext 58
mkfs 279
qiostat 199
setext 58
contiguous reservation 57
converting a data Storage Checkpoint to a nodata
Storage Checkpoint 78
convosync mount option 31, 35
copy-on-write technique 67, 71
cp_vxfs 58
cpio_vxfs 58
creating a multi-volume support file system 122
creating file systems with large files 38
creating files with mkfs 211-212
creating Quick I/O files 188
cron 26, 42
cron sample script 43
D
data copy 62
data integrity 18
data Storage Checkpoints definition 72
data synchronous I/O 34, 63
data transfer 62
default
allocation policy 56
block sizes 14, 279
default_indir_size tunable parameter 46
defragmentation 26
extent 42
scheduling with cron 42
delaylog mount option 31-32
device file 280
direct data transfer 62
direct I/O 62
directory reorganization 43
disabled file system
snapshot 100
transactions 229
discovered direct I/O 63
discovered_direct_iosize tunable parameter 46
disk layout
Version 1 277
Version 2 277
Version 3 278
Version 4 278-279
Version 5 278, 281
Version 6 278
Version 7 278
disk space allocation 14, 279
displaying mounted file systems 216
E
enabling Quick I/O 196
encapsulating volumes 119
enhanced data integrity modes 18
ENOENT 233
ENOSPC 86
ENOTDIR 233
expansion 26
extensions of Quick I/O files 187
extent 14, 55
attributes 55
description 279
indirect 15
reorganization 43
extent allocation 14-15
aligned 56
control 55
fixed size 55
unit state file 280
unit summary file 280
extent size
indirect 15
external quotas file 102
F
fc_foff 111
fcl_inode_aging_count tunable parameter 49
fcl_inode_aging_size tunable parameter 49
fcl_keeptime tunable parameter 47
fcl_maxalloc tunable parameter 47
fcl_winterval tunable parameter 48
file
device 280
extent allocation unit state 280
extent allocation unit summary 280
fileset header 280
free extent map 280
inode allocation unit 280
inode list 280
intent log 280
label 279
object location table 279
quotas 280
sparse 57
file change log 47
file system
block size 59
buffering 18
displaying mounted 216
increasing size 220
fileset
header file 280
primary 69
filesystems file 215
fixed extent size 55
fixed write size 57
fragmentation
monitoring 42-43
reorganization facilities 42
reporting 42
fragmented file system characteristics 43
free extent map file 280
free space monitoring 42
freeze 65
freezing and thawing, relation to Storage
Checkpoints 69
fsadm 26
how to reorganize a file system 222
how to resize a file system 220
reporting extent fragmentation 43
scheduling defragmentation using cron 43
fsadm_vxfs 39
fscat 94
fsck 78
fsckptadm
Storage Checkpoint administration 74
fstab file
editing 215
fstyp
how to determine the file system type 217-218
fsvoladm 122
G
get I/O parameter ioctl 65-66
getext 58
getfacl 20
global message IDs 230
H
how to access a Storage Checkpoint 77
I
I/O
direct 62
sequential 63
synchronous 63
I/O requests
asynchronous 34
synchronous 33
increasing file system size 220
indirect extent
address size 15
double 15
single 15
initial_extent_size tunable parameter 49
inode allocation unit file 280
inode list error 230
inode list file 280
inode table 40
internal 40
sizes 40
inodes, block based 15
intent log 16
file 280
multi-volume support 119
Intent Log Resizing 17
internal inode table 40
internal quotas file 102
ioctl interface 55
K
kernel tunable parameters 40
L
label file 279
large files 20, 37
creating file systems with 38
mounting file systems with 38
largefiles mount option 38
local mount 22
log failure 230
log mount option 30
logiosize mount option 33
M
max_direct_iosize tunable parameter 50
max_diskq tunable parameter 50
max_seqio_extent_size tunable parameter 50
maximum I/O size 41
metadata
multi-volume support 119
mincache mount option 31, 34
mkfs 279
creating files with 211-212
creating large files 39
modes
enhanced data integrity 18
monitoring fragmentation 42
mount 18, 39
how to display mounted file systems 216
how to mount a file system 213
mounting a Storage Checkpoint 77
pseudo device 77
mount options 30
blkclear 33
choosing 30
combining 39
convosync 31, 35
delaylog 19, 31-32
extended 17
largefiles 38
log 18, 30
logiosize 33
mincache 31, 34
nodatainlog 31, 33
tmplog 32
mounted file system
displaying 216
N
name space
preserved by Storage Checkpoints 68
naming convention, Quick I/O 187
ncheck 114
nodata Storage Checkpoints 78
nodata Storage Checkpoints definition 73
nodatainlog mount option 31, 33
O
O_SYNC 31
object location table file 279
P
parameters
default 44
tunable 45
tuning 44
performance
overall 30
snapshot file systems 97
preallocating space for Quick I/O files 191
primary fileset relation to Storage Checkpoints 69
pseudo device 77
Q
qio module
loading on system reboot 200
qio_cache_enable tunable parameter 50, 197
qiomkfile 188
qiostat 199
Quick I/O 185
access Quick I/O files as raw devices 187
access regular UNIX files 190
creating Quick I/O files 188
direct I/O 186
double buffering 187
extension 187
read/write locks 186
S
sectors
forming logical blocks 279
sequential I/O 63
setext 58
setfacl 20
snapped file systems 20, 93
performance 97
unmounting 94
snapread 94
snapshot 223
how to create a backup file system 223
SVID requirement
VxFS conformance to 27
symbolic links
accessing Quick I/O files 190
synchronous I/O 63
system failure recovery 16
system performance
overall 30
T
temporary directories 19
thaw 65
tmplog mount option 32
transaction disabling 229
tunable I/O parameters 45
default_indir_size 46
discovered_direct_iosize 46
fcl_keeptime 47
fcl_maxalloc 47
fcl_winterval 48
initial_extent_size 49
inode_aging_count 49
inode_aging_size 49
max_direct_iosize 50
max_diskq 50
max_seqio_extent_size 50
qio_cache_enable 50, 197
read_nstream 45
read_pref_io 45
Volume Manager maximum I/O size 41
write_nstream 46
write_pref_io 45
write_throttle 52
tuning I/O parameters 44
typed extents 15
U
umount command 216
uninitialized storage, clearing 33
unmount 78, 230
a snapped file system 94
V
VEA 25
VERITAS Enterprise Administrator 25
Version 1 disk layout 277
Version 2 disk layout 277
Version 3 disk layout 278
W
writable Storage Checkpoints 77
write size 57
write_nstream tunable parameter 46
write_pref_io tunable parameter 45
write_throttle tunable parameter 52