VMFS Deep Dive
- ESX storage stack and VMFS
- VMFS vs. RDM
- SCSI reservation conflicts
- Multipathing
- Snapshot LUNs and resignaturing

The Storage Stack in VI3
Built-In VMFS Cluster File System

- Simplifies VM provisioning
- Enables independent VMotion and HA restart of VMs on a common LUN
- File-level locking protects virtual disks
- Separates VM and storage administration
- Use RDMs for access to SAN features
Raw Device Mappings (RDMs)

- Mapping files in a VMFS volume, presented as virtual SCSI devices
- Key contents of the metadata include the location and locking of the mapped device
- Used when a virtual machine must interact with a real disk on the SAN
VMFS vs. RDM

RAW (RDM):
- May give better performance
- Microsoft Cluster Services (MSCS) storage
- More LUNs, so more provisioning time

VMFS:
- Leverage templates and quick provisioning
- Fewer LUNs means you don't have to watch the heap
- Advanced features still work
- Scales better with Consolidated Backup
- Preferred method
Skeleton of a VMFS
A VMFS holds files and has its own metadata. Metadata gets updated when:
- Creating a file
- Changing a file's attributes
- Powering on a VM
- Powering off a VM
- Growing a file
When metadata is updated, the VMkernel places a non-persistent SCSI reservation on the entire VMFS volume:
- The lock is held on the volume for the duration of the operation
- Other VMkernels are prevented from doing metadata updates
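The locking model above can be pictured as a coarse, volume-wide mutex that only metadata operations take, while ordinary data I/O proceeds without it. A minimal Python sketch of the idea (class and method names are illustrative, not VMkernel APIs):

```python
import threading

class VMFSVolume:
    # Toy model: metadata updates serialize on a volume-wide lock
    # (standing in for the non-persistent SCSI reservation), while
    # ordinary data I/O from any host proceeds without taking it.
    def __init__(self):
        self._volume_lock = threading.Lock()
        self.files = {}

    def data_io(self, host, path):
        # Data reads/writes do not reserve the volume.
        return self.files.get(path, b"")

    def create_file(self, host, path):
        # Metadata update: reserve the whole volume for the duration.
        with self._volume_lock:
            self.files[path] = b""

vol = VMFSVolume()
vol.create_file("esx1", "/vm1/vm1.vmdk")     # takes the volume-wide lock
print(vol.data_io("esx2", "/vm1/vm1.vmdk"))  # data access needs no lock
```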
Concurrent-access filesystem:
- Most I/O happens simultaneously from all hosts
- Filesystem metadata updates are atomic and performed by the requesting host, e.g.:
  - Locking a file for read/write (e.g. a vmdk when powering on a VM)
  - Creating a new directory or file
  - Growing a file, etc.
For the time needed by the locking operation (NOT the whole metadata update), the LUN is reserved (= locked for access) by a single host.

SCSI Reservation Conflict: What it is
What happens if we try to perform I/O to a LUN that's already reserved?
- A retry counter is decreased and the I/O operation is retried
- The retry is scheduled with a pseudo-random algorithm
- If the counter reaches 0, we have a SCSI reservation conflict

SCSI: 6630: Partition table read from device vmhba1:0:6 failed: SCSI reservation conflict (0xbad0022)
SCSI: vm 1033: 5531: Sync CR at 64
SCSI: vm 1033: 5531: Sync CR at 48
SCSI: vm 1033: 5531: Sync CR at 32
SCSI: vm 1033: 5531: Sync CR at 16
SCSI: vm 1033: 5531: Sync CR at 0
WARNING: SCSI: 5541: Failing I/O due to too many reservation conflicts
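The retry behavior described above can be sketched as a counter with pseudo-random backoff. This is a toy model; the retry budget, delays, and function names are illustrative, not the VMkernel's:

```python
import random
import time

class ReservationConflict(Exception):
    pass

def issue_io(lun_reserved):
    # Stand-in for a SCSI command; fails while the LUN is reserved.
    if lun_reserved():
        raise ReservationConflict()

def io_with_retries(lun_reserved, retries=80):
    # Decrement a retry counter, backing off a pseudo-random interval
    # between attempts.
    while retries > 0:
        try:
            issue_io(lun_reserved)
            return True
        except ReservationConflict:
            retries -= 1
            time.sleep(random.uniform(0.0, 0.005))  # pseudo-random delay
    # Counter hit 0: surface the conflict, as in the vmkernel log above.
    return False

print(io_with_retries(lambda: False))             # → True (LUN free)
print(io_with_retries(lambda: True, retries=3))   # → False (conflict)
```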
WARNING: SCSI: 5637: status SCSI reservation conflict, r status 0xc0de01 for vmhba1:0:6. residual R 919, CR 0, ER 3

Who's Holding a SCSI Reservation?
One ESX host:
- vmkfstools -L reserve: this should NEVER EVER be done
- Interaction with installed third-party management agents

Multiple ESX hosts, alternatively:
- High latency / slow SAN
  - Critical lock-passing between ESX hosts during VMotion
- SAN firmware slow in honoring SCSI reserve/release
  - Synchronously mirrored LUNs

One non-ESX host:
- LUN erroneously mapped to e.g. a Windows host

No host:
- Persistent reservation held by the SAN
- Needs investigation by the SAN vendor
ESX Server Multipathing
Path names: vmhbaN:T:L:P

Determined at boot, install, or rescan:
- N = adapter number
- T = target number (generally 1 SP = 1 target)

Determined by the SAN:
- L = LUN ID
- SCSI identifier of the LUN (not shown here)

Determined at datastore or extent creation:
- P = partition number (if 0 or absent = whole disk)
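For illustration, the vmhbaN:T:L:P convention can be parsed mechanically. `parse_vmhba` is a hypothetical helper, not a VMware tool:

```python
from typing import NamedTuple

class VmhbaPath(NamedTuple):
    adapter: int     # N: adapter number
    target: int      # T: target number (generally one per SP)
    lun: int         # L: LUN ID, assigned by the SAN
    partition: int   # P: 0 or absent means the whole disk

def parse_vmhba(name: str) -> VmhbaPath:
    if not name.startswith("vmhba"):
        raise ValueError(f"not a vmhba path: {name}")
    parts = name[len("vmhba"):].split(":")
    if len(parts) == 3:          # partition omitted -> whole disk
        parts.append("0")
    n, t, l, p = (int(x) for x in parts)
    return VmhbaPath(n, t, l, p)

print(parse_vmhba("vmhba1:0:6"))     # whole disk: partition defaults to 0
print(parse_vmhba("vmhba1:0:1:1"))   # partition 1
```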
Per-LUN Multipathing Failover Policy
VMware supports using only one path at a time:
- MRU = Most Recently Used
- Fixed = choose a preferred path and fail back to it
- With multiple ESX hosts or multiple LUNs, this allows for manual load balancing between SPs
Never set up the Fixed policy with an active/passive SAN! Why?

Path Thrashing
- Only possible on active/passive SANs
- Host 1 needs access to the LUN through SP1
- Host 2 needs access to the LUN through SP2
- The LUN keeps being trespassed between SPs and is never available for I/O

Multipathing
Active/Active arrays:
- LUNs presented on multiple Storage Processors
- Fixed path policy: 1st active path discovered or user configured
  - Failover on NO_CONNECT
  - Preferred path policy: failback to the preferred path if it recovers

Active/Passive arrays:
- LUNs presented on a single Storage Processor
- MRU (Most Recently Used) path policy
  - Failover on NOT_READY, ILLEGAL_REQUEST or NO_CONNECT
  - No preferred path policy, no failback to the preferred path

Policy summary:
- Fixed: Active/Active arrays only
- MRU: Active/Active arrays and Active/Passive arrays
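The failover rules above can be condensed into a small sketch. The sense codes and class below are simplified illustrations of the behavior, not the actual VMkernel implementation:

```python
# Sketch of Fixed vs. MRU path selection under failover, assuming a
# simplified two-path model.
FAILOVER_ERRORS_MRU = {"NOT_READY", "ILLEGAL_REQUEST", "NO_CONNECT"}
FAILOVER_ERRORS_FIXED = {"NO_CONNECT"}

class PathSelector:
    def __init__(self, paths, policy, preferred=None):
        self.paths = list(paths)
        self.policy = policy                      # "fixed" or "mru"
        self.preferred = preferred or self.paths[0]
        self.current = self.preferred

    def on_error(self, sense):
        errors = (FAILOVER_ERRORS_FIXED if self.policy == "fixed"
                  else FAILOVER_ERRORS_MRU)
        if sense in errors:
            alternates = [p for p in self.paths if p != self.current]
            self.current = alternates[0]          # fail over to another path

    def on_path_restored(self, path):
        # Fixed fails back to the preferred path; MRU stays where it is.
        if self.policy == "fixed" and path == self.preferred:
            self.current = self.preferred

fixed = PathSelector(["SP1", "SP2"], "fixed", preferred="SP1")
fixed.on_error("NO_CONNECT");   print(fixed.current)   # SP2
fixed.on_path_restored("SP1");  print(fixed.current)   # back to SP1

mru = PathSelector(["SP1", "SP2"], "mru")
mru.on_error("NOT_READY");      print(mru.current)     # SP2
mru.on_path_restored("SP1");    print(mru.current)     # stays on SP2
```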
Snapshot LUNs and Resignaturing

How VMware ESX Identifies Disks
- Each LUN has a SCSI identifier string provided by the SAN vendor
- The SCSI ID stays the same across different paths
- The VMkernel identifies disks with a combination of LUN ID, SCSI ID and part of the model string

# ls -l /vmfs/devices/disks/
total 179129968
-rwxrwxrwx 1 root root 72833679360 Nov 13 12:16 vmhba0:0:0:0
lrwxrwxrwx 1 root root 58 Nov 13 12:16 vmhba1:0:0:0 -> vml.020000000060060160432017002a547c3e7893dc11524149442035
lrwxrwxrwx 1 root root 58 Nov 13 12:16 vmhba1:0:1:0 -> vml.02000100006006016043201700a99d1c3bb9c5dc11524149442035
lrwxrwxrwx 1 root root 58 Nov 13 12:16 vmhba1:0:10:0 -> vml.02000a000060060160432017000db2f61d17d3dc11524149442035
(...)

Snapshot LUNs & Resignaturing: Key Facts
- ESX identifies objects in a VMFS datastore by path, e.g. /vmfs/volumes/<UUID>/
- The VMFS UUID (aka signature) is generated at VMFS creation
- The VMFS header includes hashed information about the disk where it's been created
- VMFS relies on SCSI reservations to acquire on-disk locks, which in turn enforce atomicity of filesystem metadata updates
- SCSI reservations don't work across mirrored LUNs
- To avoid corruption, we need to prevent mounting a datastore and a copy of it at the same time
On rescan, the information about the disk in the VMFS header metadata (m/d) is checked against the actual values. If any of the fields doesn't match, the VMFS is not mounted and ESX complains that it is a snapshot LUN:

LVM: 5739: Device vmhba1:0:1:1 is a snapshot:
LVM: 5745: disk ID:
LVM: 5747: m/d disk ID:
ALERT: LVM: 4903: vmhba1:0:1:1 may be snapshot: disabling access. See resignaturing section in SAN config guide.
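The rescan check can be modeled as a straight comparison between the identity recorded in the VMFS header and the identity the host actually observes. A toy Python model with illustrative field names and made-up SCSI IDs:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DiskIdentity:
    lun_id: int
    scsi_id: str   # SAN-provided identifier (made-up values below)
    model: str     # part of the SCSI model string

def is_snapshot(header_identity: DiskIdentity,
                observed: DiskIdentity) -> bool:
    # Mimics the rescan check: any mismatch between the identity stored
    # in the VMFS header (m/d) and the observed disk marks it a snapshot.
    return header_identity != observed

original = DiskIdentity(1, "60060160432017a99d01", "RAID 5")
mirror   = DiskIdentity(1, "6006016043201799ff02", "RAID 5")  # copy: new SCSI ID

print(is_snapshot(original, original))  # False: mounted normally
print(is_snapshot(original, mirror))    # True: access disabled
```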
Are they mirrored/snapshot LUNs?
- If yes: will the ESX host(s) ever see both original and copy at the same time?
  - Yes: resignature
  - No: either allow snapshots or resignature
- If no: do multiple ESX hosts see the same LUN with different IDs?
  - Yes: fix the SAN config; if not possible, allow snapshots
  - No (IDs permanently changed): either allow snapshots or resignature
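The same decision tree, encoded as a hypothetical helper function for illustration:

```python
def snapshot_lun_action(mirrored: bool,
                        both_visible: bool = False,
                        ids_differ_across_hosts: bool = False) -> str:
    # Encodes the decision tree above; returns the recommended action.
    if mirrored:
        if both_visible:
            return "resignature"
        return "allow snapshots or resignature"
    if ids_differ_across_hosts:
        return "fix the SAN config; if not possible, allow snapshots"
    # LUN IDs permanently changed
    return "allow snapshots or resignature"

print(snapshot_lun_action(mirrored=True, both_visible=True))
print(snapshot_lun_action(mirrored=False, ids_differ_across_hosts=True))
```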
Resignaturing Issues

Never ever resignature:
- Resignaturing implies changing the UUID and the datastore name
- All paths to filesystem objects (vmdks, VMs) will become invalid!
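To illustrate why those paths break: object paths embed the datastore UUID (or a name derived from it), and resignaturing assigns a new one, so every stored reference must be re-registered or edited. A hypothetical sketch with made-up UUIDs:

```python
# Hypothetical illustration: a reference to a vmdk by datastore path
# goes stale once the datastore UUID changes on resignature.
OLD_UUID = "46f1d1a2-9e1b2c3d-7f00-0017a4490072"  # made-up
NEW_UUID = "4a2b3c4d-1e2f3a4b-9c00-0017a4490072"  # made-up, assigned at resignature

def rewrite_reference(path: str, old_uuid: str, new_uuid: str) -> str:
    # Re-point a stale datastore path at the resignatured volume.
    return path.replace(old_uuid, new_uuid)

stale = f"/vmfs/volumes/{OLD_UUID}/vm1/vm1.vmdk"
print(rewrite_reference(stale, OLD_UUID, NEW_UUID))
```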