
VMFS Deep Dive

Tuesday, April 28, 2009, Posted by Virtualbox at 11:46 PM


Agenda

- ESX Storage Stack and VMFS
- VMFS vs. RDM
- SCSI reservation conflicts
- Multipathing
- Snapshot LUNs and resignaturing

The Storage Stack in VI3

VMFS - A Clustered Filesystem for Today's Dynamic IT World

- Built-in VMFS cluster file system simplifies VM provisioning
- Enables independent VMotion and HA restart of VMs on a common LUN
- File-level locking protects virtual disks
- Separates VM and storage administration
- Use RDMs for access to SAN features
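For illustration (datastore and VM names are hypothetical), everything that makes up a VM is stored as ordinary files on the shared VMFS volume, which is what makes template-based provisioning, VMotion and HA restarts from a common LUN straightforward:

ls /vmfs/volumes/myvmfs/myvm/
# myvm.vmx  myvm.vmdk  myvm-flat.vmdk  myvm.nvram  myvm.vmsd  vmware.log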

Raw Device Mapping (RDM)

- A mapping file in a VMFS volume, presented to the VM as a virtual SCSI device
- Key contents of the mapping file's metadata include the location and locking of the mapped device
- Used when a virtual machine must interact with a real disk on the SAN, e.g. Microsoft Cluster Services (MSCS) storage
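A minimal sketch of creating such a mapping file with vmkfstools (device path and file names are hypothetical); the mapping file lives on a VMFS datastore while guest I/O goes to the raw LUN:

# Virtual compatibility mode RDM
vmkfstools -r /vmfs/devices/disks/vmhba1:0:6:0 /vmfs/volumes/myvmfs/myvm/myvm_rdm.vmdk
# Physical compatibility mode RDM (pass-through, e.g. for MSCS clusters across hosts)
vmkfstools -z /vmfs/devices/disks/vmhba1:0:6:0 /vmfs/volumes/myvmfs/myvm/myvm_rdm.vmdk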

VMFS vs. RDM

RAW (RDM):
- May give better performance
- Means more LUNs, hence more provisioning time
- Advanced SAN features still work

VMFS:
- Leverage templates and quick provisioning
- Fewer LUNs means you don't have to watch the heap
- Scales better with Consolidated Backup
- The preferred method
Skeleton of a VMFS

A VMFS holds files and has its own metadata. Metadata gets updated through:
- Creating a file
- Changing a file's attributes
- Powering on a VM
- Powering off a VM
- Growing a file

When metadata is updated, the VMkernel places a non-persistent SCSI reservation on the entire VMFS volume:
- The lock is held on the volume for the duration of the operation
- Other VMkernels are prevented from doing metadata updates

VMFS 3 & SCSI Reservations

- Concurrent-access filesystem
- Most I/O happens simultaneously from all hosts
- Filesystem metadata updates are atomic and performed by the requesting host, for example:

- Locking a file for read/write (e.g. a .vmdk when powering on a VM)
- Creating a new directory or file
- Growing a file, etc.

For the time needed by the locking operation (NOT the whole metadata update), the LUN is reserved (i.e. locked for access) by a single host.

SCSI Reservation Conflict - What It Is

What happens if we try to perform I/O to a LUN that's already reserved?
- A retry counter is decreased and the I/O operation is retried
- The retry is scheduled with a pseudo-random algorithm
- If the counter reaches 0, we have a SCSI reservation conflict

SCSI: 6630: Partition table read from device vmhba1:0:6 failed: SCSI reservation conflict (0xbad0022)
SCSI: vm 1033: 5531: Sync CR at 64
SCSI: vm 1033: 5531: Sync CR at 48
SCSI: vm 1033: 5531: Sync CR at 32
SCSI: vm 1033: 5531: Sync CR at 16
SCSI: vm 1033: 5531: Sync CR at 0
WARNING: SCSI: 5541: Failing I/O due to too many reservation conflicts
WARNING: SCSI: 5637: status SCSI reservation conflict, r status 0xc0de01 for vmhba1:0:6. residual R 919, CR 0, ER 3
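These messages end up in the VMkernel log on the service console; a quick way to see whether a host is hitting conflicts (log path as on ESX 3.x):

# Count recent reservation-conflict messages in the VMkernel log
grep -c "reservation conflict" /var/log/vmkernel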

Who's Holding a SCSI Reservation?

One ESX host (persistent reservation)
- vmkfstools -L reserve: this should NEVER EVER be done
- Interaction with installed third-party management agents

Multiple ESX hosts (alternately)
- High latency / slow SAN
  o Critical lock-passing between ESX hosts during VMotion
- SAN firmware slow in honoring SCSI reserve/release
  o Synchronously mirrored LUNs

One non-ESX host
- LUN erroneously mapped to e.g. a Windows host

No host
- Persistent reservation held by the SAN
- Needs investigation by the SAN vendor
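If a reservation is genuinely stuck, vmkfstools can also release or reset it; this is disruptive and should only be done under guidance from VMware support or the SAN vendor (device path hypothetical):

# Release a reservation this host holds on the LUN
vmkfstools -L release /vmfs/devices/disks/vmhba1:0:6:0
# Reset the LUN to clear a reservation held by another initiator
vmkfstools -L lunreset /vmfs/devices/disks/vmhba1:0:6:0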
ESX Server Multipathing

Multipathing - vmhbaN:T:L:P Notation

Determined at boot, install or rescan:
- N = adapter number
- T = target number (generally 1 SP = 1 target)

Determined by the SAN:
- L = LUN ID
- SCSI identifier of the LUN (not shown here)

Determined at datastore or extent creation:
- P = partition number (if 0 or absent = whole disk)
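On the ESX 3.x service console, the discovered paths and their vmhbaN:T:L names can be listed directly (output varies with the SAN):

# List every path with its adapter, target, LUN and the policy in use
esxcfg-mpath -l
# Map vmhbaN:T:L names to service console /dev/sd* devices
esxcfg-vmhbadevs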
Per-LUN Multipathing Failover Policy

VMware supports using only one path at a time:
- MRU = Most Recently Used
- Fixed = choose a preferred path and fail back to it
- With multiple ESX hosts or multiple LUNs, this allows for manual load balancing between SPs
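The per-LUN policy is normally set in the VI Client; a sketch of the equivalent ESX 3.x service console command (LUN name hypothetical; verify the flags against the esxcfg-mpath help output on your build):

# Set the failover policy for one LUN to Fixed (alternative: mru)
esxcfg-mpath --policy=fixed --lun=vmhba1:0:6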
Never set up the Fixed policy with an active/passive SAN! Why?

Path Thrashing

- Only possible on active/passive SANs
- Host 1 needs access to the LUN through SP1
- Host 2 needs access to the LUN through SP2
- The LUN keeps being trespassed between SPs and is never available for I/O

Multipathing

Active/Active
- LUNs presented on multiple Storage Processors
- Fixed path policy; failover on NO_CONNECT
- Preferred path policy; failback to the preferred path if it recovers

Active/Passive
- LUNs presented on a single Storage Processor
- MRU (Most Recently Used) path policy; failover on NOT_READY, ILLEGAL_REQUEST or NO_CONNECT
- No preferred path policy, no failback to a preferred path

Load Balancing

- Fixed (Preferred Path): 1st active path discovered, or user configured; Active/Active arrays only
- Most Recently Used (MRU): Active/Active and Active/Passive arrays
Snapshot LUNs and Resignaturing

How VMware ESX Identifies Disks

Each LUN has a SCSI identifier string provided by the SAN vendor. The SCSI ID stays the same amongst different paths. The vmkernel identifies disks with a combination of LUN ID, SCSI ID and part of the model string.

# ls -l /vmfs/devices/disks/
total 179129968
-rwxrwxrwx 1 root root 72833679360 Nov 13 12:16 vmhba0:0:0:0
lrwxrwxrwx 1 root root 58 Nov 13 12:16 vmhba1:0:0:0 -> vml.020000000060060160432017002a547c3e7893dc11524149442035
lrwxrwxrwx 1 root root 58 Nov 13 12:16 vmhba1:0:1:0 -> vml.02000100006006016043201700a99d1c3bb9c5dc11524149442035
lrwxrwxrwx 1 root root 58 Nov 13 12:16 vmhba1:0:10:0 -> vml.02000a000060060160432017000db2f61d17d3dc11524149442035
(...)

Snapshot LUNs & Resignaturing - Key Facts

- ESX identifies objects in a VMFS datastore by path, e.g. /vmfs/volumes/<VMFS UUID>/<file>
- The VMFS UUID (aka signature) is generated at VMFS creation
- The VMFS header includes hashed information about the disk where it has been created
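Both the label and the UUID of a datastore can be inspected from the service console; a sketch, with the datastore name myvmfs hypothetical:

# The datastore label is a symlink to the directory named after the VMFS UUID
ls -l /vmfs/volumes/
# Print filesystem details (VMFS version, label, UUID, extents) for one datastore
vmkfstools -P /vmfs/volumes/myvmfs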

The Check for Snapshot LUNs

- VMFS relies on SCSI reservations to acquire on-disk locks, which in turn enforce atomicity of filesystem metadata updates
- SCSI reservations don't work across mirrored LUNs
- To avoid corruption, we need to prevent mounting a datastore and a copy of it at the same time

On rescan, the information about the disk in the VMFS header metadata (m/d) is checked against the actual values. If any of the fields doesn't match, the VMFS is not mounted and ESX complains it's a snapshot LUN:

LVM: 5739: Device vmhba1:0:1:1 is a snapshot:
LVM: 5745: disk ID:
LVM: 5747: m/d disk ID:
ALERT: LVM: 4903: vmhba1:0:1:1 may be snapshot: disabling access. See resignaturing section in SAN config guide.

LUNs Detected as Snapshots - Causes

- LUN ID mismatch
- SCSI ID change (e.g. a LUN copied to a new SAN)
- They are effectively snapshots (e.g. at a DR site)

LUNs Detected as Snapshots - How to Fix

Are they mirrored/snapshot LUNs?
- If yes: will the ESX host(s) ever see both the original and the copy at the same time?
  - Yes: resignature
  - No: either allow snapshots or resignature
- If no: do multiple ESX hosts see the same LUN with different IDs?
  - Yes: fix the SAN config; if that is not possible, allow snapshots
  - No: the IDs have permanently changed; either allow snapshots or resignature
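On ESX 3.x, "allow snapshots" and "resignature" correspond to LVM advanced settings that can be changed in the VI Client or from the service console; a sketch (option names as documented for ESX 3.x, verify on your build):

# Allow snapshot LUNs to be mounted with their original signature
esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLun
# Or: resignature snapshot volumes on the next rescan (set back to 0 afterwards)
esxcfg-advcfg -s 1 /LVM/EnableResignature
# Rescan the adapter so the change takes effect
esxcfg-rescan vmhba1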
Resignaturing Issues

Never ever resignature while the VMs are running:
- Resignaturing implies changing the UUID and the datastore name
- All paths to filesystem objects (vmdks, VMs) will become invalid!
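After resignaturing, the volume reappears under a new UUID and, on ESX 3.x, with a "snap-" prefixed label, so every VM registered from the old path has to be re-added to the inventory. An illustrative before/after (names and IDs are made up):

ls -l /vmfs/volumes/
# before resignaturing:  myvmfs -> 48a2b5c1-...
# after resignaturing:   snap-00000002-myvmfs -> 49c3d6e2-...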
