VMware vSphere Best Practices Guide

for the Pure Storage FlashArray


March 2017
Contents
Document Changes
Best Practice Checklist
Notes on this Document
Executive Summary
Goals and Objectives
Audience
FlashArray and VMware Integration
FlashArray Configuration
  Host and Host Group Creation
  Connecting Volumes to Hosts
SAN Design and Setup
VMware ESXi Host Configuration
  VMware Native Multipathing Plugin (NMP) Configuration
    Round Robin Path Selection Policy
    Tuning Round Robin: the I/O Operations Limit
    Configuring Round Robin and the I/O Operations Limit
    Verifying Connectivity
  Enabling EFI-Based Virtual Machines
  VAAI Configuration
  iSCSI Configuration
    Set Login Timeout to a Larger Value
    Disable DelayedAck
    iSCSI Port Binding
    Jumbo Frames

© Pure Storage 2017 | 2


    Challenge-Handshake Authentication Protocol (CHAP)
Datastore Management
  Volume Sizing and Count
  VMFS Version Recommendations
  Datastore Performance Management
    Queue Depth Limits and DSNRO
    Dynamic Queue Throttling
    Storage I/O Control
    Storage DRS
  Datastore Capacity Management
    VMFS Usage vs. FlashArray Volume Capacity
    Monitoring and Managing VMFS Capacity Usage
  Shrinking a Volume
  Mounting a Snapshot Volume
  Deleting a Datastore
Virtual Machine and Guest Configuration
  Virtual Disk Choice
  Virtual Hardware Configuration
  Template Configuration
  Guest-level Settings
    High-IOPS Virtual Machines
Space Management and Reclamation
  VMware Dead Space Overview
  Space Reclamation with VMFS
    VMFS UNMAP in vSphere 5.5 through 6.0
    VMFS UNMAP in vSphere 6.5
  Space Reclamation In-Guest
    Understanding In-Guest UNMAP in ESXi
    In-Guest UNMAP Alignment Requirements
    In-Guest UNMAP with Windows
    In-Guest UNMAP with Linux
  What to expect after UNMAP is run on the FlashArray
VMware Horizon 7 Configuration and Tuning
  Horizon Connection Server Tuning
References
About the Author
Appendix I: Per-Device NMP Configuration

Document Changes
Document change tracking has been reset to a base from the June 2016 release of this document.

Version             Changes

June 10th, 2016     • Added notes on iSCSI.
                    • Checklist updates to include ATS-Only and UseATSForHBOnVMFS5 settings.

March 17th, 2017    • Updated for ESXi 6.5
                    • Added notes about queuing
                    • More details on SIOC and SDRS
                    • In-guest UNMAP details for Linux and Windows
                    • Capacity management discussion
                    • Retired recommendation for MaxHWTransferSize
                    • Short section on Horizon best practices

Best Practice Checklist
Below is a checklist to refer to when configuring a VMware® environment for the Pure Storage®
FlashArray. All settings that are not mentioned here should remain set to the default. Please refer to Pure
Storage and/or VMware support when considering changes to settings not mentioned below. A setting
not mentioned here indicates that Pure Storage does not generally have a specific recommendation for
that setting and recommends either keeping the VMware default or following VMware's guidance.

For more details, read through the rest of the document.

Acknowledged/Done?   Description

[ ]   Configure Round Robin and an I/O Operations Limit of 1 for every FlashArray device. The best
      way to do this is to create an ESXi SATP Rule on every host (below). This will make sure all
      devices are set properly automatically.

      esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "iops=1"

[ ]   For iSCSI, disable DelayedAck and set the Login Timeout to 30 seconds. Jumbo Frames are
      optional.

[ ]   In vSphere 6.x, change EnableBlockDelete to enabled.

[ ]   For VMFS-5, run UNMAP frequently.

[ ]   For VMFS-6, keep automatic UNMAP enabled.

[ ]   For ESXi hosts running EFI-enabled VMs, set the ESXi parameter Disk.DiskMaxIOSize to 4 MB.

[ ]   DataMover.HardwareAcceleratedMove, DataMover.HardwareAcceleratedInit, and
      VMFS3.HardwareAcceleratedLocking should all be enabled.

[ ]   Ensure all ESXi hosts are connected to both FlashArray controllers, ideally with at least two
      paths to each. Aim for total redundancy.

[ ]   Install VMware tools whenever possible.

[ ]   Queue depths are to be left at the default. Changing queue depths on the ESXi host is
      considered to be a tweak and should only be examined if a performance problem is observed.

[ ]   When mounting snapshots, use the ESXi resignature option and avoid force-mounting.

[ ]   Configure Host Groups on the FlashArray identically to clusters in vSphere. For example, if a
      cluster has four hosts in it, create a corresponding Host Group on the relevant FlashArrays
      with exactly those four hosts, no more and no less.

[ ]   Use Paravirtual SCSI adapters for virtual machines whenever possible.

[ ]   ATS-only should be configured on all Pure Storage volumes. This is a default configuration
      and no changes should normally be required.

[ ]   UseATSForHBOnVMFS5 should be enabled. This was introduced in vSphere 5.5 U2 and is enabled
      by default. It is NOT required though.

A PowerCLI script to check and set certain best practices can be found here:

https://github.com/codyhosterman/powercli/blob/master/bestpractices.ps1

A script to just check for best practices can be found here:

https://github.com/codyhosterman/powercli/blob/master/bestpracticechecker.ps1

For a detailed description of both scripts, refer to this post:

http://www.codyhosterman.com/2016/05/updated-flasharray-vmware-best-practices-powercli-scripts/

Notes on this Document
When you see this box, a best practice is defined. This might be a requirement or a recommendation—
whichever one it is will be indicated in the box text:

BEST PRACTICE: Enable a setting

When something must be noted because missing it can cause data loss, corruption, or data unavailability,
or when it is simply important but not related to a best practice, this box will be used to stress it:

Please note that this is important

As always, refer to the checklist and the Document Changes section for a quick view of important updates.

Executive Summary
This document describes the best practices for using the Pure Storage® FlashArray in VMware vSphere®
5.5+ [1] and 6.0+ environments. While the Pure Storage FlashArray includes general support for a wide
variety of VMware products, this guide will focus on best practices concerning VMware ESXi® and
VMware vCenter™. This paper will provide guidance and recommendations for ESXi and vCenter settings
and features that provide the best performance, value and efficiency when used with the Pure Storage
FlashArray.

Goals and Objectives


This document is intended to provide understanding and insight into any pertinent best practices when
using VMware vSphere with the Pure Storage FlashArray. Options or configurations that are to be left at
the default are generally not mentioned, and therefore a recommendation of the default value should be
assumed. Changing parameters not mentioned in this guide may in fact be supported but is likely not
recommended in most cases and should be considered on a case-by-case basis. Please contact
Pure Storage and/or VMware support in those situations.

Please note that this document will be updated frequently, so for the most up-to-date version and other
VMware information please refer to the following page on the Pure Storage support site.

https://support.purestorage.com/Solutions/Virtualization/VMware

Audience
This document is intended for use by VMware and/or storage administrators who want to deploy the Pure
Storage FlashArray in VMware vSphere-based virtualized datacenters. A working familiarity with VMware
products and concepts is recommended.

[1] ESXi 5.0 and 5.1 have reached End of Support with VMware and have been retired from this guide. For
information on 5.0, please reach out to Pure Storage support.

FlashArray and VMware Integration
This document is focused on core ESXi and vCenter best practices to ensure the best performance at
scale and to explain management techniques to maintain the health of your VMware vSphere
environment on FlashArray storage.

Many of the techniques and operations can be simplified, automated and enhanced through Pure Storage
integration with various VMware products:

1. FlashArray Management Plugin for VMware vSphere Web Client

2. FlashArray Storage Replication Adapter for VMware Site Recovery Manager™

3. FlashArray Plugin for VMware vRealize® Orchestrator™

4. FlashArray Management Pack for VMware vRealize Operations Manager™

5. FlashArray Content Pack for VMware vRealize Log Insight™

Detailed descriptions of these integrations are beyond the scope of this document, but further details can
be found at https://support.purestorage.com/Solutions/Virtualization/VMware.

FlashArray Configuration
This section describes the recommendations for creating provisioning objects (called hosts and host
groups) on the FlashArray. The purpose is to outline the proper configuration for general understanding. It
is important to note, though, that the FlashArray vSphere Web Client Plugin will automate all of the
following tasks for you and is therefore a recommended mechanism for doing so.

Host and Host Group Creation

The FlashArray has two object types for volume provisioning, hosts and host groups:

• Host—a host is a collection of initiators (iSCSI IQNs or Fibre Channel WWNs) that refers to a
physical host. A FlashArray host object must have a one to one relationship with an ESXi host.
Every active initiator for a given ESXi host should be added to the respective FlashArray host
object. If an initiator is not yet zoned (for instance), and not intended to be, it can be omitted from
the FlashArray host object. Furthermore, while the FlashArray supports multiple protocols for a
single host (a mixture of FC and iSCSI), ESXi does not support presenting VMFS storage via more
than one protocol. So creating a multi-protocol host object should be avoided on the FlashArray
when in use with VMware ESXi.

In the example below, the ESXi host has two online Fibre Channel HBAs with WWNs of
20:00:00:25:B5:11:11:4D and 20:00:00:25:B5:11:11:5D. So a host has been created with those WWNs and
not the two that are offline as they are not currently intended for use.

• Host Group—a host group is a collection of host objects. Pure Storage recommends grouping
your ESXi hosts into clusters within vCenter, as this provides a variety of benefits like High
Availability and the Distributed Resource Scheduler (DRS). In order to provide simple provisioning, Pure
Storage also recommends creating host groups that correspond to VMware clusters. Therefore, for
every VMware cluster that will use FlashArray storage, a respective host group should be
created. Every ESXi host in the cluster should have a corresponding host object (as described
above) that is added to that host group. The host group should contain exactly the hosts that are in
its cluster, no more and no fewer. While a mismatch is supported, matching membership makes
cluster-based provisioning simpler, and a variety of orchestration integrations require it, so
matching is highly recommended.

BEST PRACTICE: Match FlashArray host groups with vCenter clusters.
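
As an illustration of the matching rule above, the following minimal Python sketch compares the hosts in a vSphere cluster with the hosts in a FlashArray host group. It assumes the two name lists have already been collected by some other means and that FlashArray host objects are named after their ESXi hosts. The function name and the sample host names are hypothetical and are not part of any Pure Storage or VMware tool.

```python
def compare_cluster_to_host_group(cluster_hosts, host_group_hosts):
    """Return the hosts that appear on only one side of the comparison."""
    cluster = set(cluster_hosts)
    group = set(host_group_hosts)
    return {
        # In the vSphere cluster but not in the FlashArray host group.
        "missing_from_host_group": sorted(cluster - group),
        # In the FlashArray host group but not in the vSphere cluster.
        "extra_in_host_group": sorted(group - cluster),
    }


# Hypothetical inventory: the cluster has four hosts, the host group only three.
cluster = ["esxi-01", "esxi-02", "esxi-03", "esxi-04"]
host_group = ["esxi-01", "esxi-02", "esxi-03"]
result = compare_cluster_to_host_group(cluster, host_group)
print(result)
```

Both lists in the result are empty only when the host group and the cluster contain exactly the same hosts.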

Connecting Volumes to Hosts

A FlashArray volume can be connected to either host objects or host groups. If a volume is intended to
be shared by the entire cluster, it is recommended to connect the volume to the host group, not the
individual hosts. This makes provisioning easier and helps ensure the entire ESXi cluster has access to
the volume. Generally, volumes that are intended to host virtual machines should be connected at the
host group level.

Private volumes, like ESXi boot volumes, should not be connected to the host group as they are not (and
should not be) shared. These volumes should be connected to the host object instead.

Pure Storage has no requirement on LUN IDs for VMware ESXi environments, and users should therefore
rely on the automatic LUN ID selection built into Purity.

SAN Design and Setup
For zoning recommendations and best practices, please refer to the following articles (and others) on
support.purestorage.com:

https://support.purestorage.com/Solutions/SAN/Best_Practices/SAN_Guidelines_for_Maximizing_Pure_Performance

https://support.purestorage.com/Solutions/SAN/Configuration/Introduction_to_Fibre_Channel_and_Zoning

VMware ESXi Host Configuration
This section reviews the recommended configuration of ESXi hosts for use with the Pure Storage
FlashArray.

VMware Native Multipathing Plugin (NMP) Configuration

VMware offers a Native Multipathing Plugin (NMP) layer in vSphere through Storage Array Type Plugins
(SATP) and Path Selection Policies (PSP) as part of the VMware APIs for Pluggable Storage Architecture
(PSA). The SATP has all the knowledge of the storage array to aggregate I/Os across multiple channels
and has the intelligence to send failover commands when a path has failed. The Path Selection Policy can
be either “Fixed”, “Most Recently Used” or “Round Robin”.

Round Robin Path Selection Policy

The Pure Storage FlashArray is seen by VMware ESXi as an ALUA-compliant array. The FlashArray
advertises all paths as optimized for doing I/Os (active/optimized). When a FlashArray is connected to an
ESXi farm and storage is provisioned, FlashArray volumes get claimed by the “VMW_SATP_ALUA” SATP.

The default pathing policy for the ALUA SATP is the Most Recently Used (MRU) Path Selection Policy.
With this PSP, ESXi will use the first identified path for a volume until that path goes away, or it is manually
told to use a different path. This is not the ideal configuration for FlashArray volumes for a variety of
reasons, the most notable of which is performance. To better leverage the active-active nature of the front
end of the FlashArray, Pure Storage requires that you configure FlashArray volumes to use the Round
Robin Path Selection Policy. The Round Robin PSP rotates between all discovered paths for a given
volume which allows ESXi (and therefore the virtual machines running on the volume) to maximize the
possible performance by using all available resources (HBAs, target ports, etc.).

BEST PRACTICE: Use the Round Robin Path Selection Policy for FlashArray
volumes.

Tuning Round Robin—the I/O Operations Limit

The Round Robin Path Selection Policy allows for additional tuning of its path-switching behavior in the
form of a setting called the I/O Operations Limit. The I/O Operations Limit (sometimes called the “IOPS”
value) dictates how often ESXi switches logical paths for a given device. By default, when Round Robin is
enabled on a device, ESXi will switch to a new logical path every 1,000 I/Os. In other words, ESXi will
choose a logical path, and start issuing all I/Os for that device down that path. Once it has issued 1,000
I/Os for that device, down that path, it will switch to a new logical path and so on.

Pure Storage recommends tuning this value down to the minimum of 1. This will cause ESXi to change
logical paths after every single I/O, instead of 1,000.
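
The effect of the I/O Operations Limit on path usage can be sketched with a small simulation. The Python below is a deliberately simplified model, not ESXi's actual implementation: it assumes a single device, a fixed path order, and a strict switch after every `iops_limit` I/Os. The function name and the path labels are hypothetical.

```python
def simulate_round_robin(paths, total_ios, iops_limit):
    """Count I/Os per path when the active path changes every `iops_limit` I/Os."""
    if not paths or iops_limit < 1:
        raise ValueError("need at least one path and an I/O Operations Limit >= 1")
    counts = {path: 0 for path in paths}
    for io_number in range(total_ios):
        # Integer division gives the number of path switches made so far.
        active = paths[(io_number // iops_limit) % len(paths)]
        counts[active] += 1
    return counts


paths = ["path-A", "path-B", "path-C", "path-D"]  # hypothetical logical paths
default_limit = simulate_round_robin(paths, total_ios=2500, iops_limit=1000)
tuned_limit = simulate_round_robin(paths, total_ios=2500, iops_limit=1)
print(default_limit)
print(tuned_limit)
```

In this model, a burst of 2,500 I/Os with the default limit of 1,000 never reaches the fourth path, while a limit of 1 spreads the same burst evenly across all four.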

This recommendation is made for a few reasons:

1. Performance. Performance is the reason most often cited for changing this value. Changing it from
1,000 to 1 can improve performance in some use cases, especially with iSCSI, but the gain is usually
modest (generally a single-digit percentage increase) and will generally not solve a major
performance problem.

2. Path Failover Time. It has been noted in testing that ESXi will fail logical paths much more quickly
when this value is set to the minimum of 1. During a physical failure in the storage environment (loss
of an HBA, switch, cable, port, or controller), ESXi will, after a certain period of time, fail any logical
path that relies on the failed physical hardware and will stop attempting to use it for a given
volume. This failure does not always happen immediately. When the I/O Operations Limit is set to the
default of 1,000, path failover time can sometimes be in the tens of seconds, which can lead to
noticeable disruption in performance during the failure. When this value is set to the minimum of 1,
path failover time generally decreases to under ten seconds. This greatly reduces the impact of a
physical failure in the storage environment and provides greater performance resiliency and reliability.

3. FlashArray Controller I/O Balance. When Purity is upgraded on a FlashArray, the following process is
observed (at a high level): upgrade Purity on one controller, reboot it, wait for it to come back up,
upgrade Purity on the other controller, reboot it and you’re done. Due to the reboots, twice during the
process half of the FlashArray front-end ports go away. Because of this, we want to ensure that all
hosts are actively using both controllers prior to upgrade. One method that is used to confirm this is to
check the I/O balance from each host across both controllers. When volumes are configured to use
Most Recently Used, an imbalance of 100% is usually observed (ESXi tends to select paths that lead to
the same front end port for all devices). This then means additional troubleshooting to make sure that
host can survive a controller reboot. When Round Robin is enabled with the default I/O Operations
Limit, port imbalance is improved to about 20-30% difference. When the I/O Operations Limit is set to
1, this imbalance is less than 1%. This gives Pure Storage and the end user confidence that all hosts
are properly using all available front end ports.

For the three reasons above, Pure Storage highly recommends altering the I/O Operations Limit to 1.

BEST PRACTICE: Change the Round Robin I/O Operations Limit from 1,000 to 1
for FlashArray volumes.

Configuring Round Robin and the I/O Operations Limit

There are a variety of ways to configure Round Robin and the I/O Operations Limit. Both can be set on a
per-device basis, with the options applied to each new volume as it is added. This is not a particularly
good option: it must be done for every new volume on every host, which makes it easy to forget and
leaves a lot of room for mistakes.

The recommended option for configuring Round Robin and the correct I/O Operations Limit is to create a
rule that will cause any new FlashArray device that is added in the future to that host to automatically get
the Round Robin PSP and an I/O Operation Limit value of 1.

The following command creates a rule that achieves both of these for only Pure Storage FlashArray
devices:

esxcli storage nmp satp rule add -s "VMW_SATP_ALUA" -V "PURE" -M "FlashArray" -P "VMW_PSP_RR" -O "iops=1"

This must be repeated for each ESXi host.

This can also be accomplished through PowerCLI. Once connected to a vCenter Server this script will
iterate through all of the hosts in that particular vCenter and create a default rule to set Round Robin for
all Pure Storage FlashArray devices with an I/O Operation Limit set to 1.

# Prompt for credentials and connect to vCenter (replace <vCenter> with your server name)
$creds = Get-Credential
Connect-VIServer -Server <vCenter> -Credential $creds
$hosts = Get-VMHost
foreach ($esx in $hosts)
{
    # Get the esxcli V2 interface for this host
    $esxcli = Get-EsxCli -VMHost $esx -V2
    # Build the argument object for the "satp rule add" command
    $satpArgs = $esxcli.storage.nmp.satp.rule.add.createArgs()
    $satpArgs.description = "Pure Storage FlashArray SATP Rule"
    $satpArgs.model = "FlashArray"
    $satpArgs.vendor = "PURE"
    $satpArgs.satp = "VMW_SATP_ALUA"
    $satpArgs.psp = "VMW_PSP_RR"
    $satpArgs.pspoption = "iops=1"
    # Create the rule on this host
    $esxcli.storage.nmp.satp.rule.add.invoke($satpArgs)
}

Furthermore, this can be configured using vSphere Host Profiles:

It is important to note that existing, previously presented devices will need to be manually set to Round
Robin and an I/O Operation Limit of 1. Optionally, the ESXi host can be rebooted so that it can inherit the
multipathing configuration set forth by the new rule.

For setting a new I/O Operation Limit on an existing device, see Appendix I: Per-Device NMP
Configuration.

BEST PRACTICE: Use a SATP rule to configure multipathing for FlashArray volumes.

Verifying Connectivity

It is important to verify proper connectivity prior to implementing production workloads on a host or
volume. This consists of a few steps:

1. Verifying proper multipathing settings in ESXi.

2. Verifying the proper number of paths.

3. Verifying I/O balance and redundancy on the FlashArray.

The Path Selection Policy and number of paths can be verified easily inside of the vSphere Web Client.

This will report the path selection policy and the number of logical paths. The number of logical paths will
depend on the number of HBAs, zoning and the number of ports cabled on the FlashArray.

The I/O Operations Limit cannot be checked from the vSphere Web Client—it can only be verified or
altered via command line utilities. The following command can check a particular device for the PSP and
I/O Operations Limit:

esxcli storage nmp device list -d naa.<device NAA>

Please remember that each of these settings is a per-host setting, so while a volume might be
configured properly on one host, it may not be correct on another. The PowerCLI script below can help
you verify this at scale in a simple way.

https://github.com/codyhosterman/powercli/blob/master/bestpracticechecker.ps1
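
Because the settings are per-host, the same volume has to be checked from every host that sees it. The Python sketch below illustrates only that comparison step. It assumes the Path Selection Policy and I/O Operations Limit for one device have already been gathered from each host (for example, by reading the output of the esxcli command above); how they are collected is not shown, and the function name, host names, and sample values are hypothetical.

```python
EXPECTED_PSP = "VMW_PSP_RR"
EXPECTED_IOPS_LIMIT = 1


def noncompliant_hosts(settings_by_host):
    """Return hosts whose (PSP, I/O Operations Limit) pair is not the recommended one."""
    return sorted(
        host
        for host, (psp, iops_limit) in settings_by_host.items()
        if psp != EXPECTED_PSP or iops_limit != EXPECTED_IOPS_LIMIT
    )


# Hypothetical settings for one volume as seen from three hosts.
observed = {
    "esxi-01": ("VMW_PSP_RR", 1),      # matches the recommendation
    "esxi-02": ("VMW_PSP_RR", 1000),   # Round Robin, but default limit
    "esxi-03": ("VMW_PSP_MRU", None),  # still on Most Recently Used
}
print(noncompliant_hosts(observed))
```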

It is also possible to check multipathing from the FlashArray.

A CLI command exists to monitor I/O balance coming into the array:

purehost monitor --balance --interval <how long to sample> --repeat <how many iterations>

The command will report a few things:

1. The host name

2. The individual initiators from the host. If an initiator is logged into more than one FlashArray port, it
will be reported more than once. If an initiator is not logged in at all, it will not appear.

3. The port that initiator is logged into

4. The number of I/Os that came into that port from that initiator over the time period sampled

5. The relative percentage of I/Os for that initiator as compared to the maximum

The balance command will count the I/Os that came down the particular initiator during the sampled time
period, and it will do that for all initiator/target relationships for that host. Whichever relationship/path has
the most I/Os will be designated as 100%. The rest of the paths will be then denoted as a percentage of
that number. So if a host has two paths, and the first path has 1,000 I/Os and the second path has 800,
the first path will be 100% and the second will be 80%.

On a well-balanced host, all paths should be within a few percentage points of each other. A difference
of more than 15% or so might be worthy of investigation. Refer to this post for more information.
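
The percentage calculation described above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the array's implementation. The path labels are hypothetical, and the 15-point tolerance is simply the rough figure mentioned in the text.

```python
def relative_balance(io_counts):
    """Return each path's I/O count as a percentage of the busiest path."""
    if not io_counts:
        return {}
    busiest = max(io_counts.values())
    if busiest == 0:
        return {path: 0.0 for path in io_counts}
    return {path: 100.0 * count / busiest for path, count in io_counts.items()}


def paths_to_investigate(io_counts, tolerance_pct=15.0):
    """List paths that trail the busiest path by more than tolerance_pct points."""
    balance = relative_balance(io_counts)
    return [path for path, pct in balance.items() if 100.0 - pct > tolerance_pct]


# Two-path example from the text: 1,000 I/Os on one path and 800 on the other.
sample = {"initiator-a -> CT0.FC0": 1000, "initiator-a -> CT1.FC0": 800}
balance = relative_balance(sample)       # 100% and 80%
flagged = paths_to_investigate(sample)   # the 80% path trails by 20 points
```

In this example the second path sits 20 points below the busiest path, so it would be flagged for a closer look.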

The GUI will also report on host connectivity in general, based on initiator logins.

This report should be listed as redundant for every host, meaning that it is connected to each controller.
If this reports something else, investigate zoning and/or host configuration to correct this. For detailed
explanation of the various reported states, please refer to the FlashArray User Guide which can be found
directly in your GUI:

Enabling EFI-Based Virtual Machines

If a virtual machine is using EFI (Extensible Firmware Interface) instead of BIOS, it is necessary to reduce
the ESXi parameter called Disk.DiskMaxIOSize from the default of 32 MB (32,768 KB) down to 4 MB
(4,096 KB). If this is not configured, the virtual machine will fail to properly boot.

This should be set on every ESXi host that EFI-enabled virtual machines have access to, in order to
provide for vMotion support. If EFI is not used, this can remain at the default. There is no known
performance effect from changing this value. For more detail on this change, please refer to the VMware
KB article here:

KB: Windows virtual machines using EFI fails to start

BEST PRACTICE: Reduce Disk.DiskMaxIOSize from 32 MB to 4 MB when EFI-enabled VMs are present. A
lower value is also acceptable.

VAAI Configuration

The VMware vStorage APIs for Array Integration (VAAI) primitives offer a way to offload and accelerate
certain operations in a VMware environment.

Pure Storage requires that all VAAI features be enabled on every ESXi host that uses FlashArray
storage. Disabling VAAI features can greatly reduce the efficiency and performance of FlashArray storage
in ESXi environments.

All VAAI features are enabled by default (set to 1) in ESXi 5.x and later, so no action is typically required.
These settings can nevertheless be verified via the vSphere Web Client or CLI tools:

1. WRITE SAME—DataMover.HardwareAcceleratedInit

2. XCOPY—DataMover.HardwareAcceleratedMove

3. ATOMIC TEST & SET— VMFS3.HardwareAcceleratedLocking

BEST PRACTICE: Keep VAAI enabled: DataMover.HardwareAcceleratedInit,
DataMover.HardwareAcceleratedMove, and VMFS3.HardwareAcceleratedLocking should all remain set to 1.

In order to provide a more efficient heart-beating mechanism for datastores, VMware introduced a new
host-wide setting called /VMFS3/UseATSForHBOnVMFS5. In VMware’s own words:

“A change in the VMFS heartbeat update method was introduced in ESXi 5.5 Update 2, to help
optimize the VMFS heartbeat process. Whereas the legacy method involves plain SCSI reads and
writes with the VMware ESXi kernel handling validation, the new method offloads the validation
step to the storage system.”

Pure Storage recommends keeping this value on whenever possible. That being said, it is a host wide
setting, and it can possibly affect storage arrays from other vendors negatively. Read the VMware KB
article here:

ESXi host loses connectivity to a VMFS3 and VMFS5 datastore

Pure Storage is NOT susceptible to this issue, but in the case of the presence of an affected array from
another vendor, it might be necessary to turn this off. In this case, Pure Storage supports disabling this
value and reverting to traditional heart-beating mechanisms.

BEST PRACTICE: Keep VMFS3.UseATSForHBOnVMFS5 enabled; this is preferred. If an array from another
vendor is present and that vendor prefers the setting to be disabled, Pure Storage supports disabling it.

iSCSI Configuration

As with any other array that supports iSCSI, Pure Storage recommends the following changes to an
iSCSI-based vSphere environment for the best performance.

For a detailed walkthrough of setting up iSCSI on VMware ESXi and on the FlashArray please refer to the
following VMware white paper. This is required reading for any VMware/iSCSI user:

http://www.vmware.com/files/pdf/iSCSI_design_deploy.pdf

Set Login Timeout to a Larger Value

For example, to set the Login Timeout value to 30 seconds, use steps similar to the following:

1. Log in to the vSphere Web Client and select the host under Hosts and Clusters.

2. Navigate to the Manage tab.

3. Select the Storage option.

4. Under Storage Adapters, select the iSCSI vmhba to be modified.

5. Select Advanced and change the Login Timeout parameter. This can be done on the iSCSI adapter
itself or on a specific target.

The default Login Timeout value is 5 seconds and the maximum value is 60 seconds.

BEST PRACTICE: Set iSCSI Login Timeout for FlashArray targets to 30 seconds.
A higher value is supported but not necessary.

Disable DelayedAck

DelayedAck is an advanced iSCSI option that allows or disallows an iSCSI initiator to delay
acknowledgement of received data packets.

Disabling DelayedAck in ESXi 5.x:

1. Log in to the vSphere Web Client and select the host under Hosts and Clusters.

2. Navigate to the Manage tab.

3. Select the Storage option.

4. Under Storage Adapters, select the iSCSI vmhba to be modified.

Navigate to Advanced Options and modify the DelayedAck setting by using the option that best matches
your requirements, as follows:

Option 1: Modify the DelayedAck setting on a particular discovery address (recommended) as follows:

1. Select Targets.

2. On a discovery address, select the Dynamic Discovery tab.

3. Select the iSCSI server.

4. Click Advanced.

5. Change DelayedAck to false.

Option 2: Modify the DelayedAck setting on a specific target as follows:

1. Select Targets.

2. Select the Static Discovery tab.

3. Select the iSCSI server and click Advanced.

4. Change DelayedAck to false.

Option 3: Modify the DelayedAck setting globally for the iSCSI adapter as follows:

1. Select the Advanced Options tab and click Advanced.

2. Change DelayedAck to false.

Disabling DelayedAck is highly recommended, but is not absolutely required by Pure Storage. In
highly-congested networks, if packets are lost, or simply take too long to be acknowledged due to that
congestion, performance can drop. When DelayedAck is enabled, not every packet is acknowledged
immediately (instead, one acknowledgement is sent per group of packets), so far more retransmission can
occur, further exacerbating congestion. This can lead to continually decreasing performance until
congestion clears. Since DelayedAck can contribute to this, it is recommended to disable it in order to
greatly reduce the effect of congested networks and packet retransmission.

Enabling jumbo frames can make this worse, since packets that are retransmitted are far larger. If jumbo
frames are enabled, disabling DelayedAck is strongly recommended. See the following VMware KB
for more information:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002598

BEST PRACTICE: Disable DelayedAck for FlashArray iSCSI targets.

iSCSI Port Binding

For software iSCSI initiators, without additional configuration the default behavior for iSCSI pathing is for
ESXi to leverage its routing tables to identify a path to its configured iSCSI targets. Without solid
understanding of network configuration and routing behaviors, this can lead to unpredictable pathing
and/or path unavailability in a hardware failure. To configure predictable and reliable path selection and
failover it is necessary to configure iSCSI port binding (iSCSI multi-pathing).

Configuration and detailed discussion is out of the scope of this document, but it is recommended to read
through the following VMware document that describes this and other concepts in-depth:

http://www.vmware.com/files/pdf/techpaper/vmware-multipathing-configuration-software-iSCSI-port-binding.pdf

BEST PRACTICE: Use Port Binding for ESXi software iSCSI adapters when
possible.

Note that ESXi 6.5 has expanded support for port binding and features such as iSCSI routing and multiple
subnets. Refer to ESXi 6.5 release notes for more information.

Jumbo Frames

In some iSCSI environments it is required to enable jumbo frames to conform to the network
configuration between the host and the FlashArray. Enabling jumbo frames is a cross-environment
change so careful coordination is required to ensure proper configuration. It is important to work with
your networking team and Pure Storage representatives when enabling jumbo frames. Please note that
this is not a requirement for iSCSI use on the Pure Storage FlashArray—in general Pure Storage
recommends leaving MTU at the default setting.

That being said, altering the MTU is fully supported and is up to the discretion of the user.

1. Configure jumbo frames on the FlashArray iSCSI ports.

2. Configure jumbo frames on the physical network switch/infrastructure for each port using the relevant
switch CLI or GUI.

3. Configure jumbo frames on the appropriate VMkernel network adapter and vSwitch.

A. Browse to a host in the vSphere Web Client navigator.


B. Click the Manage tab and select Networking > Virtual Switches.
C. Select the switch from the vSwitch list.
D. Click the name of the VMkernel network adapter.
E. Click the pencil icon to edit.
F. Click NIC settings and set the MTU to your desired value.
G. Click OK.
H. Click the pencil icon at the top to edit the vSwitch itself.
I. Set the MTU to your desired value.
J. Click OK.

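Once jumbo frames are configured, verify end-to-end jumbo frame compatibility. To verify, try to ping an
address on the storage network with vmkping. Use the -d option so the packet is not fragmented, and a
payload of 8972 bytes, which is a 9000-byte MTU less 28 bytes of IP and ICMP headers:

vmkping -d -s 8972 <ip address of Pure Storage iSCSI port>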

If the ping operation does not return successfully, then jumbo frames are not properly configured in ESXi,
the networking devices, and/or the FlashArray port.
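
A ping only proves that jumbo frames work end to end if the packet is sent unfragmented and fills the MTU. The Python sketch below shows the arithmetic for choosing the payload size, assuming a standard 20-byte IPv4 header without options and an 8-byte ICMP header. The function names and the example address (from the documentation range 192.0.2.0/24) are hypothetical.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_icmp_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented packet of size mtu."""
    payload = mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
    if payload < 0:
        raise ValueError("MTU is too small to carry IP and ICMP headers")
    return payload


def vmkping_command(mtu, target_ip):
    """Build a vmkping command line that sets don't-fragment (-d) and the payload size (-s)."""
    return f"vmkping -d -s {max_icmp_payload(mtu)} {target_ip}"


jumbo_payload = max_icmp_payload(9000)
standard_payload = max_icmp_payload(1500)
command = vmkping_command(9000, "192.0.2.10")
```

For a 9000-byte MTU this yields a payload of 8972 bytes. If any device in the path has a smaller MTU, an unfragmented ping of that size should fail, which is the signal this test is looking for.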

Challenge-Handshake Authentication Protocol (CHAP)

iSCSI CHAP is supported on the FlashArray for uni- or bidirectional authentication. Enabling CHAP is
optional and up to the discretion of the user. Please refer to the following post for a detailed walkthrough:

http://www.codyhosterman.com/2015/03/configuring-iscsi-chap-in-vmware-with-the-flasharray/

Please note that iSCSI CHAP is not currently supported with dynamic iSCSI targets on the
FlashArray. If CHAP is going to be used, please configure your iSCSI FlashArray targets as static
only. Dynamic target support for CHAP will be added in a future Purity release.

Datastore Management
Volume Sizing and Count

A common question when first provisioning storage on the FlashArray is what capacity should I be using
for each volume? VMware VMFS supports up to a maximum size of 64 TB. The FlashArray supports far
larger than that, but for ESXi, volumes should not be made larger than 64 TB due to the filesystem limit of
VMFS.

Using a smaller number of large volumes is generally a better idea today. In the past, a larger number of
smaller volumes was recommended because of performance limitations that no longer exist. These
limitations traditionally had two causes: VMFS scalability issues due to locking and/or per-volume
queue limitations on the underlying array. VMware resolved the first issue with the introduction of Atomic
Test and Set, also called Hardware Assisted Locking.

Prior to the introduction of VAAI ATS (Atomic Test and Set), VMFS used LUN-level locking via full SCSI
reservations to acquire exclusive metadata control for a VMFS volume. In a cluster with multiple nodes, all
metadata operations were serialized and hosts had to wait until whichever host, currently holding a lock,
released that lock. This behavior not only caused metadata lock queues but also prevented standard I/O
to a volume from VMs on other ESXi hosts which were not currently holding the lock.

With VAAI ATS, the lock granularity is reduced to a much smaller level of control (specific metadata
segments, not an entire volume) for the VMFS that a given host needs to access. This behavior makes the
metadata change process not only very efficient, but more importantly provides a mechanism for parallel
metadata access while still maintaining data integrity and availability. ATS allows for ESXi hosts to no
longer have to queue metadata change requests, which consequently speeds up operations that
previously had to wait for a lock. Therefore, situations with large amounts of simultaneous virtual machine
provisioning operations will see the most benefit. The standard use cases benefiting the most from ATS
include:

• High virtual machine to VMFS density.

• Extremely dynamic environments: numerous provisioning and de-provisioning of VMs (e.g. VDI
using non-persistent linked-clones).

• High intensity virtual machine operations such as boot storms, or virtual disk growth.

The introduction of ATS removed scaling limits via the removal of lock contention; thus, moving the
bottleneck down to the storage, where many traditional arrays had per-volume I/O queue limits. This
limited what a single volume could do from a performance perspective as compared to what the array
could do in aggregate. This is not the case with the FlashArray.

A FlashArray volume is not limited by an artificial performance limit or an individual queue. A single
FlashArray volume can offer the full performance of an entire FlashArray, so provisioning ten volumes
instead of one is not going to empty the HBAs out any faster. From a FlashArray perspective, there is no
immediate performance benefit to using more than one volume for your virtual machines.

The main point is that there is always a bottleneck somewhere, and when you fix that bottleneck, it is just
transferred somewhere else. ESXi was once the bottleneck due to its locking mechanism, then it fixed
that with ATS. This, in turn, moved the bottleneck down to the array volume queue depth limit. The
FlashArray doesn’t have a volume queue depth limit, so now that bottleneck has been moved back to
ESXi and its internal queues.

Altering VMware queue limits is not generally needed with the exception of extraordinarily intense
workloads. For high-performance configuration, refer to the section of this document on ESXi queue
configuration.

VMFS Version Recommendations

Pure Storage recommends using the latest supported version of VMFS that is permitted by your ESXi
host.

For ESXi 5.x through 6.0, use VMFS-5. For ESXi 6.5 and later it is highly recommended to use VMFS-6. It
should be noted that VMFS-6 is not the default option for ESXi 6.5, so be careful to choose the correct
version when creating new VMFS datastores in ESXi 6.5.

Furthermore, when upgrading to ESXi 6.5, there is no in-place upgrade path of a VMFS-5 datastore to
VMFS-6. Therefore, it is recommended to create a new volume entirely, format it as VMFS-6, Storage
vMotion all virtual machines from the old VMFS-5 datastore to the new VMFS-6 datastore, and then
delete and remove the VMFS-5 datastore when complete.

BEST PRACTICE: Use the latest supported VMFS version for the in-use ESXi host

Datastore Performance Management

ESXi and vCenter offer a variety of features to control the performance capabilities of a given datastore.
This section will overview FlashArray support and recommendations for these features.

For a deeper-dive of ESXi queueing and the FlashArray, please read this post:

http://www.codyhosterman.com/2017/02/understanding-vmware-esxi-queuing-and-the-flasharray/

Queue Depth Limits and DSNRO

ESXi offers the ability to configure queue depth limits for devices on a HBA or iSCSI initiator. This dictates
how many I/Os can be outstanding to a given device before I/Os start queuing in the ESXi kernel. If the
queue depth limit is set too low, IOPS and throughput can be limited and latency can increase due to
queuing. If too high, virtual machine I/O fairness can be affected and high-volume workloads can affect
other workloads from other virtual machines or other hosts. The device queue depth limit is set on the
initiator and the value (and setting name) varies depending on the model and type:

Type             Default Value   Value Name
QLogic           64              ql2xmaxqdepth
Brocade          32              bfa_lun_queue_depth
Emulex           32              lpfc0_lun_queue_depth
Cisco UCS        32              fnic_max_qdepth
Software iSCSI   128             iscsivmk_LunQDepth

Changing these settings requires a host reboot. For instructions to check and set these values, please
refer to this VMware KB article:

Changing the queue depth for QLogic, Emulex, and Brocade HBAs

There is a second per-device setting called “Disk Schedule Number Requests Outstanding”, often
referred to as DSNRO. This is a hypervisor-level queue depth limit that provides a mechanism for
managing the queue depth limit for an individual device. This value is a per-device setting that defaults to
32 and can be increased to a value of 256.

It should be noted that this value only comes into play for a volume when that volume is being accessed
by two or more virtual machines on that host. If there is more than one virtual machine active on it, the
lower of the two values (DSNRO or the HBA device queue depth limit) is the value that is observed by
ESXi as the actual device queue depth limit. So, in other words, if a volume has two VMs on it, and
DSNRO is set to 32 and the HBA device queue depth limit is set to 64, the actual queue depth limit for
that device is 32. For more information on DSNRO see the VMware KB here:

Setting the Maximum Outstanding Disk Requests for virtual machines
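
The interaction between the two limits can be summarized in a short Python sketch. It only restates the rule described above, under the assumption that a device with a single active virtual machine is governed by the HBA device queue depth limit alone. The function and parameter names are hypothetical, and the lower bound of 1 for DSNRO is an assumption made for input validation.

```python
DSNRO_DEFAULT = 32  # default stated in this document
DSNRO_MAX = 256     # maximum stated in this document


def effective_queue_depth(hba_queue_depth, dsnro=DSNRO_DEFAULT, active_vms=1):
    """Return the device queue depth limit a single host would observe."""
    if not 1 <= dsnro <= DSNRO_MAX:
        raise ValueError(f"DSNRO must be between 1 and {DSNRO_MAX}")
    if hba_queue_depth < 1 or active_vms < 0:
        raise ValueError("queue depth must be positive and VM count non-negative")
    # DSNRO only applies when two or more VMs on the host share the device.
    if active_vms >= 2:
        return min(hba_queue_depth, dsnro)
    return hba_queue_depth


shared = effective_queue_depth(hba_queue_depth=64, dsnro=32, active_vms=2)
single = effective_queue_depth(hba_queue_depth=64, dsnro=32, active_vms=1)
```

With the values from the example in the text, the shared device is limited to 32 outstanding I/Os, while the same device with only one active virtual machine could use the full HBA limit of 64.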

In general, Pure Storage does not recommend changing these values. The majority of workloads are
distributed across hosts and/or not intense enough to overwhelm the default queue depths. The
FlashArray is fast enough (low enough latency) that the workload has to be quite high in order to
overwhelm the queue.

If the default queue depth is consistently overwhelmed, the simplest option is to provision a new
datastore and distribute some virtual machines to the new datastore. If a workload from a virtual machine
is too great for the default queue depth, then increasing the queue depth limit is the better option.

If a workload demands queue depths to be increased, Pure Storage recommends making both the HBA
device queue depth limit and DSNRO equal. Generally, do not change these values without direction from
VMware or Pure Storage support.

You can verify the values of both of these for a given device with the command:

esxcli storage core device list -d <naa.xxxxx>

Device Max Queue Depth: 96

No of outstanding IOs with competing worlds: 64

BEST PRACTICE: Leave queue depth limits at the default. Only raise them when
performance requirements dictate it.

Dynamic Queue Throttling

ESXi supports the ability to dynamically throttle a device queue depth limit when an array volume has
been overwhelmed. An array volume is overwhelmed when the array responds to an I/O request with a
sense code of QUEUE FULL or BUSY. When a certain number of these are received, ESXi will throttle
down the queue depth limit for that device and slowly increase it as conditions improve. This is controlled
via two settings:

• Disk.QFullSampleSize: the count of QUEUE FULL or BUSY conditions it takes before ESXi will
start throttling. Default is zero (feature disabled).

• Disk.QFullThreshold: the count of good condition responses after a QUEUE FULL or BUSY
required before ESXi starts increasing the queue depth limit again.

The Pure Storage FlashArray does not advertise a queue full condition for a volume. Since every volume
can use the full performance and queue of the FlashArray, this limit is impractically high and this sense
code essentially will never be issued. Therefore, there is no reason to set or alter these values for Pure
Storage FlashArray volumes because QUEUE FULL will never occur.

Storage I/O Control

VMware vCenter offers a feature called Storage I/O Control (SIOC) that will throttle selected virtual
machines when a certain average datastore latency has been reached or when a certain percentage of
peak throughput has been hit. ESXi throttles virtual machines by artificially reducing the number of slots
that are available to it in the device queue depth limit.

Pure Storage fully supports enabling this technology on datastores residing on the FlashArray. That being
said, it may not be particularly useful for a few reasons.

First, the minimum latency that can be configured for SIOC before it will begin throttling a virtual machine
is 5 ms.

When a latency threshold is entered, vCenter will aggregate a weighted average of all disk latencies seen
by all hosts that see that particular datastore. This number does not include host-side queuing, it is only
the time it takes for the I/O to be sent from the SAN to the array and acknowledged back.

Furthermore, SIOC uses a random-read injector to identify the capabilities of a datastore from a
performance perspective. At a high-level, it runs a quick series of tests with increasing numbers of
outstanding I/Os to identify the throughput maximums via high latency identification. This allows ESXi to
determine what the peak throughput is, for when the “Percentage of peak throughput” is chosen.

Knowing these factors, we can make these points about SIOC and the FlashArray:

1. SIOC is not going to be particularly helpful if there is host-side queueing since it does not take
host-induced additional latency into account. This (the ESXi device queue) is generally where
most of the latency is introduced in a FlashArray environment.

2. The FlashArray will rarely have sustained latency above 1 ms, so the 5 ms threshold will not be
reached for any meaningful amount of time on a FlashArray volume and SIOC will not kick in.

3. A single FlashArray volume does not have a queue limit, so it can handle quite a high number of
outstanding I/O and throughput (especially reads), therefore SIOC and its random-read injector
cannot identify FlashArray limits in meaningful ways.

In short, SIOC is fully supported by Pure Storage, but Pure Storage makes no specific recommendations
for configuration.

Storage DRS

VMware vCenter also offers a feature called Storage DRS (SDRS). SDRS moves virtual machines from one
datastore to another when a certain average latency threshold has been reached on the datastore or when
a certain used capacity has been reached. For this section, let’s focus on the performance-based moves.

Storage DRS, like Storage I/O Control, waits for a certain latency threshold to be reached before it acts.
And, also like SIOC, the minimum is 5 ms.

While it is too high in general to be useful for FlashArray-induced latency, SDRS differs from SIOC in the
latency it actually looks at. SDRS uses the “VMObservedLatency” (referred to as GAVG in esxtop) averages
from the hosts accessing the datastore. Therefore, this latency includes time spent queueing in the ESXi
kernel. So, theoretically, with a high-IOPS workload and a low configured device queue depth limit, an I/O
could conceivably spend 5 ms or more queuing in the kernel. In this situation Storage DRS will suggest
moving a virtual machine to a datastore which does not have an overwhelmed queue.

That being said, this is still an unlikely scenario because:

1. The FlashArray empties out the queue fast enough that a workload must be quite intense to fill up
an ESXi queue so much that an I/O spends 5 ms or more in it. Usually, with a workload like that, the
queueing is higher up the stack (in the virtual machine).

2. Storage DRS samples for 16 hours before it makes a recommendation, so typically you will get one
recommendation set per-day for a datastore. So this workload must be consistently and extremely
high, for a long time, before SDRS acts.

In short, SDRS is fully supported by Pure Storage, but Pure Storage makes no specific recommendations
for performance-based move configuration.

Datastore Capacity Management

Managing the capacity usage of your VMFS datastores is an important part of regular care of your virtual
infrastructure. There are a variety of mechanisms inside of ESXi and vCenter to monitor capacity.
Frequently, the concept of data reduction on the FlashArray is seen as a complicating factor, when in
reality it is a simplifying factor, or at worst, a non-issue. Let’s overview some concepts on how to best
manage VMFS datastores from a capacity perspective.

VMFS Usage vs. FlashArray Volume Capacity

VMFS reports how much is currently allocated in the filesystem on that volume. The type of virtual disk
(thin or thick) dictates how much is consumed upon creation of the virtual machine (or virtual
disk specifically). Thin disks only allocate what the guest has actually written to, and therefore VMFS only
records what the virtual machine has written in its space usage. Thick type virtual disks allocate the full
virtual disk immediately, so VMFS records much more space as being used than is actually used by the
virtual machines.

This is one of the reasons thin virtual disks are preferred—you get better insight into how much space the
guests are actually using.

Regardless of what type you choose, ESXi is going to take the sum total of the allocated space of your
virtual disks and compare that to the total capacity of the filesystem of the volume. The used space is the
sum of those virtual disk allocations. This number increases as virtual disks grow or new ones are added,
and can decrease as old ones are deleted, moved, or even shrunk.
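
The accounting described above can be modeled in a few lines of Python. This is a simplified sketch of the idea, not how VMFS is implemented: it ignores filesystem metadata, snapshots, and swap files, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VirtualDisk:
    provisioned_gb: int  # size the guest sees
    written_gb: int      # space the guest has actually written
    thin: bool           # True for thin, False for thick

    def allocated_gb(self):
        """Thin disks allocate what was written; thick disks allocate everything."""
        return self.written_gb if self.thin else self.provisioned_gb


def vmfs_used_gb(disks):
    """VMFS used space modeled as the sum of virtual disk allocations."""
    return sum(disk.allocated_gb() for disk in disks)


disks = [
    VirtualDisk(provisioned_gb=100, written_gb=20, thin=True),
    VirtualDisk(provisioned_gb=100, written_gb=20, thin=False),
]
used = vmfs_used_gb(disks)
```

Both guests have written 20 GB, yet the datastore in this model reports 120 GB used because the thick disk is charged for its full 100 GB. This is why thin disks give a clearer picture of real guest usage.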

Compare this to what the FlashArray reports for capacity. What the FlashArray reports for a volume
usage is NOT the amount used for that volume. What the FlashArray reports is the unique footprint of the
volume on that array. Let’s look at this VMFS that is on a 42 TB FlashArray volume. The VMFS is therefore
42 TB, but is only using 1.68 TB of space of the filesystem. This means that there are 1.68 TB of allocated
virtual disks:

Now let’s look at the FlashArray volume.

The FlashArray volume shows that 102 GB is being used. Does this mean that VMFS is incorrect? No.
VMFS is always the source of truth. The “Volumes” metric represents the amount of physical capacity that
has been written to the volume after data reduction that no other volume on the array shares.

This metric can change at any time as the data set changes on that volume or any other volume on the
FlashArray. If, for instance, some other host writes 2 GB to another volume (let’s call it “volume2”), and
that 2 GB happens to be identical to 2 GB of that 102 GB on “InfrastructureDS”, then “InfrastructureDS”
would no longer have 102 GB of unique space. It would drop down to 100 GB, even though nothing
changed on “InfrastructureDS” itself. Instead, someone else just happened to write identical data, making
the footprint of “InfrastructureDS” less unique.
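
The effect can be reproduced with a toy model in Python. This is not how Purity accounts for space; it is a deliberately simplified sketch in which each volume is a set of block fingerprints, each fingerprint stands for 1 GB of already-reduced data, and a volume's unique footprint is the set of blocks that no other volume also holds. All names are hypothetical.

```python
def unique_footprint(volumes, name):
    """Count blocks stored by volume `name` that no other volume also stores."""
    shared_elsewhere = set()
    for other_name, blocks in volumes.items():
        if other_name != name:
            shared_elsewhere |= blocks
    return len(volumes[name] - shared_elsewhere)


# 102 distinct 1 GB blocks on InfrastructureDS, nothing yet on volume2.
volumes = {
    "InfrastructureDS": {f"block-{i}" for i in range(102)},
    "volume2": set(),
}
before = unique_footprint(volumes, "InfrastructureDS")

# Another host writes 2 GB to volume2 that matches data on InfrastructureDS.
volumes["volume2"] |= {"block-0", "block-1"}
after = unique_footprint(volumes, "InfrastructureDS")
```

The footprint of “InfrastructureDS” drops from 102 to 100 in this model even though its own contents never changed, which is why the array-side number is a poor proxy for how full the datastore is.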

For a more detailed conversation around this, refer to this blog post:

http://www.codyhosterman.com/2017/01/vmfs-capacity-monitoring-in-a-data-reducing-world/

So, why doesn’t VMFS report the same used capacity that the FlashArray reports for the
underlying volume? Because they mean different things. VMware reports what is allocated on the
VMFS and the FlashArray reports what is unique on the underlying volume. The FlashArray value can
change constantly. The FlashArray metric is only meant to show how reducible the data on that volume is,
both internal to the volume and against the entire array. Conversely, VMFS capacity usage is based solely
on how much capacity is allocated to the virtual machines on it. In other words, VMFS usage is
only affected by data on the VMFS volume itself, while the FlashArray volume space metric is affected by
the data on the volume and also on all other volumes. So the two values should not be conflated.

For capacity tracking, you should refer to the VMFS usage. How do we best track VMFS usage? What do
we do when it is full?

Monitoring and Managing VMFS Capacity Usage

As virtual machines grow and as new ones are added, the VMFS volume they sit on will slowly fill up. How
to respond and to manage this is a common question.

In general, using a product like vRealize Operations Manager with the FlashArray Management Pack is a
great option here. But for the purposes of this document we will focus on what can be done inside of
vCenter alone.

You need to decide on a few things:

• At what percentage full of my VMFS volume do I become concerned?

• When that happens what should I do?

• What capacity value should I monitor on the FlashArray?

The first question is the easiest to answer. Choose either a percentage full or a certain capacity free.
Do you want to do something when, for example, a VMFS volume hits 75% full or when there is less than
50 GB left free? Choose what makes sense to you.
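
Either style of threshold reduces to a simple comparison, sketched below in Python. The 75% and 50 GB defaults are just the example figures from the text, not recommendations, and the function name is hypothetical. A real deployment would rely on vCenter alarms rather than a script like this.

```python
def needs_attention(capacity_gb, used_gb, max_percent_full=75.0, min_free_gb=50.0):
    """Return True when a datastore crosses either capacity threshold."""
    if capacity_gb <= 0 or used_gb < 0:
        raise ValueError("capacity must be positive and usage non-negative")
    percent_full = 100.0 * used_gb / capacity_gb
    free_gb = capacity_gb - used_gb
    return percent_full >= max_percent_full or free_gb < min_free_gb


large_at_threshold = needs_attention(capacity_gb=1000, used_gb=750)  # 75% full
large_with_room = needs_attention(capacity_gb=1000, used_gb=700)     # 70% full, 300 GB free
small_low_free = needs_attention(capacity_gb=100, used_gb=60)        # 60% full, 40 GB free
```

The last case shows why the choice matters: a small datastore can be well under the percentage threshold and still have very little free space, while a large datastore can cross a percentage threshold with hundreds of gigabytes still available.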

vCenter alerts are a great way to monitor VMFS capacity automatically. There is a default alert for
datastore capacity, but it does not do anything other than tag the datastore object with the alarm state.
Pure Storage recommends creating an additional alarm for capacity that executes some type of additional
action when the alarm is triggered.

Configuring a script to run, an email to be issued, or a notification trap to be sent greatly diminishes the
chance of a datastore running out of space unnoticed.

BEST PRACTICE: Configure capacity alerts to send a message or initiate an
action

The next step is to decide what happens when a capacity warning occurs. There are a few options:

1. Increase the capacity of the volume

2. Move virtual machines off of the volume

3. Add a new volume

Your solution may be one of these options or a mix of all three. Let’s quickly walk through the options.

Option 1: Increase the capacity of the volume

This is the simplest option. If capacity has crossed the threshold you have specified, increase the volume
capacity to clear the threshold. The process is:

Increase the FlashArray volume capacity:

Rescan the hosts that use the datastore:

Increase the VMFS to use the new capacity:

Choose “Use ‘Free space xxx TB’ to expand the datastore”. There should be a note that the datastore
already occupies space on this volume. If this note does not appear, you have selected the wrong device
to expand to. Pure Storage highly recommends that you do not create VMFS datastores that span
multiple volumes—a VMFS should have a one to one relationship to a FlashArray volume.

This will clear the alarm and add additional capacity.

Option 2: Move virtual machines off of the volume

Another option is to move one or more virtual machines from a more-full datastore to a less-full datastore.
While this can be manually achieved through case-by-case Storage vMotion, Pure Storage recommends
leveraging Storage DRS to automate this. Storage DRS provides, in addition to the performance-based
moves discussed earlier in this document, the ability to automatically Storage vMotion virtual machines
based on capacity usage of VMFS datastores. If a datastore reaches a certain percent full, SDRS can
automatically move, or make recommendations for, virtual machines to be moved to balance out space
usage across volumes.

SDRS is enabled on a datastore cluster:

When a datastore cluster is created you can enable SDRS and choose capacity threshold settings, which
can either be a percentage or a capacity amount:

Pure Storage has no specific recommendations for these values; choose them based on your own
environment. Pure Storage does have a few recommendations for datastore cluster configuration in
general:



• Only include datastores on the same FlashArray in a given datastore cluster. This will allow
Storage vMotion to use the VAAI XCOPY offload to accelerate the migration process of virtual
machines and greatly reduce the footprint of the migration workload.

• Include datastores with similar configurations in a datastore cluster. For example, if a datastore is
replicated on the FlashArray, only include datastores that are replicated in the same FlashArray
protection group so that an SDRS migration does not violate required protection for a virtual
machine.

Option 3: Create a new VMFS volume

The last option is to create an entirely new VMFS volume. You might decide to do this for a few reasons:

1. The current VMFS volumes have reached their maximum possible capacity (64 TB each).

2. The current VMFS volumes have overloaded the queue depth inside of every ESXi server using them.
Therefore, they can be grown in capacity, but cannot provide any more performance due to ESXi
limits.

In this situation follow the standard VMFS provisioning steps for a new datastore. Once the creation of
volumes and hosts/host groups and the volume connection is complete, the volumes will be accessible to
the ESXi host(s).² Using the vSphere Web Client, initiate a “Rescan Storage…” to make the newly
connected Pure Storage volume(s) fully visible to the ESXi servers in the cluster as shown above. One can
then use the “Add Storage” wizard to format the newly added volume.

Shrinking a Volume

While it is possible to shrink a FlashArray volume non-disruptively, vSphere does not have the ability to
shrink a VMFS partition. Therefore, do not shrink FlashArray volumes that contain VMFS datastores as
doing so could incur data loss.

Mounting a Snapshot Volume

The Pure Storage FlashArray provides the ability to take local or remote point-in-time snapshots of
volumes which can then be used for backup/restore and/or test/dev. When a snapshot is taken of a
volume containing a VMFS, there are a few additional steps from both the FlashArray and vSphere sides
to be able to access the snapshot point-in-time data.

When a FlashArray snapshot is taken, a new volume is not created. Essentially, it is a metadata
point-in-time reference to the data blocks on the array that reflect that moment’s version of the data.
This snapshot is immutable and cannot be directly mounted. Instead, the metadata of a snapshot has to be
“copied” to an actual volume, which then allows the point-in-time, which was preserved by the snapshot
metadata, to be presented to a host. This behavior allows the snapshot to be re-used again and again
without changing the data in that snapshot. If a snapshot is not needed more than once, an alternative
option is to create a direct snap copy from one volume to another, merging the snapshot creation step
with the association step.

When a volume hosting a VMFS datastore is copied via array-based snapshots, the copied VMFS
datastore is now on a volume that has a different serial number than the original source volume.
Therefore, the VMFS will be reported as having an invalid signature since the VMFS datastore signature is
a hash partially based on the serial of the hosting device. Consequently, the device will not be
automatically mounted upon rescan. Instead, the new datastore wizard needs to be run to find the device
and resignature the VMFS datastore. Pure Storage recommends resignaturing copied volumes rather
than mounting them with their existing signature (referred to as force mounting).

BEST PRACTICE: Resignature copied VMFS volumes and do not force mount
them.
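Resignaturing can also be done from the ESXi command line instead of the new datastore wizard. A minimal sketch, assuming shell access to a host that sees the copied volume after a rescan; the function names are hypothetical wrappers, and the label passed in is the name of the original (source) datastore.

```shell
# List VMFS copies that this host has detected as unresolved (snapshot) volumes.
list_unresolved_vmfs() {
    esxcli storage vmfs snapshot list
}

# Resignature an unresolved copy rather than force mounting it.
# $1 = volume label of the original datastore the copy was taken from.
resignature_vmfs_copy() {
    esxcli storage vmfs snapshot resignature --volume-label="$1"
}
```

After a resignature, the copy is mounted under a new name that ESXi derives from the original label, so it can coexist with the source datastore.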

² Presuming SAN zoning is completed.



For more detail on resignaturing and snapshot management, please refer to the following blog posts:

1. Mounting an unresolved VMFS

2. Why not force mount?

3. Why might a VMFS resignature operation fail?

4. How to correlate a VMFS and a FlashArray volume

5. How to snapshot a VMFS on the FlashArray

6. Restoring a single VM from a FlashArray snapshot

Deleting a Datastore
Prior to the deletion of a volume, ensure that all important data has been moved off or is no longer
needed. From the vSphere Web Client (or CLI) delete or unmount the VMFS volume and then detach the
underlying device from the appropriate host(s).

After a volume has been detached from the ESXi host(s) it must first be disconnected (from the FlashArray
perspective) from the host within the Purity GUI before it can be destroyed (deleted) on the FlashArray.

BEST PRACTICE: Unmount and detach FlashArray volumes from all ESXi hosts
before destroying them on the array.

1. Unmount the VMFS datastore on every host that it is mounted to.

2. Detach the volume that hosted the datastore from every ESXi host that sees the volume.

3. Disconnect the volume from the hosts or host groups on the FlashArray.

4. Destroy the volume on the FlashArray. It is not recommended to eradicate it. Let the FlashArray
eradicate it automatically in 24 hours to provide for recovery if needed.

By default a volume can be recovered after deletion for 24 hours to protect against accidental removal.

This entire removal and deletion process is automated through the Pure Storage Plugin for the vSphere
Web Client and its use is therefore recommended.
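For environments where the plugin is not available, the host-side steps can be sketched with esxcli. This is a minimal illustration, not the plugin's implementation: the function names are hypothetical, and the Purity CLI lines in the comments are an assumption that should be checked against the documentation for your Purity release.

```shell
# Step 1 (on each host): unmount the VMFS datastore by its label.
unmount_datastore() {
    esxcli storage filesystem unmount --volume-label="$1"
}

# Step 2 (on each host): detach the device that backed the datastore.
# $1 = device identifier (for example, an naa.* name).
detach_device() {
    esxcli storage core device set --device="$1" --state=off
}

# Steps 3 and 4 run on the FlashArray, not on ESXi. Assumed Purity CLI form:
#   purevol disconnect --hgroup <host group> <volume>
#   purevol destroy <volume>
```

Only destroy the volume once every host has completed both host-side steps.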



Virtual Machine and Guest Configuration
This section reviews recommended settings and configurations for virtual machines and their guest
operating systems. In general, refer to VMware recommendations for configuration of virtual guests, but
Pure Storage does have some additional recommendations for certain situations.

As always, configure guest operating systems in accordance with the corresponding vendor installation
guidelines.

Virtual Disk Choice

Storage provisioning in a virtual infrastructure involves several crucial decisions. VMware vSphere
offers three virtual disk formats: thin, zeroedthick and eagerzeroedthick.

To quickly review the types:

1. Thin—thin virtual disks only allocate what is used by the guest. Upon creation, thin virtual disks only
consume one block of space. As the guest writes data, new blocks are allocated on VMFS, then
zeroed out, then the data is committed to storage. Therefore there is some additional latency for new
writes.

2. Zeroedthick (lazy)—zeroedthick virtual disks allocate all of the space on the VMFS upon creation. As
soon as the guest writes to a specific block for the first time in the virtual disk, the block is first zeroed,
then the data is committed. Therefore there is some additional latency for new writes. This latency is
lower than with thin (since the disk only has to zero, not also allocate), but the performance difference
between zeroedthick (lazy) and thin is negligible.

3. Eagerzeroedthick—eagerzeroedthick virtual disks allocate all of their provisioned size upon creation
and also zero out the entire capacity upon creation. This type of disk cannot be used until the zeroing
is complete. Eagerzeroedthick has no first-write latency penalty because allocation and zeroing are
done in advance, and not on-demand.

Prior to WRITE SAME support, the performance differences between these virtual disk allocation
mechanisms were distinct. This was due to the fact that before an unallocated block could be written to,
zeroes would have to be written first causing an allocate-on-first-write penalty (increased latency).
Therefore, for every new block written, there were actually two writes; the zeroes then the actual data.
For thin and zeroedthick virtual disks, this zeroing was on-demand so the penalty was seen by
applications. For eagerzeroedthick, it was noticed during deployment because the entire virtual disk had
to be zeroed prior to use. This zeroing caused unnecessary I/O on the SAN fabric, subtracting available
bandwidth from “real” I/O.

To resolve this issue, VMware introduced WRITE SAME support. WRITE SAME is a SCSI command that
tells a target device (or array) to write a pattern (in this case, zeros) to a target location. ESXi utilizes this
command to avoid having to actually send a payload of zeros but instead simply communicates to any
array that it needs to write zeros to a certain location on a certain device. This not only reduces traffic on
the SAN fabric, but also speeds up the overall process since the zeros do not have to traverse the data
path.

This process is optimized even further on the Pure Storage FlashArray. Since the array does not store
space-wasting patterns like contiguous zeros on the array, the zeros are discarded and any subsequent
reads will result in the array returning zeros to the host. This additional array-side optimization further
reduces the time and penalty caused by pre-zeroing of newly-allocated blocks.

With this knowledge, choosing a virtual disk type depends on a few different variables that need to be
evaluated. In general, Pure Storage makes the following recommendations:

• Lead with thin virtual disks. They offer the greatest flexibility and functionality and the
performance difference is only at issue with the most sensitive of applications.

• For highly-sensitive applications with high performance requirements, eagerzeroedthick is the
best choice. It is always the best-performing virtual disk type.

• In no situation does Pure Storage recommend the use of zeroedthick (thick provision lazy zeroed)
virtual disks. There is very little advantage to this format over the others, and it can also lead to
stranded space as described in this post.

For more details on how these recommendations were decided upon, refer to the
following considerations. Note that each consideration ends with a recommendation, but that
recommendation is valid only when that specific consideration is the only one that matters. When choosing
a virtual disk type, take into account your virtual machine business requirements and use these
requirements to motivate your design decisions. Based on those decisions, choose the virtual disk type
that is best suited to your virtual machine.

• Performance—with the introduction of WRITE SAME support (more information on WRITE SAME can be
found in the section Block Zero or WRITE SAME), the performance difference between the
different types of virtual disks is dramatically reduced, almost eliminated. In lab experiments, a
difference can be observed during writes to unallocated portions of a thin or zeroedthick virtual
disk. This difference is negligible but of course still non-zero. Therefore, performance is no longer
an overridingly important factor in the type of virtual disk to use as the disparity is diminished, but
for the most latency-sensitive of applications eagerzeroedthick will always be slightly better than
the others. Recommendation: eagerzeroedthick.

• Protection against space exhaustion—each virtual disk type, based on its architecture, has
varying degrees of protection against space exhaustion. Thin virtual disks do not reserve space
on the VMFS datastore upon creation and instead grow in 1 MB blocks as needed. Therefore, if
unmonitored, as one or more thin virtual disks grow on the datastore, they could exhaust the
capacity of the VMFS, even if the underlying array has plenty of additional capacity to provide. If
careful monitoring is in place that allows capacity exhaustion to be resolved proactively
(moving the virtual machines around or growing the VMFS), thin virtual disks are a
perfectly acceptable choice. Storage DRS is an excellent solution for space exhaustion
prevention. While careful monitoring can protect against this possibility, it can still be a concern
and should be considered upon initial provisioning. Zeroedthick and eagerzeroedthick virtual
disks are not susceptible to VMFS logical capacity exhaustion because the space is reserved on
the VMFS upon creation. Recommendation: eagerzeroedthick.

• Virtual disk density—it should be noted that while all virtual disk types take up the same amount
of physical space on the FlashArray due to data reduction, they have different requirements on
the VMFS layer. Thin virtual disks can be oversubscribed (more capacity provisioned than the
VMFS reports as being available) allowing for far more virtual disks to fit on a given volume than
either of the thick formats. This provides a greater virtual machine to VMFS datastore density and
reduces the number or size of volumes that are required to store them. This, in effect, reduces the
management overhead of provisioning and managing additional volumes in a VMware
environment. Recommendation: thin.

• Time to create—the virtual disk types also vary in how long it takes to initially create them. Since
thin and zeroedthick virtual disks do not zero space until they are actually written to by a guest
they are both created in trivial amounts of time—usually a second or two. Eagerzeroedthick disks,
on the other hand, are pre-zeroed at creation and consequently take additional time to create. If
the time-to-first-IO is paramount for whatever reason, thin or zeroedthick is best.
Recommendation: thin.

• Space efficiency—the aforementioned bullet on “virtual disk density” describes efficiency on the
VMFS layer. Efficiency on the underlying array should also be considered. In vSphere 6.0, thin
virtual disks support guest-OS initiated UNMAP to a virtual disk, through the VMFS and down to
the physical storage. Therefore, thin virtual disks can be more space efficient as time wears on
and data is written and deleted. For more information on this functionality in vSphere 6.0, refer to
the section, In-Guest UNMAP in ESXi 6.x, that can be found later in this paper. Recommendation:
thin.

• Storage usage trending—A useful metric to know and track is how much capacity is actually
being used by a virtual machine guest. If you know how much space is being used by the guests,
and furthermore, at what rate that is growing, you can more appropriately size and project storage
allocations. Since thick type virtual disks reserve all of the space on the VMFS whether or not the
guest has used it, it is difficult to know, without guest tools, how much the guest has actually
written. Often it is not known until the application has used its available space and the
administrator requests more. This leads to abrupt and unplanned capacity increases. Thin virtual
disks only reserve what the guest has written, therefore will grow as the guest adds more data.
This growth can be monitored and trended. This will allow VMware administrators to plan and
predict future storage needs. Recommendation: thin

BEST PRACTICE: Use thin virtual disks for most virtual machines. Use
eagerzeroedthick for virtual machines that require very high performance
levels. Do not use zeroedthick.
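When virtual disks are created from the ESXi command line, the disk format is chosen with the vmkfstools `-d` option. The following minimal sketch wraps that call and rejects any format other than the two recommended here. The wrapper name and its error message are assumptions made for illustration; the size and path in the usage note are placeholders.

```shell
# Hypothetical wrapper around vmkfstools that only permits the recommended formats.
# $1 = size (for example 40G), $2 = disk format, $3 = full path of the new .vmdk file.
create_virtual_disk() {
    size=$1
    format=$2
    path=$3
    case "$format" in
        thin|eagerzeroedthick)
            vmkfstools -c "$size" -d "$format" "$path"
            ;;
        *)
            echo "unsupported format: $format (use thin or eagerzeroedthick)" >&2
            return 1
            ;;
    esac
}
```

A call such as `create_virtual_disk 40G thin /vmfs/volumes/ds01/vm01/vm01_1.vmdk` creates a thin disk, while passing `zeroedthick` returns an error without creating anything.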

No virtual disk option quite fits all possible use-cases perfectly, so an allocation method should
generally be chosen on a case-by-case basis. VMs that are intended for short term use, without
extraordinarily high performance requirements, fit nicely with thin virtual disks. For VMs that have higher
performance needs eagerzeroedthick is a good choice.

Virtual Hardware Configuration

Pure Storage makes the following recommendations for configuring a virtual machine in vSphere:

• Virtual SCSI Adapter—the best performing and most efficient virtual SCSI adapter is the VMware
Paravirtual SCSI Adapter. This adapter has the best CPU efficiency at high workloads and
provides the highest queue depths for a virtual machine, starting at an adapter queue depth of
256 and a virtual disk queue depth of 64 (twice what the LSI Logic can provide by default). The
queue limits of PVSCSI can be further tuned; please refer to the Guest-level Settings section for
more information.

• Virtual Hardware—it is recommended to use the latest virtual hardware version that the hosting
ESXi hosts support.

• VMware tools—in general, it is advisable to install the latest supported version of VMware tools in
all virtual machines.

• CPU and Memory—provision vCPUs and memory as per the application requirements.

• VM encryption—vSphere 6.5 introduced virtual machine encryption which encrypts the VM’s
virtual disk from a VMFS perspective. Pure Storage generally recommends not using this and
instead relying on FlashArray-level Data-At-Rest-Encryption. Though, if it is necessary to leverage
VM Encryption, doing so is fully supported by Pure Storage—but it should be noted that data
reduction will disappear for that virtual machine as host level encryption renders post-encryption
deduplication and compression impossible.

• IOPS Limits—if you want to limit a virtual machine to a particular amount of IOPS, you can use the
built-in ESXi IOPS limits. ESXi allows you to specify the number of IOPS a given virtual machine can
issue for a given virtual disk. Once the virtual machine exceeds that number, any additional I/Os
will be queued. In ESXi 6.0 and earlier this can be applied via the “Edit Settings” option of a virtual
machine. In ESXi 6.5 and later, this can also be configured via a VM Storage Policy.



BEST PRACTICE: Use the Paravirtual SCSI adapter for virtual machines for best
performance.

Template Configuration

In general, template configuration is no different than virtual machine configuration. Standard
recommendations apply. That being said, since templates are by definition frequently copied, Pure
Storage recommends putting copies of the templates on FlashArrays that are frequent targets of virtual
machines deployed from a template. If the template and target datastore are on the same FlashArray, the
copy process can take advantage of VAAI XCOPY, which greatly accelerates the copy process while
reducing the workload impact of the copy operation.

BEST PRACTICE: For the fastest and most efficient virtual machine
deployments, place templates on the same FlashArray as the target datastore.

Prior to Full Copy (XCOPY) API support, when virtual machines needed to be copied or moved from one
location to another, such as with Storage vMotion or a virtual machine cloning operation, ESXi would
issue many SCSI read/write commands between the source and target storage location (the same or
different device). This resulted in a very intense and often lengthy additional workload to this set of
devices. This SCSI I/O consequently stole available bandwidth from more “important” I/O such as the I/O
issued from virtualized applications. Therefore, copy or move operations often had to be scheduled to
occur only during non-peak hours in order to limit interference with normal production storage
performance. This restriction effectively decreased the ability of administrators to use the virtualized
infrastructure in the dynamic and flexible nature that was intended.

The introduction of XCOPY support for virtual machine movement allows this workload to be
offloaded from the virtualization stack almost entirely onto the storage array. The ESXi kernel is no
longer directly in the data copy path and the storage array instead does all the work. XCOPY functions by
having the ESXi host identify a region of a VMFS that needs to be copied. ESXi describes this space in a
series of XCOPY SCSI commands and sends them to the array. The array then translates these block
descriptors and copies/moves the data from the described source locations to the described target
locations. This architecture therefore does not require the moved data to be sent back and forth between
the host and array, so the data does not traverse the SAN fabric. This vastly reduces the time
to move data. XCOPY benefits are leveraged during the following operations³:

• Virtual machine cloning

• Storage vMotion

• Deploying virtual machines from template

During these offloaded operations, the throughput required on the data path is greatly reduced, as is
the load on the ESXi hardware resources (HBAs, CPUs etc.) initiating the request. This frees up resources
for more important virtual machine operations by letting the ESXi resources do what they do best: run
virtual machines, and lets the storage do what it does best: manage the storage.

On the Pure Storage FlashArray, XCOPY sessions are exceptionally quick and efficient. Due to
FlashReduce technology (features like deduplication, pattern removal and compression) similar data is
never stored on the FlashArray more than once. Therefore, during a host-initiated copy operation such as
with XCOPY, the FlashArray does not need to copy the data—this would be wasteful. Instead, Purity
simply accepts and acknowledges the XCOPY requests and creates new (or in the case of Storage
vMotion, redirects existing) metadata pointers. By not actually having to copy/move data, the offload
duration is greatly reduced. In effect, the XCOPY process is a 100% inline deduplicated operation. A non-
VAAI copy process for a virtual machine containing 50 GB of data can take on the order of multiple
minutes or more depending on the workload on the SAN. When XCOPY is enabled this time drops to a
matter of a few seconds.

XCOPY on the Pure Storage FlashArray works directly out of the box without any configuration required.
Nevertheless, there is one ESXi host setting that influences the behavior of
XCOPY operations. ESXi offers an advanced setting called the MaxHWTransferSize that controls the
maximum amount of data space that a single XCOPY SCSI command can describe. The default value for
this setting is 4 MB. This means that any given XCOPY SCSI command sent from that ESXi host cannot
exceed 4 MB of described data. Pure Storage recommends leaving this at the default value, but does
support increasing the value if another vendor requires it to be. There is no XCOPY performance impact
of increasing this value. Decreasing the value from 4 MB can slow down XCOPY sessions somewhat and
should not be done without guidance from VMware or Pure Storage support.
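To make the effect of this setting concrete, the following minimal sketch shows how the current value can be inspected and how the transfer size bounds the number of XCOPY commands needed for a copy. The function names are hypothetical. The arithmetic is only a lower bound derived from the limit described above, since ESXi may issue more commands than the minimum.

```shell
# Show the current MaxHWTransferSize value on this ESXi host (reported in KB).
show_max_hw_transfer_size() {
    esxcli system settings advanced list -o /DataMover/MaxHWTransferSize
}

# Minimum number of XCOPY commands needed to describe a copy, rounded up.
# $1 = data to copy in KB, $2 = MaxHWTransferSize in KB (default 4096, i.e. 4 MB).
min_xcopy_commands() {
    data_kb=$1
    transfer_kb=${2:-4096}
    echo $(( (data_kb + transfer_kb - 1) / transfer_kb ))
}
```

For a virtual machine with 50 GB of data (52,428,800 KB), `min_xcopy_commands 52428800` gives 12800 commands at the 4 MB default.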

³ Note that there are VMware-enforced caveats in certain situations that would prevent XCOPY behavior and revert
to legacy software copy. Refer to VMware documentation for this information at www.vmware.com.



Guest-level Settings

In general, standard operating system configuration best practices apply and Pure Storage does not
make any overriding recommendations. Please refer to VMware and/or OS vendor documentation for
particulars of configuring a guest operating system for best operation in a VMware virtualized environment.

That being said, Pure Storage does recommend two non-default options for file system configuration in a
guest on a virtual disk residing on a FlashArray volume. Both configurations provide automatic space
reclamation support. While it is highly recommended to follow these recommendations, it is not absolutely
required.

In short:

• For Linux guests in vSphere 6.5 or later using thin virtual disks, mount filesystems with the
discard option

• For Windows 2012 R2 or later guests in vSphere 6.0 or later using thin virtual disks, use an NTFS
allocation unit size of 64K

Refer to the in-guest space reclamation section for a detailed description of enabling these options.
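For the Linux recommendation, a minimal sketch of the two usual ways to apply the discard option inside a guest is shown here. The function names are hypothetical, the device, mount point, and filesystem type are placeholders, and the remaining fstab fields (`defaults`, `0 0`) are illustrative values to adapt to your own standards.

```shell
# Print an /etc/fstab line that mounts a filesystem with the discard option.
# $1 = device, $2 = mount point, $3 = filesystem type (for example ext4 or xfs).
fstab_discard_entry() {
    printf '%s %s %s defaults,discard 0 0\n' "$1" "$2" "$3"
}

# Mount a filesystem immediately with discard enabled (requires root in the guest).
mount_with_discard() {
    mount -o discard "$1" "$2"
}
```

The fstab form makes the option persistent across reboots, while the direct mount applies only until the filesystem is next unmounted.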

High-IOPS Virtual Machines

As mentioned earlier, the Paravirtual SCSI adapter should be leveraged for the best default performance.
For virtual machines that host applications that need to push a large amount of IOPS (50,000+) to a single
virtual disk, some non-default configurations are required. The PVSCSI adapter allows the default adapter
queue depth limit and the per-device queue depth limit to be increased from the default of 256 and 64
(respectively) to 1024 and 256.

In general, this change is not needed and therefore not recommended for most workloads. Only increase
these values if you know a virtual machine needs or will need this additional queue depth. Opening this
queue for a virtual machine that does not (or should not) need it can expose noisy neighbor performance
issues. If a virtual machine has a process that unexpectedly becomes intense it can unfairly steal queue
slots from other virtual machines sharing the underlying datastore on that host. This can then cause the
performance of other virtual machines to suffer.

BEST PRACTICE: Leave virtual machine queue depth limits at the default unless
performance requirements dictate otherwise.

If an application does need to push a high amount of IOPS to a single virtual disk these limits must be
increased. See VMware KB here for information on how to configure Paravirtual SCSI adapter queue
limits. The process slightly differs between Linux and Windows.

Refer to this blog post for more information:

http://www.codyhosterman.com/2017/02/understanding-vmware-esxi-queuing-and-the-flasharray/

A few general recommendations:

1. Only increase these limits when needed

2. If you change this limit it is required to change queue depth limits in ESXi as well, otherwise
changing these values will have no tangible effect

3. A good rule of thumb for whether you need to change these values is if you are not getting the IOPS
you expect and the latency is high in the guest, but not reported as high in ESXi or on the
FlashArray volume



Space Management and Reclamation
In block-based storage implementations, the file system is managed by the host, not the array. Because
of this, the array does not typically know when a file has been deleted or moved from a storage volume
and therefore does not know when to release the space. This behavior is especially detrimental in
thinly-provisioned environments (which today are nearly ubiquitous) where that space could be immediately
allocated to another device/application or just returned to the pool of available storage. Space consumed
by files that have been deleted or moved is referred to as “dead space”.

For data reduction all-flash-arrays like the FlashArray, it is of particular importance to make sure that this
dead space is reclaimed. If dead space is not reclaimed the array can inaccurately report how much
space is being used, which can lead to confusion (due to differing used space reporting from the hosts)
and premature purchase of additional storage. Therefore, reclaiming this space and making sure the
FlashArray has an accurate reflection of what is actually used is essential. An accurate space report has
the following benefits:

• More efficient replication, since blocks that are no longer needed are not replicated

• More efficient snapshots, since blocks that are no longer needed are not protected by additional
snapshots

• Better space usage trending, since if space usage is frequently updated to be accurate it is much easier
to trend and project actual space exhaustion. Otherwise, dead space can make it seem that capacity
is used up far earlier than it should be

The feature that can be used to reclaim space is called Space Reclamation, which uses the SCSI
command called UNMAP. UNMAP can be issued to the underlying device to inform the array that certain
blocks are no longer needed by the host and can be “reclaimed”. The array can then return those blocks
to the pool of free storage.

Ensuring that space is reclaimed on a regular basis is a primary best practice for using the FlashArray in
VMware environments. Read on for details.

VMware Dead Space Overview

There are two places that dead space can be introduced:

• VMFS—when an administrator deletes a virtual disk or an entire virtual machine (or moves it to
another datastore) the space that used to store that virtual disk or virtual machine is now dead on
the array. The array does not know that the space has been freed up, which effectively turns
those blocks into dead space.

• In-guest—when a file has been moved or deleted from a guest filesystem inside of a virtual
machine on a virtual disk, the underlying VMFS does not know that a block is no longer in use by
the virtual disk and consequently neither does the array. So that space is now also dead space.



So dead space can be accumulated in two ways. Fortunately, VMware has methods for dealing with both
that leverage the UNMAP feature support of the FlashArray.

Space Reclamation with VMFS

Space reclamation with VMFS differs depending on the version of ESXi. VMware has supported UNMAP
in various forms since ESXi 5.0. This document is only going to focus on UNMAP implementation for ESXi
5.5 and later. For previous UNMAP behaviors, refer to VMware documentation.

In vSphere 5.5 and 6.0, VMFS UNMAP is a manual process, executed on demand by an administrator. In
vSphere 6.5, VMFS UNMAP is an automatic process that gets executed by ESXi as needed without
administrative intervention.

VMFS UNMAP in vSphere 5.5 through 6.0

To reclaim space in vSphere 5.5 and 6.0, UNMAP is available in the command “esxcli”. UNMAP can be
run anywhere esxcli is installed and therefore does not require an SSH session:

esxcli storage vmfs unmap -l <datastore name> -n <blocks per iteration>

UNMAP with esxcli is an iterative process. The block count specifies how large each iteration is. If you do
not specify a block count, 200 blocks will be the default value (each block is 1 MB, so each iteration
issues UNMAP to a 200 MB section at a time). The operation runs UNMAP against the free space of the
VMFS volume until the entirety of the free space has been reclaimed. If the free space is not perfectly
divisible by the block count, the block count will be reduced at the final iteration to whatever amount of
space is left.

While the FlashArray can handle very large values for this, ESXi does not support increasing the block
count any larger than 1% of the free capacity of the target VMFS volume. Consequently, the best practice
for block count during UNMAP is no greater than 1% of the free space. So as an example, if a VMFS
volume has 1,048,576 MB free, the largest block count supported is 10,485 (always round down). If you
specify a larger value the command will still be accepted, but ESXi will override the value back down to
the default of 200 MB, which will dramatically slow down the operation.

It is imperative to calculate the block count as 1% of the free space only when that
capacity is expressed in megabytes, since VMFS 5 blocks are 1 MB each. This will allow for simple and
accurate identification of the largest allowable block count for a given datastore. Using GB or TB can lead
to rounding errors, and as a result, too large of a block count value. Always round down to the
nearest whole MB when calculating this number (do not round up).

BEST PRACTICE: For shortest UNMAP duration, use a large block count.
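The 1% rule can be computed with integer division, which rounds down automatically. A minimal sketch follows; the function names are hypothetical, and the free space value must be supplied in megabytes as described above.

```shell
# Largest supported UNMAP block count: 1% of the free space in MB, rounded down.
# $1 = free space on the VMFS volume in MB.
max_unmap_block_count() {
    free_mb=$1
    echo $(( free_mb / 100 ))
}

# Run UNMAP against a datastore with an explicit block count.
# $1 = datastore name, $2 = blocks per iteration.
run_unmap() {
    esxcli storage vmfs unmap -l "$1" -n "$2"
}
```

With 1,048,576 MB free, `max_unmap_block_count 1048576` returns 10485, matching the earlier example.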



There are other methods to run or even schedule UNMAP, such as PowerCLI, vRealize Orchestrator and
the FlashArray vSphere Web Client Plugin. These methods are outside of the scope of this document;
please refer to the respective VMware and FlashArray integration documents for further detail.

If an UNMAP process seems to be slow, you can check whether the block count value was overridden by
examining the hostd.log file in the /var/log/ directory on the target ESXi host. For every UNMAP
operation there will be a series of messages that report the block count for every iteration. Look for a
line that includes the UUID of the VMFS volume being reclaimed. The line will look like the example
below:

Unmap: Async Unmapped 5000 blocks from volume 545d6633-4e026dce-d8b2-90e2ba392174
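A log line in this form can be picked apart with a short regular expression. The sketch below is an assumption-laden helper, not a supported tool: the pattern is derived only from the single example line above, and real hostd.log lines will carry additional prefix text such as timestamps.

```python
import re

# Pattern based solely on the example line shown above.
UNMAP_LINE = re.compile(
    r"Unmap: Async Unmapped (?P<blocks>\d+) blocks from volume (?P<uuid>[0-9a-f-]+)"
)


def parse_unmap_line(line: str):
    """Return (block_count, volume_uuid) for an UNMAP log line, else None."""
    match = UNMAP_LINE.search(line)
    if match is None:
        return None
    return int(match.group("blocks")), match.group("uuid")


sample = ("Unmap: Async Unmapped 5000 blocks from volume "
          "545d6633-4e026dce-d8b2-90e2ba392174")
print(parse_unmap_line(sample))
```

If the block count reported in the log is 200 when a larger value was requested, the value was overridden.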

In ESXi 5.5 Patch 3 and later, any UNMAP operation against a datastore that is 75% or more
full will use a block count of 200 regardless of any block count specified in the command. For
more information, refer to the relevant VMware KB article.
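The two override rules described in this section (the 1% limit and the 75%-full limit) can be combined into a small model to predict which block count ESXi will actually use. This is a sketch of the documented behavior only; the function name and parameters are hypothetical, and the 75% rule is assumed to apply (ESXi 5.5 Patch 3 and later).

```python
DEFAULT_BLOCK_COUNT = 200


def effective_block_count(requested: int, free_mb: int, capacity_mb: int) -> int:
    """Predict the block count ESXi uses for a requested value.

    Falls back to the default of 200 when the datastore is 75% or more
    full, or when the request exceeds 1% of the free space in MB.
    """
    if capacity_mb <= 0 or not 0 <= free_mb <= capacity_mb:
        raise ValueError("need capacity_mb > 0 and 0 <= free_mb <= capacity_mb")
    used_mb = capacity_mb - free_mb
    if used_mb * 4 >= capacity_mb * 3:   # used / capacity >= 75%, integer-safe
        return DEFAULT_BLOCK_COUNT
    if requested > free_mb // 100:       # larger than 1% of the free space
        return DEFAULT_BLOCK_COUNT
    return requested


# Hypothetical 2 TB datastore that is half full.
print(effective_block_count(10485, free_mb=1_048_576, capacity_mb=2_097_152))
```

A result of 200 for a larger requested value means the operation will run at the slow default rate.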

VMFS UNMAP in vSphere 6.5

In the ESXi 6.5 release, VMware introduced automatic UNMAP support for VMFS volumes. ESXi 6.5
introduced a new version of VMFS, version 6. With VMFS-6, there is a new setting for all VMFS-6 volumes
called UNMAP priority. This defaults to low.

Pure Storage recommends that this be configured to “low” and not disabled. For the initial release of ESXi
6.5, VMware currently only offers a low priority—medium and high priorities have not yet been enabled in
the ESXi kernel. Pure Storage will re-evaluate this recommendation when and if these higher priorities
become available.

Automatic UNMAP with vSphere 6.5 is an asynchronous task: reclamation will not occur immediately and
will typically take 12 to 24 hours to complete. Each ESXi 6.5 host runs an UNMAP “crawler”, and the
crawlers work in tandem to reclaim space on all VMFS-6 volumes the hosts have access to. If, for some
reason, the space needs to be reclaimed immediately, the esxcli UNMAP operation described in the
previous section can be run.

BEST PRACTICE: Keep automatic UNMAP enabled on VMFS-6 volumes with the setting of “low”

Please note that VMFS-6 automatic UNMAP will not be issued to inactive datastores. In other words, if a
datastore does not have actively running virtual machines on it, the datastore will be ignored. In those
cases, the simplest option to reclaim the space is to run the traditional esxcli UNMAP command.

Pure Storage does support disabling automatic UNMAP if that is, for some reason, preferred by the
customer. However, to provide the most efficient and accurate environment, it is highly recommended that
automatic UNMAP be left enabled.

Space Reclamation In-Guest

The discussion above speaks only about space reclamation directly on a VMFS volume which pertains to
dead space accumulated by the deletion or migration of virtual machines and virtual disks. Running
UNMAP on a VMFS only removes dead space in that scenario. But, as mentioned earlier, dead space can
accumulate higher up in the VMware stack—inside of the virtual machine itself.

When a guest writes data to a file system on a virtual disk, the required capacity is allocated on the VMFS
(if not already allocated) by expanding the file that represents the virtual disk. The data is then committed
down to the array. When that data is deleted by the guest, the guest OS filesystem is cleared of the file,
but this deletion is not reflected by the virtual disk allocation on the VMFS, nor the physical capacity on
the array. To ensure that the layers below accurately report used space, in-guest UNMAP should be
enabled.

Understanding In-Guest UNMAP in ESXi

Prior to ESXi 6.0 and virtual machine hardware version 11, guests could not leverage native UNMAP
capabilities on a virtual disk because ESXi virtualized the SCSI layer and did not report UNMAP capability
up through to the guest. So even if guest operating systems supported UNMAP natively, they could not
issue UNMAP to a file system residing on a virtual disk. Consequently, reclaiming this space was a manual
and tedious process.

In ESXi 6.0, VMware has resolved this problem and streamlined the reclamation process. With in-guest
UNMAP support, guests running in a virtual machine using hardware version 11 can now issue UNMAP
directly to virtual disks. The process is as follows:

1. A guest application or user deletes a file from a file system residing on a thin virtual disk

2. The guest automatically (or manually) issues UNMAP to the guest file system on the virtual disk

3. The virtual disk is then shrunk in accordance with the amount of space reclaimed inside of it.

4. If EnableBlockDelete is enabled, UNMAP will then be issued to the VMFS volume for the space that
previously was held by the thin virtual disk. The capacity is then reclaimed on the FlashArray.

Prior to ESXi 6.0, the parameter EnableBlockDelete was a defunct option that was previously only
functional in very early versions of ESXi 5.0 to enable or disable automated VMFS UNMAP. This option is
now functional in ESXi 6.0 and has been re-purposed to allow in-guest UNMAP to be translated down to
the VMFS and accordingly the SCSI volume. By default, EnableBlockDelete is disabled and can be
enabled via the vSphere Web Client or CLI utilities.

In-guest UNMAP support does not actually require this parameter to be enabled, though. Enabling this
parameter allows for end-to-end UNMAP, or in other words, for in-guest UNMAP commands to be passed
down to the VMFS layer. For this reason, enabling this option is a best practice for ESXi 6.x and later.

BEST PRACTICE: Enable the option “VMFS3.EnableBlockDelete” on ESXi 6.x hosts. This is disabled by
default.

ESXi 6.5 expands support for in-guest UNMAP to additional guest types. In ESXi 6.0, in-guest UNMAP is
only supported with Windows Server 2012 R2 (or Windows 8) and later. ESXi 6.5 introduces support for
Linux operating systems. The underlying reason for this is that ESXi 6.0 and earlier only supported SCSI
version 2. Windows uses SCSI-2 UNMAP and therefore could take advantage of this feature set. Linux
uses SCSI version 5 and could not. In ESXi 6.5, VMware enhanced its SCSI support to go up to SCSI-6,
which allows guests like Linux to issue commands that they could not before.

Using the Linux tool sg_inq (part of the sg3_utils package), you can see, through an excerpt of the
response, the SCSI support difference between the ESXi versions:

ESXi 6.0/VM Hardware Version 11:

pureuser@ubuntu:/mnt/unmap$ sudo sg_inq /dev/sdc -d
standard INQUIRY:
  PQual=0 Device_type=0 RMB=0 version=0x02 [SCSI-2]
  <…>
  Vendor identification: VMware
  Product identification: Virtual disk
  Product revision level: 1.0

ESXi 6.5/VM Hardware Version 13:

pureuser@ubuntu:/mnt/unmap$ sudo sg_inq /dev/sdc -d
standard INQUIRY:
  PQual=0 Device_type=0 RMB=0 version=0x06 [SPC-4]
  <…>
  Vendor identification: VMware
  Product identification: Virtual disk
  Product revision level: 2.0

You can note the differences in SCSI support level and also in the product revision of the virtual disks
themselves (version 1 to 2).

It is important to note that simply upgrading to ESXi 6.5 will not provide SCSI-6 support. The virtual
hardware for the virtual machine must be upgraded to version 13 once ESXi has been upgraded. VM
hardware version 13 is what provides the additional SCSI support to the guest.

The following are the requirements for in-guest UNMAP to properly function:

1. The target virtual disk must be a thin virtual disk. Thick-type virtual disks do not support UNMAP.

2. For Windows In-Guest UNMAP:

a. ESXi 6.0 and later

b. VM Hardware version 11 and later

3. For Linux In-Guest UNMAP

a. ESXi 6.5 and later

b. VM Hardware version 13 and later

4. If Change Block Tracking (CBT) is enabled for a virtual disk, In-Guest UNMAP for that virtual disk is
only supported starting with ESXi 6.5
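The requirements listed above can be expressed as a simple check. The following Python is a sketch under stated assumptions: the VirtualDisk class, its field names, and the "windows"/"linux" labels are hypothetical, and the Windows guest version requirement (Windows Server 2012 R2 or later) is not modeled.

```python
from dataclasses import dataclass


@dataclass
class VirtualDisk:
    thin: bool                 # True for a thin virtual disk
    guest_family: str          # "windows" or "linux" (hypothetical labels)
    esxi_version: tuple        # for example (6, 0) or (6, 5)
    hw_version: int            # virtual machine hardware version
    cbt_enabled: bool = False  # Change Block Tracking enabled on the disk


def in_guest_unmap_supported(disk: VirtualDisk) -> bool:
    """Apply the requirements listed above to one virtual disk."""
    if not disk.thin:
        return False                                # thick disks: no UNMAP
    if disk.cbt_enabled and disk.esxi_version < (6, 5):
        return False                                # CBT needs ESXi 6.5
    if disk.guest_family == "windows":
        return disk.esxi_version >= (6, 0) and disk.hw_version >= 11
    if disk.guest_family == "linux":
        return disk.esxi_version >= (6, 5) and disk.hw_version >= 13
    return False


# A Linux guest on ESXi 6.5 that was never upgraded from hardware version 11.
print(in_guest_unmap_supported(VirtualDisk(True, "linux", (6, 5), 11)))
```

The example prints False, which reflects the point made earlier: upgrading ESXi alone is not enough for Linux guests until the virtual hardware is upgraded to version 13.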

In-Guest UNMAP Alignment Requirements

VMware ESXi requires that any UNMAP request sent down by a guest be aligned to 1 MB. For a variety of
reasons, not all UNMAP requests will be aligned as such, and in the initial release of ESXi 6.5 and
earlier a large percentage failed. In ESXi 6.5 Patch 1, ESXi has been altered to be more tolerant of
misaligned UNMAP requests. See the VMware patch information here:

https://kb.vmware.com/kb/2148989

Prior to this, any UNMAP request that was even partially misaligned would fail entirely, leading to no
reclamation. In ESXi 6.5 P1, any portion of an UNMAP request that is aligned will be accepted and passed
along to the underlying array. Misaligned portions will be accepted but not passed down as UNMAP.
Instead, the blocks referred to by the misaligned portions will be zeroed out with WRITE SAME. The
benefit of this behavior on the FlashArray is that zeroing is identical in behavior to UNMAP, so all of
the space will be reclaimed regardless of misalignment.
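To make the Patch 1 behavior more concrete, the sketch below splits a single UNMAP request into the portion that falls on 1 MB boundaries and the leftover head and tail. It is a simplified model of the description above, not ESXi's implementation; the function name and the byte-based interface are assumptions.

```python
MB = 1024 * 1024


def split_unmap(offset: int, length: int, alignment: int = MB):
    """Split a byte range into (unmapped, zeroed) lists of (offset, length).

    `unmapped` holds the 1 MB-aligned middle that would be passed down as
    UNMAP; `zeroed` holds the misaligned head and tail that would be
    zeroed with WRITE SAME instead.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    end = offset + length
    aligned_start = -(-offset // alignment) * alignment  # round up
    aligned_end = (end // alignment) * alignment         # round down
    if aligned_start >= aligned_end:
        return [], [(offset, length)]   # no aligned portion at all
    zeroed = []
    if offset < aligned_start:
        zeroed.append((offset, aligned_start - offset))
    if aligned_end < end:
        zeroed.append((aligned_end, end - aligned_end))
    return [(aligned_start, aligned_end - aligned_start)], zeroed


# A 3 MB request that starts 512 KB into the disk.
unmapped, zeroed = split_unmap(512 * 1024, 3 * MB)
print(unmapped)  # [(1048576, 2097152)]
print(zeroed)    # [(524288, 524288), (3145728, 524288)]
```

In this model only the aligned 2 MB in the middle shrinks the virtual disk, while on the FlashArray all 3 MB are reclaimed because zeroing behaves like UNMAP.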

BEST PRACTICE: Apply ESXi 6.5 Patch Release ESXi650-201703001 (2148989) as soon as
possible to be able to take full advantage of in-guest UNMAP.

In-Guest UNMAP with Windows

Starting with ESXi 6.0, In-Guest UNMAP is supported with Windows Server 2012 R2 and later
Windows-based operating systems. For a full report of UNMAP support with Windows, please refer to
Microsoft documentation.

NTFS supports automatic UNMAP by default—this means (assuming the underlying storage supports it)
Windows will issue UNMAP to the blocks a file used to consume immediately once it has been deleted or
moved.

Automatic UNMAP is enabled by default in Windows. This can be verified with the following CLI
command:

fsutil behavior query DisableDeleteNotify

If DisableDeleteNotify is set to 0, UNMAP is enabled. Setting it to 1 disables it. Pure Storage
recommends that UNMAP remain enabled (a value of 0). To change the value, use the following command
(the example shown sets it to 1, which disables UNMAP):

fsutil behavior set DisableDeleteNotify 1
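Because the setting is a "disable" flag, the meaning of 0 and 1 is easy to invert by mistake. The following Python sketch interprets the text returned by the query command. The sample string is an assumption about the output format, which varies between Windows versions; the function name is hypothetical.

```python
import re


def unmap_enabled(fsutil_output: str) -> bool:
    """Interpret the output of 'fsutil behavior query DisableDeleteNotify'.

    DisableDeleteNotify = 0 means UNMAP is enabled, 1 means disabled.
    Only the first matching line is considered.
    """
    match = re.search(r"DisableDeleteNotify\s*=\s*([01])", fsutil_output)
    if match is None:
        raise ValueError("DisableDeleteNotify value not found in output")
    return match.group(1) == "0"


# Assumed sample output; the exact wording differs across Windows releases.
print(unmap_enabled("DisableDeleteNotify = 0"))  # True
```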

Windows also supports manual UNMAP, which can be run on demand or on a schedule. This is
performed using the Disk Optimizer tool. Thin virtual disks can be identified in the tool by a media
type of “thin provisioned drive”; these are the volumes that support UNMAP.

To run it, select the drive and click “Optimize”, or configure a scheduled optimization.

Windows prior to ESXi 6.5 Patch 1

Ordinarily, this would work with the default configuration of NTFS, but VMware enforces additional
UNMAP alignment requirements that call for a non-default NTFS configuration. In order to enable in-guest
UNMAP in Windows for a given NTFS volume, that volume must be formatted using a 32 K or 64 K allocation
unit size. This will force far more Windows UNMAP operations to be aligned with VMware requirements.

64 K is also the standard recommendation for SQL Server installations, which makes this a generally
accepted change. To check that existing NTFS volumes are using the proper allocation unit size to
support UNMAP, this simple two-line PowerShell command can be run to list a report:

$wql = "SELECT Label, Blocksize, Name FROM Win32_Volume WHERE FileSystem='NTFS'"


Get-WmiObject -Query $wql -ComputerName '.' | Select-Object Label, Blocksize, Name

BEST PRACTICE: Use the 32 or 64K Allocation Unit Size for NTFS to enable
automatic UNMAP in a Windows virtual machine.

Due to alignment issues, the manual UNMAP tool (Disk Optimizer) is not particularly effective, because
most of the UNMAP requests it issues are misaligned and will fail.

Windows with ESXi 6.5 Patch 1 and Later

As of ESXi 6.5 Patch 1, all NTFS allocation unit sizes will work with in-guest UNMAP, so at this ESXi
level no unit size change is required to enable this functionality. That being said, there is additional
benefit to using a 32 K or 64 K allocation unit. While all sizes will allow all space to be reclaimed on
the FlashArray, a 32 K or 64 K allocation unit will cause more UNMAP requests to be aligned, and
therefore more of the underlying virtual disk will be returned to the VMFS (more of it will be shrunk).

The manual tool, Disk Optimizer, now works quite well and can be used. If UNMAP is disabled in Windows
(it is enabled by default) this tool can be used to reclaim space on-demand or via a schedule. If automatic
UNMAP is enabled, there is generally no need to use this tool.

For more information on this, please read the following blog post:

http://www.codyhosterman.com/2017/03/in-guest-unmap-fix-in-esxi-6-5-patch-1/

In-Guest UNMAP with Linux

Starting with ESXi 6.5, In-Guest UNMAP is supported with Linux-based operating systems and most
common file systems (Ext4, Btrfs, JFS, XFS, F2FS, VFAT). For a full report of UNMAP support with Linux
configurations, please refer to appropriate Linux distribution documentation. To enable this behavior it is
necessary to use Virtual Machine Hardware Version 13 or later.

Linux supports both automatic and manual methods of UNMAP.

Linux file systems do not support automatic UNMAP by default—this behavior needs to be enabled during
the mount operation of the file system. This is achieved by mounting the file system with the “discard”
option.

pureuser@ubuntu:/mnt$ sudo mount /dev/sdd /mnt/unmaptest -o discard

When mounted with the discard option, Linux will issue UNMAP to the blocks a file used to consume
immediately once it has been deleted or moved.

Pure Storage does not require this feature to be enabled, but generally recommends doing so to keep
capacity information correct throughout the storage stack.

BEST PRACTICE: Mount Linux filesystems with the “discard” option to enable
in-guest UNMAP for Linux-based virtual machines.
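One way to confirm which file systems are mounted with the discard option is to inspect /proc/mounts inside the guest. The sketch below parses text in that format; the function name and the sample mount entries are hypothetical.

```python
def mounts_with_discard(proc_mounts: str) -> list:
    """Return the mount points whose mount options include 'discard'.

    Expects text in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, options = fields[1], fields[3].split(",")
        if "discard" in options:
            result.append(mount_point)
    return result


# Hypothetical sample; on a real guest use open("/proc/mounts").read().
sample = (
    "/dev/sdd /mnt/unmaptest ext4 rw,relatime,discard,data=ordered 0 0\n"
    "/dev/sda1 / ext4 rw,relatime,errors=remount-ro 0 0\n"
)
print(mounts_with_discard(sample))  # ['/mnt/unmaptest']
```

Any file system missing from the result will not issue UNMAP automatically when files are deleted.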

Linux with ESXi 6.5

In ESXi 6.5, automatic UNMAP is supported and is able to reclaim most of the identified dead space. In
general, Linux aligns most UNMAP requests in automatic UNMAP and therefore is quite effective in
reclaiming space.

The manual method, fstrim, does not align its initial UNMAP requests and therefore fails entirely.

Linux with ESXi 6.5 Patch 1 and Later

In ESXi 6.5 Patch 1 and later, automatic UNMAP is even more effective, now that even the small number
of misaligned UNMAPs are handled. Furthermore, the manual method via fstrim works as well. So in this
ESXi version, either method is a valid option.

What to expect after UNMAP is run on the FlashArray

The behavior of space reclamation (UNMAP) on a data-reducing array such as the FlashArray differs
somewhat from that of a traditional array because of data deduplication. When a host runs UNMAP (ESXi
or otherwise), an UNMAP SCSI command is issued to the storage device that indicates which logical
blocks are no longer in use. Traditionally, a logical block address referred to a specific part of an
underlying disk on an array. So when UNMAP was issued, the physical space was always reclaimed
because there was a direct correlation between a logical block and a physical cylinder/track/block on the
storage device. This is not necessarily the case on a data reduction array.

A logical block on a FlashArray volume does not refer directly to a physical location on flash. Instead, if
there is data written to that block, there is just a reference to a metadata pointer. That pointer then refers
to a physical location. If UNMAP is executed against that block, only the metadata pointer is guaranteed
to be removed. The physical data will remain if it is deduplicated, meaning other blocks (anywhere else
on the array) have metadata pointers to that data too. A physical block is only reclaimed once the last
pointer on your array to that data is removed. Therefore, UNMAP only directly removes metadata
pointers. The reclamation of physical capacity is only a possible consequential result of UNMAP.
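The pointer behavior described above can be illustrated with a toy reference-counting model. This is purely a teaching sketch: the class, its names, and the use of SHA-256 fingerprints are assumptions and say nothing about how the FlashArray is actually implemented.

```python
import hashlib


class ToyDedupArray:
    """Toy model: logical blocks point at shared, deduplicated data."""

    def __init__(self):
        self.pointers = {}   # logical block address -> data fingerprint
        self.refcount = {}   # fingerprint -> number of pointers to it

    def write(self, lba: int, data: bytes) -> None:
        self.unmap(lba)      # overwriting drops the old pointer first
        fingerprint = hashlib.sha256(data).hexdigest()
        self.pointers[lba] = fingerprint
        self.refcount[fingerprint] = self.refcount.get(fingerprint, 0) + 1

    def unmap(self, lba: int) -> bool:
        """Remove the pointer; return True only if physical data was freed."""
        fingerprint = self.pointers.pop(lba, None)
        if fingerprint is None:
            return False
        self.refcount[fingerprint] -= 1
        if self.refcount[fingerprint] == 0:
            del self.refcount[fingerprint]  # last pointer gone: reclaim
            return True
        return False

    @property
    def physical_blocks(self) -> int:
        return len(self.refcount)


array = ToyDedupArray()
array.write(0, b"A")
array.write(1, b"A")      # deduplicated against block 0
array.write(2, b"B")
print(array.unmap(0), array.physical_blocks)  # False 2
print(array.unmap(1), array.physical_blocks)  # True 1
```

The first UNMAP removes only a pointer because block 1 still references the same data; physical capacity is returned only when the last pointer is removed.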

Herein lies the importance of UNMAP: making sure the metadata tables of the FlashArray are accurate.
This allows space to be reclaimed as soon as possible. Generally, some physical space will be
returned immediately upon reclamation, as not everything is dedupable. In the end, the amount of
reclaimed space depends heavily on how dedupable the data set is: the higher the dedupability, the
lower the likelihood, amount, and immediacy of physical space being reclaimed. The fact to remember is
that UNMAP is important for the long-term “health” of space reporting and usage on the array.

In addition to using the Pure Storage vSphere Web Client Plugin, standard provisioning methods through
the FlashArray GUI or FlashArray CLI can be utilized. This section highlights the end-to-end provisioning
of storage volumes on the Pure Storage FlashArray from creation of a volume to formatting it on an ESXi
host. The management simplicity is one of the guiding principles of FlashArray as just a few clicks are
required to configure and provision storage to the server.

VMware Horizon 7 Configuration and Tuning
VMware Horizon View 7 configurations are quite minimal; some of the tuning is highlighted in this section.

Horizon Connection Server Tuning

1. Use the SE sparse virtual disk format. VMware Horizon 5.2 and above supports a VMDK disk format
called Space Efficient (SE) sparse virtual disks, which was introduced in vSphere 5.1. The
characteristics of SE sparse virtual disks can be summarized as follows:

o They grow and shrink dynamically, which prevents VMDK bloat as desktops rewrite and delete
data.

o They are available only for Horizon View Composer based linked-clone desktops (not for
persistent desktops).

o They require VM hardware version 9 or later.

o No refresh/recompose operation is needed to reclaim space.

o No blackout periods need to be set, as the FlashArray handles UNMAPs efficiently.

2. We recommend using this disk format for deploying linked-clone and instant-clone desktops on Pure
Storage due to its space efficiency and prevention of VMDK bloat.

3. Disable View Storage Accelerator (linked clones only; VSA must be enabled to use instant clones).

o View Storage Accelerator (VSA) is a feature in VMware View 5.1 and later that is based on
VMware vSphere content-based read caching (CBRC). Enabling VSA has several advantages,
including containing boot storms through host-side caching of commonly used blocks. It also
helps steady-state performance of desktops that use the same applications. Because the Pure
Storage FlashArray delivers high IOPS at very low latency, the extra layer of caching at the
host level is not needed. The biggest disadvantage of VSA is the time it takes to recompose
and refresh desktops, because the disk digest file has to be rebuilt every time the image file
changes. VSA also consumes host-side memory for caching and host CPU for building digest
files. For shorter desktop recompose times, we recommend turning off VSA.

4. Tune maximum concurrent vCenter operations. The default concurrent vCenter operations on the
vCenter servers are defined in the View configuration’s advanced vCenter settings. These values are
quite conservative and can be increased. The Pure Storage FlashArray can withstand more operations,
including:

o Max Concurrent vCenter provisioning operation (recommended value >= 50)

o Max Concurrent Power operations (recommended value >= 50)

o Max concurrent View composer operations (recommended value >= 50)

The higher values will drastically cut down the amount of time needed to accomplish typical View
Administrative tasks such as recomposing or creating a new pool.

Some caveats include:

1. These settings are global and will affect all pools. Pools on other, slower disk arrays will suffer
if you set these values higher, so raising them can have adverse effects on those pools.

2. The vCenter configuration, especially the number of vCPUs, the amount of memory, and the backing
storage, is affected by these settings. To attain the best possible performance, size vCenter
according to VMware’s sizing guidelines and increase resources as needed if you notice that one has
become saturated.

References
1. Interpreting esxtop statistics - http://communities.vmware.com/docs/DOC-11812

2. Configuration maximums in vSphere 5.0 - http://www.vmware.com/pdf/vsphere5/r50/vsphere-50-configuration-maximums.pdf

3. Configuration maximums in vSphere 5.1 - http://www.vmware.com/pdf/vsphere5/r51/vsphere-51-configuration-maximums.pdf

4. Configuration maximums in vSphere 5.5 - http://www.vmware.com/pdf/vsphere5/r55/vsphere-55-configuration-maximums.pdf

5. Configuration maximums in vSphere 6.0 - https://www.vmware.com/pdf/vsphere6/r60/vsphere-60-configuration-maximums.pdf

About the Author


Cody Hosterman is the Technical Director for VMware Solutions at Pure
Storage. His primary responsibility is overseeing, testing, designing,
documenting, and demonstrating VMware-based integration with the Pure
Storage FlashArray platform. Cody has been with Pure Storage since 2014
and has been working in vendor enterprise storage/VMware integration roles
since 2008.

Cody graduated from the Pennsylvania State University with a bachelor's degree in Information Sciences
& Technology in 2008. Special areas of focus include core ESXi storage, vRealize (Orchestrator,
Automation and Log Insight), Site Recovery Manager and PowerCLI. Cody has been named a VMware
vExpert from 2013 through 2016.

Blog: www.codyhosterman.com

Twitter: www.twitter.com/codyhosterman

YouTube: https://www.youtube.com/codyhosterman

Appendix I: Per-Device NMP Configuration
This appendix describes how to check and/or set Round Robin and an I/O Operation Limit of 1 on existing
devices.

The first step is to change the particular device to use the Round Robin PSP. This must be done on every
ESXi host and can be done through the vSphere Web Client, the Pure Storage Plugin for the vSphere
Web Client, or command line utilities.

esxcli storage nmp device set -d naa.<device NAA> --psp=VMW_PSP_RR

Note that changing the PSP using the Web Client Plugin is the preferred option, as it will automatically
configure Round Robin across all of the hosts. Note that this does not set the IO Operation Limit to 1,
which is a command line option only and must be done separately.

Round Robin can also be set on a per-device, per-host basis using the standard vSphere Web Client
actions. The procedure to set up the Round Robin policy for a Pure Storage volume is shown in the figure
below. Note that this does not set the IO Operation Limit to 1, which is a command line option only and
must be done separately.

To set a device that is pre-existing to have an IO Operation limit of one, run the following command:

esxcli storage nmp psp roundrobin deviceconfig set -d naa.<naa address> -I 1 -t iops
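Since both commands must be run for every existing device on every host, it can help to generate them from a list of device identifiers. The Python below is a small sketch that only builds the two command strings shown in this appendix; the function name and the NAA identifier are placeholders.

```python
def nmp_commands(naa_id: str) -> list:
    """Build the two esxcli commands from this appendix for one device."""
    device = f"naa.{naa_id}"
    return [
        # Step 1: switch the device to the Round Robin path selection policy.
        f"esxcli storage nmp device set -d {device} --psp=VMW_PSP_RR",
        # Step 2: set the I/O Operation Limit to 1.
        f"esxcli storage nmp psp roundrobin deviceconfig set -d {device} -I 1 -t iops",
    ]


# "0123456789abcdef" is a placeholder, not a real device identifier.
for command in nmp_commands("0123456789abcdef"):
    print(command)
```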

© 2017 Pure Storage, Inc. All rights reserved. Pure Storage and the "P" Logo are trademarks or registered
trademarks of Pure Storage, Inc. in the U.S. and other countries. VMware, Orchestrator, vSphere, vCenter,
and ESXi are registered trademarks of VMware in the U.S. and other countries. Microsoft Windows is a
registered trademark of Microsoft in the U.S. and other countries. Linux is a registered trademark of Linux
Foundation in the U.S. and other countries. The Pure Storage product described in this documentation is
distributed under a license agreement and may be used only in accordance with the terms of the
agreement. The license agreement restricts its use, copying, distribution, decompilation, and reverse
engineering. No part of this documentation may be reproduced in any form by any means without prior
written authorization from Pure Storage, Inc. and its licensors, if any.

THE DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS,
REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED, EXCEPT TO THE
EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. PURE STORAGE SHALL NOT
BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH THE
FURNISHING, PERFORMANCE, OR USE OF THIS DOCUMENTATION. THE INFORMATION CONTAINED
IN THIS DOCUMENTATION IS SUBJECT TO CHANGE WITHOUT NOTICE.
