WinConnections Spring, 2011 - 30 Bite-Sized Tips for Best vSphere and Hyper-V VM Performance

30 Bite-Sized Tips for Best VM Performance
Greg Shields, MVP
Senior Partner and Principal Technologist
www.ConcentratedTech.com

#1: Purchase Compatible Hardware
… and not just “compatible with ESX”. Purchase components that are compatible with each other, particularly with vMotion’s needs in mind.

#2: Buy Nehalem/Opteron
Intel Nehalem and AMD Opteron processors include support for the Intel EPT / AMD RVI processor extensions. Together, these are referred to as Second Level Address Translation, or SLAT. SLAT includes hardware-assisted memory management unit (MMU) virtualization. It is significantly faster for certain workloads, such as those with heavy context switching. Finally, full support for Remote Desktop Services / XenApp! Note that these processors support Large Memory Pages, which will disable ESX’s page table sharing.

#3: Mind NIC Oversubscription
One of the greatest benefits of iSCSI is its linear scalability. Need more throughput? Just add another NIC! However, VLANs and link aggregation introduce the notion of NIC oversubscription. Ceteris paribus, storage traffic >>> regular traffic. Even with VLANs, always segregate storage NICs from production networking NICs. If possible and affordable, use segregated network paths. Monitor! Oversubscription will kill your performance faster than anything!
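
A minimal sketch of what “oversubscribed” means in numbers, assuming you already have peak throughput figures per traffic class from your monitoring. The function name and the sample figures are hypothetical, not anything vSphere reports under these names.

```python
def oversubscription_ratio(demands_mbps, uplink_mbps):
    """Ratio of summed peak demand to uplink capacity.

    A result above 1.0 means the traffic sharing this uplink
    can ask for more bandwidth than the NIC can deliver.
    """
    return sum(demands_mbps) / uplink_mbps


# Hypothetical peak demand (Mbps) for three VLANs sharing one 1 Gbps NIC.
vlan_peaks = [400, 300, 500]
ratio = oversubscription_ratio(vlan_peaks, 1000)
print("oversubscribed" if ratio > 1.0 else "within capacity")
```

If storage is one of those VLANs, that is the argument for giving it its own NICs.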

#4: Consider Further Segregating Heavy Workloads
Some VMs run workloads that make heavy use of their attached disks. Consider segregating these workloads onto their own independent NICs and paths. Keep an eye on your IOPS.

#5: vSphere 4.0 VMs Don’t Back Up Applications Correctly!
vSphere 4.1 added full support for Microsoft VSS on Server 2008 guests. This support is only automatic if the guest was initially created on a vSphere 4.1 host. Guests created under vSphere 4.0 and carried through the upgrade aren’t properly backing up their applications. Fix this by setting disk.EnableUUID to True: power off the machine; go to Edit Settings | Options | General | Configuration Parameters | Add Row and add the parameter; power on the machine.
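
To audit which VMs still need the fix, one option is to look for the parameter in each VM’s configuration text. The sketch below assumes the usual .vmx layout of one `key = "value"` pair per line and treats key names as case-insensitive (an assumption); the helper names and sample VM are hypothetical.

```python
def parse_vmx(text):
    """Parse .vmx-style `key = "value"` lines into a dict with lowercased keys."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a key/value pair.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def uuid_enabled(text):
    """Return True only if disk.EnableUUID is present and set to true."""
    return parse_vmx(text).get("disk.enableuuid", "").lower() == "true"


# Hypothetical configuration text for a VM that already has the fix.
sample = 'displayName = "sql01"\ndisk.EnableUUID = "TRUE"\n'
print(uuid_enabled(sample))
```

A VM whose configuration lacks the row entirely reports False, which is the case the slide is warning about.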

#6: HBA Max Queue Depth
One solution for poor Fibre Channel storage performance can be adjusting your HBA maximum queue depth. A deeper queue can mean more performance, but fewer cross-device and cross-VM optimizations. The default is 32. This is not a task to be taken lightly. It’s kind of like adjusting the air/fuel mix on a carburetor, and it’s a multi-step process. See http://kb.vmware.com/kb/1267 for details.

#7: Consider Hardware iSCSI
… but, perhaps, don’t buy them… ESX’s software iSCSI initiator works well. However, using it incurs a small processing overhead. Hardware iSCSI NICs offload this overhead to the card. NFS/NAS storage also experiences this behavior. Newer NICs reduce this effect, particularly those with: checksum offload; TCP segmentation offload (TSO); 64-bit DMA addressing; multiple scatter-gather elements per Tx frame; jumbo frames.

#8: Set NICs to Autonegotiate
VMware’s recommendation is to set all NICs to autonegotiate, full duplex. This is sort of “duh” these days. But it’s worth mentioning, because some old-school network admins still prefer to manually set speed/duplex due to a crazy old race condition bug that happened a long, long time ago. Just smack around those old coots. “You can take your Token Ring and your IPX and go home now!”

#9: Do Not Team Storage NICs
What? Don’t team them? Well, I guess I mean “team” in the classic sense of network teaming. Remember that storage NICs leverage MPIO to aggregate links. MPIO is a superior technology to link aggregation for storage anyway. ’Tis also easier to use, and better for routing! vCenter’s GUI wizards make this hard not to do, but be aware that extra steps are required…

#10: Enable Hyperthreading
Early in ESX’s days we debated whether hyperthreading improved or decreased overall performance. That debate is over. The winner is “improved”. Today, hyperthreading adds a non-linear additional quantity of processing capacity. Like 20-30% (???), not a full extra proc. But you know this. Enable it in your servers’ BIOS. Just turn it on, OK?

#11: Allocate Only the CPUs You Need
Allocate only as many vCPUs as a VM requires. Start with only one as your baseline. Rarely deviate. Circle this bullet point. No, really. Don’t use dual vCPUs for a single-threaded application. Don’t assign more vRAM than necessary. More vCPUs equals more problems: more vCPUs equals more interrupts, plus extra overhead in maintaining a consistent memory view between vCPUs. This is tough, especially with today’s descheduled processing. Some OSs migrate single-threaded workloads between multiple CPUs, adding a performance tax. More CPUs are good for handling CPU spikes.

#12: Disconnect Unused Physical Hardware Devices
COM, LPT, USB, floppy, CD/DVD, NICs, etc. all consume interrupt resources. High-priority resources. It is a big deal to insert a CD/DVD/USB. Windows guests will poll connected CD/DVD drives very frequently, significantly affecting performance. Disconnect these in VM properties when not in use. There’s a reason why the “Connected” checkbox exists! Note! Connected devices can prevent a vMotion!

#13: Upgrade to VM Version 7
Virtual hardware version 7 offers some very significant performance improvements: the VMXNET3 paravirtualized NIC driver and the PVSCSI paravirtualized SCSI driver. Upgrade VMware Tools. Reboot. (More on these in a minute.) Note that VMv7 hardware cannot be vMotioned to ESX servers prior to 4.0. Be careful of this.

#14: Don’t Fear Scaling Out
Creating VMs is easy, so we create them. You’ll eventually run out of CPU resources. You’ll probably run out of RAM first. Don’t run more VMs than your processing/memory capacity allows. When running very close to capacity, use CPU reservations to guarantee 100% CPU availability for the console: Host | Configuration | System Resource Allocation | Edit. This is particularly important if you have software installed there. This is unnecessary in ESXi.

#15: 80% is Nice
VMware recommends maintaining an administrative ceiling on utilization at 80%. This reserves enough capacity for failure and for the service console. VMware suggests that 90% should be a warning for overconsumption. Less dynamic workloads can shift this up. … but, seriously, who can really state that?
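
The two thresholds translate directly into a check you could run against whatever utilization numbers you collect. This is a sketch only: the function name and status labels are hypothetical, and the 80% and 90% defaults are simply the figures quoted above.

```python
def utilization_status(used, capacity, ceiling=0.80, warning=0.90):
    """Classify utilization against the administrative ceiling and warning level."""
    ratio = used / capacity
    if ratio >= warning:
        return "warning"        # at or past the overconsumption level
    if ratio >= ceiling:
        return "over ceiling"   # eating into failure/console headroom
    return "ok"


# Hypothetical host using 85 units of a 100-unit capacity.
print(utilization_status(85, 100))
```

If your workloads really are less dynamic, pass a higher `ceiling` rather than editing the defaults.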

#16: With Older OSs, Use UP HAL When Possible
Newer OSs (Vista, W7, 2008) use the same HAL for all UP/SMP conditions. Older OSs leverage two HALs: a uniprocessor HAL and a multiprocessor HAL. An SMP HAL that is only given a single vCPU will run slightly slower, because it carries slightly more synchronization code. Note that this will impact hot add.

#17: Mind Scheduling Affinity
It is possible to tag a VM to a particular pProc. Good for ensuring that the VM has processing resources during contention. Setting Core Sharing to None prevents any other vProc from using a pProc on the same core. Like disabling HT. Setting Core Sharing to Internal prevents vProcs on other VMs from using a pProc on the same core. Only the same VM. Just set this to Any.

#18: Don’t Touch this Setting.
Exceptionally rare are the cases when this setting should be adjusted. So, no touchy. I will tell you when. I have very reasonable consulting rates.

#19: Don’t Just Keep Up the Old (and Dumb) Habit of Assigning 4 GB of RAM to Every Stinking Virtual Machine, No Matter What Workload it Runs.
Really. Consciously consider the amount of RAM that a VM needs, and assign it that RAM. Yes, VMware has memory ballooning. But overallocating unnecessarily increases VM overhead. Ballooning isn’t automatic. Ballooning is slow. Ballooning is reactive. Note to Self: Talk about Hyper-V’s Dynamic Memory here. Very cool.

#20: Stop with the Snapshots
Snapshots are (were) a significant selling point in the early days of virtualization. About to do something risky? Snapshot! It’s like a career protection device! However, snapshots aren’t (and never were) meant for long-term storage. And I mean “no more than just a few minutes” long. They’re not meant for backups. Reverting to an aged snapshot can break the computer’s trust relationship with its Windows domain. Managing snapshots, particularly linked ones, significantly reduces overall VM performance.

#21: Perform vSphere Tasks in the Off Hours
Some vSphere tasks are actually quite impactful on VM operations: provisioning virtual disks, cloning virtual machines, svMotion, manipulating file permissions, backups, and anti-virus (bleh). Do these tasks during off hours, or you may impact performance for other running VMs.

#22: Mind Affinities
Some VMs need to regularly communicate with each other with high throughput. For those: “Keep Virtual Machines Together”. Make sure these machines share the same vSwitch. Collocation forces inter-VM traffic through the system bus rather than pNICs, significantly increasing speed. For other VMs, losing both at once could be bad if they are collocated on the same host. For those: “Separate Virtual Machines”.

#23: Disable Screen Savers
And window animations. Screen savers represent a machine interrupt, particularly those with heavy graphics. “Pipes”, I’m looking right at you! This interrupt is particularly impactful on collocated VMs. … and, plus, screen savers on servers are sooooo 2002.

#24: Use NTP, not VMware Tools for Time Sync
… and here’s one out of the odd files… VMware suggests configuring VMs to sync time from an external NTP server. They prefer this even over their own internal timekeeper. Their timekeeper uses a much lower resolution than NTP. NTP = milliseconds; NT5DS = 1 second; VMware Tools = ?

#25: Never Use PerfMon Inside the VM, Except…
Not that you’d ever actually use PerfMon, but… Measuring performance from within a virtual machine fails to account for unscheduled time. Essentially, when the ESX server isn’t servicing the VM, no time passes within that VM. Also, in-VM PerfMon doesn’t recognize virtualization overhead. Most important, in-VM PerfMon can’t see down into the layers below the VM: storage, processing, etc. VMware Tools adds PerfMon counters to VMs. These are OK to use, as they’re synched from ESX.

#26: Paravirtualization is Your Friend
VM hardware version 7 adds two new paravirtualized drivers: VMXNET3 replaces E1000, and PVSCSI replaces BusLogic/LSILogic. Paravirtualized drivers are superior to emulation. They are “aware” they’ve been virtualized, and can work directly with the host without needing emulation. Mexican menus versus French menus. VMXNET3 supports TSO and Jumbo Frames, in the VM! Even if the physical hardware doesn’t support TSO!

#27: Turn on Jumbo Frames, but Do it Everywhere
If you plan to use Jumbo Frames… MTU size is usually set to 9000. Make sure you enable it everywhere. Jumbo Frames particularly help with large file transfers (think WDS, virtual disk provisioning, etc.) and storage connections. Not all network equipment supports Jumbo Frames. Test, test, test.
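
“Everywhere” matters because a path can only carry frames as large as its smallest hop allows. The sketch below assumes you have written down the configured MTU of each hop yourself; the hop names and helper functions are hypothetical, and nothing here queries real equipment.

```python
JUMBO_MTU = 9000  # the usual Jumbo Frames MTU mentioned above


def path_mtu(hops):
    """Largest frame the whole path can carry: the smallest MTU along it."""
    return min(mtu for _name, mtu in hops)


def jumbo_gaps(hops, target=JUMBO_MTU):
    """Names of hops whose MTU is still below the Jumbo Frames target."""
    return [name for name, mtu in hops if mtu < target]


# Hypothetical path from a guest to its storage, one switch left at the default.
hops = [
    ("guest vNIC", 9000),
    ("vSwitch", 9000),
    ("physical switch", 1500),
    ("storage array", 9000),
]
print(path_mtu(hops))
print(jumbo_gaps(hops))
```

A paper check like this is no substitute for the “test, test, test” part; it only tells you where to look first.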

#28: DRS Will Prioritize Faster Hosts over Slower Ones
A neat fact (that I didn’t know): when potential hosts for a DRS relocation have compatible CPUs but different CPU frequencies and/or memory capacity, DRS will prioritize relocating VMs to the system with the highest CPU frequency and the most memory. This won’t be the case if that CPU is already at capacity.

#29: Disable FT, Unless You’re Using It
… and most of you aren’t. You can “turn on” but not “enable” FT. Problem: turning on Fault Tolerance automatically disables some features that enhance VM performance. Hardware virtual MMU is one. Or, just don’t use that horrible feature. Har! (Is there anyone from VMware in the audience…?)

#30: Match Configured OS with Actual OS
Big oops here, usually during OS migrations. This setting also sets a few important low-level kernel optimizations. Make sure yours are correct!

BONUS TIP #31: Follow the Numbers
Private clouds are all about quantifying performance in terms of supply and demand. vSphere gives you those numbers. Just sum ’em up.
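
Summing them up can be as plain as it sounds. The sketch below assumes you have exported CPU (MHz) and memory (MB) figures for hosts and VMs into simple records; the field names, helper functions, and sample figures are all hypothetical.

```python
def totals(items):
    """Sum CPU (MHz) and memory (MB) across a list of hosts or VMs."""
    return {
        "cpu_mhz": sum(i["cpu_mhz"] for i in items),
        "mem_mb": sum(i["mem_mb"] for i in items),
    }


def headroom(hosts, vms):
    """Supply minus demand for each resource; negative means overcommitted."""
    supply, demand = totals(hosts), totals(vms)
    return {key: supply[key] - demand[key] for key in supply}


# Hypothetical two-host cluster and three VMs.
hosts = [
    {"cpu_mhz": 20000, "mem_mb": 65536},
    {"cpu_mhz": 20000, "mem_mb": 65536},
]
vms = [
    {"cpu_mhz": 3000, "mem_mb": 4096},
    {"cpu_mhz": 1500, "mem_mb": 2048},
    {"cpu_mhz": 6000, "mem_mb": 16384},
]
print(headroom(hosts, vms))
```

Whichever resource reaches zero headroom first is the one that decides when you buy the next host.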

Final Thoughts
See! Creating good VMs isn’t all that easy. Our jobs aren’t going away any time soon! These little optimizations add up. Be smart with your virtual environment and always remember… you cannot change the laws of physics!
