
Managing Capacity in VMware® Environments

Part I – Introduction and Concepts

February 2009

Overview

Server virtualization must reduce costs while not increasing business risk. Paramount to achieving cost
reduction objectives is the application of capacity management practices.

As mentioned time and again – server virtualization presents a paradigm shift in IT management and
particularly in capacity management. But what does this really mean? Let’s say you have migrated
significant amounts of production workloads to VMware, but the server environment continues to be
considerably underutilized compared to the cost reduction promises initially offered by the shift to
virtualization. In order to reduce costs further, you need to boost the average utilization of your
capacity while not introducing substantial risk to the business. Capturing the planned investment
returns of VMware without a mature understanding of capacity management is a real challenge.

Taking advantage of VMware’s resource management capabilities – with the proper understanding of its
capacity management implications – will enable you to further optimize the utilization of your
virtualized capacity. Combining VMware’s resource management with an automated capacity
management solution will enable you to assess, plan, and optimize your VMware environments to meet
the promised returns. Firms that continue to rely on the default resource settings expose the quality
of their IT services to significant risk as their VMware workloads expand; an IT organization that
relies on those defaults will never fully optimize its environment.

If you are aiming to maximize investment returns from your VMware cluster environments, the best
place to start is a good understanding of the fundamentals. This initial white paper begins with a
discussion of VMware resource management concepts and then focuses on how these concepts affect
capacity management. The concepts in this white paper all apply to VMware Virtual Infrastructure 3.x
versions and assume a Distributed Resource Scheduling (DRS) enabled cluster with a single resource
pool for simplification.

VMware Administrators, Architects, and Capacity Planners who are expanding from departmental to
enterprise-wide deployments of VMware, and who seek to reduce costs by optimizing the virtual
environment, will benefit most from reading this paper.

VMware Resource Management Concepts

Let’s start by looking at how VMware handles resource management, at a high level, and establish the
key terms and controls. The key unit described here is the Virtual Machine (VM) since that is the
granular element of work in a VMware Virtual Infrastructure (VI) environment. Each virtual machine has
various sizing properties and policies, including configured size, reservations, limits, shares, and
entitlements. We will visit each of these concepts to understand its impact on the capacity of the VM
and of the host and cluster in which it resides. Your understanding of these concepts will be key to optimizing your
VMware environment and has the potential to save your business hundreds of thousands of dollars in
capital expenses.

Figure 1. VMs are configured with a number of different resource settings. These settings help the
ESX host and cluster establish the amount of virtual CPU and memory available for the VM’s
operation. Please note: this graphic intentionally over-simplifies complex configuration policies and
is intended only to provide a visual foundation for the discussion below.

Each VM has the following sizing properties:

Configured Size
The Configured Size is the number of virtual CPUs (vCPUs) and the amount of memory (in MB)
assigned to the VM. These values represent the size of the “physical” machine the VM appears to
have. They serve as a hard cap on resources unless a Limit below that size is specified (see the Limit
definition below). This information is specified at VM creation time and is not generally changed
dynamically.

Reservation
A Reservation is the amount of vCPU and memory (in absolute units) that a VM is guaranteed
should it need it. It has been called “Min” and “Guaranteed” in the past, but Reservation is now
the accepted term.
Resources reserved for a VM are not allocated to it unless they are actually used – thus CPU and
memory reserved, but not requested by a VM, are made available to other VMs. For example,
an application runs in a VM configured with one vCPU and uses an average of 50% of that vCPU
over time, with spikes up to 80%. A Reservation of 0.5 vCPU (50%) will guarantee that the
application gets its average CPU requirement while allowing it to compete for the CPU it needs
above that. This balances resource sharing against guaranteed resource requirements.
A VM cannot be “Powered On” in a host unless there are resources available to meet the
Reservations specified for the VM. The resources available for starting new VMs are calculated
by subtracting overhead and the sum of all Reservations from the total capacity. The default
setting is to have no Reservation. Without Reservations specified, it is possible for a host to
become over-subscribed as a result of new VM additions or migrations.
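
To make the Power On arithmetic concrete, here is a minimal Python sketch of the admission check described above. It is an illustration only; the function and field names are ours, not VMware APIs, and the overhead figures are assumed.

# Hypothetical sketch of the Power On admission check described above.
# Units: MHz for CPU, MB for memory. Names are illustrative, not VMware APIs.

def can_power_on(host_capacity, host_overhead, existing_reservations, new_vm_reservation):
    """Return True if the host can guarantee the new VM's Reservation."""
    # Capacity left for guarantees = total capacity - overhead - sum of current Reservations
    available = {
        resource: host_capacity[resource]
                  - host_overhead[resource]
                  - sum(r[resource] for r in existing_reservations)
        for resource in ("cpu_mhz", "mem_mb")
    }
    # The VM may be powered on only if both its CPU and memory Reservations fit.
    return all(new_vm_reservation[res] <= available[res] for res in available)

# Example: a host with 2 x 2.6 GHz cores and 8 GB RAM, 10% overhead assumed.
host = {"cpu_mhz": 5200, "mem_mb": 8192}
overhead = {"cpu_mhz": 520, "mem_mb": 819}
running = [{"cpu_mhz": 1000, "mem_mb": 2048}, {"cpu_mhz": 1500, "mem_mb": 3072}]
print(can_power_on(host, overhead, running, {"cpu_mhz": 1000, "mem_mb": 1024}))  # True
print(can_power_on(host, overhead, running, {"cpu_mhz": 3000, "mem_mb": 1024}))  # False
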
If the sum of the Reservations for all VMs exceeds capacity, a host is considered in violation of its
Reservations. This condition will trigger a migration during a periodic Distributed Resource
Scheduling (DRS) balancing operation. Although a single triggered migration can improve quality of
service, frequent migrations can severely degrade performance within VMware clusters. DRS
concepts are discussed in more detail later in the paper.
On a resource-constrained host, a VM should receive at least its reserved amount of resources if it
requests them. The inability to deliver this is a key indicator of an imbalance in a host and
cluster. If a Reservation is not specified, then a value of 0 is used and no resources are required
for a Power On and none are guaranteed.
In the section Impact on Capacity, we will examine how the Reservation policies can impact
capacity using three different configuration and usage scenarios.

Shares
Shares are relative units that determine a VM’s priority among sibling VMs (all VMs in the
absence of multiple resource pools) and are used to determine resource allocation under
contention.
VMware always uses a fair-share allocation scheme, so a VM gets a share of the available resources
based on its virtual configuration; a single-vCPU machine gets half of the CPU resources of a
two-vCPU SMP machine, and the same is true for memory allocations. However, the allocation can
be modified by setting the Shares property for a virtual machine: a VM with 2000 Shares will get
twice the CPU resources per vCPU as a VM with 1000 Shares. It is important to remember that
Share properties affect the per-vCPU and per-MB fair-share allocations.
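
As a simplified illustration of this per-vCPU fair-share behavior (not VMware’s actual scheduler), the following Python sketch splits a host’s CPU among VMs in proportion to vCPU count and per-vCPU Shares; the names and numbers are hypothetical.

# Simplified illustration of proportional (fair-share) CPU allocation under contention.
# Real ESX scheduling is more involved; this only shows how Share weights scale the split.

def fair_share_split(total_cpu_mhz, vms):
    """vms: list of dicts with 'name', 'vcpus', and per-vCPU 'shares'."""
    # Each VM's weight is its per-vCPU Shares multiplied by its vCPU count,
    # so a 2-vCPU VM with default shares gets twice the CPU of a 1-vCPU VM.
    weights = {vm["name"]: vm["vcpus"] * vm["shares"] for vm in vms}
    total_weight = sum(weights.values())
    return {name: total_cpu_mhz * w / total_weight for name, w in weights.items()}

vms = [
    {"name": "vm_a", "vcpus": 1, "shares": 1000},  # default priority
    {"name": "vm_b", "vcpus": 2, "shares": 1000},  # SMP VM, same per-vCPU priority
    {"name": "vm_c", "vcpus": 1, "shares": 2000},  # double per-vCPU priority
]
print(fair_share_split(5200, vms))
# vm_a gets 1040 MHz, vm_b gets 2080 MHz (2x vm_a), vm_c gets 2080 MHz (2x per vCPU)
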
When visiting a prospect recently, we learned of a scenario where 10 VMs were installed on a
host. Without thinking much about the consequences, the business established fair share
allocations for all 10 VMs. Later, when 1 of the 10 was experiencing heavily degraded
performance, the company was challenged to diagnose the performance issue. As a short-term
remedy, they decided to power down 5 of the 10 VMs and noticed that the one troubled VM
returned to acceptable performance levels. In this case, where the 10 VMs were once sharing
10% of the configured resources each, the remaining 5 VMs were now provided a 20% share.
Although performance returned to normal for the impacted VM, the question remained of what
to do with the 5 VMs that were powered down. Taking advantage of the point system of shares,
new policies could be set to provide the proper required capacity, or as an alternative, the VMs
could be migrated to another host in the cluster that is considered to have sufficient capacity.
When managing environments where several business applications access the same server
resources, setting Shares allows VMware architects and administrators to establish priorities
according to the importance of each application to the business.
Shares determine resource allocation under contention. Resources allocated to a VM based on
shares are bound by the Reservation setting on the low end and the Limit on the high end. This
concept is further illustrated in Figure 1 above.

Entitlement
Although referred to in the past as “EMIN”, Entitlement is now the preferred term. Entitlements
are the computed result of configurations, reservations, limits, and shares, and establish the
resource allocation given to each VM for its operation. The Entitlement always falls between the
Reservation and the Limit, with the exact value driven by the VM’s Shares.
The general measure of the capacity and health of a cluster is the ability of the cluster to deliver
the entitled resources to all VMs. Additionally, a good measure of a host’s ability to provide
expected capacity is the ratio of its total entitlements to its total capacity. Unlike with Reservations,
the host will not flag a violation when entitlements exceed capacity. By understanding the
sum of all VM entitlements on a host and within a cluster, VMware architects and
administrators will have a clear picture of the resources being made available to meet demands
on their capacity.
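
The following Python sketch illustrates the idea under stated assumptions: an Entitlement is the share-based allocation clamped between the Reservation and the Limit, and the ratio of total entitlements to host capacity serves as a rough health indicator. The function names and values are illustrative only.

# Rough sketch of the Entitlement concept: the share-based allocation clamped
# between the VM's Reservation (floor) and Limit (ceiling). Field names are illustrative.

def entitlement(share_based_allocation, reservation, limit):
    """Entitlement always falls between Reservation and Limit."""
    return max(reservation, min(share_based_allocation, limit))

def host_entitlement_ratio(vms, host_capacity):
    """Sum of entitlements vs. capacity -- a simple health/capacity indicator."""
    total_entitled = sum(entitlement(v["alloc"], v["reservation"], v["limit"]) for v in vms)
    return total_entitled / host_capacity

vms = [
    {"alloc": 1800, "reservation": 500,  "limit": 1500},   # clamped to Limit -> 1500
    {"alloc": 300,  "reservation": 500,  "limit": 2000},   # raised to Reservation -> 500
    {"alloc": 1200, "reservation": 800,  "limit": 2000},   # unchanged -> 1200
]
ratio = host_entitlement_ratio(vms, host_capacity=4000)
print(f"Entitled {ratio:.0%} of host capacity")  # 80% -- no violation is flagged even above 100%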

Limit
A Limit serves as a hard cap on resource allocation for a VM. In some cases, Limits may be
defined to represent allocations of resources based upon service agreements. Limits can
provide both benefit and burden. For example, a VM might have 2GB of memory configured
but a Limit set to 1 GB for business priority reasons. In general, Limits should be used only after
careful consideration of the capacity impact.
For example, in a recent client visit, we were able to view reports of a cluster that was
configured with 32GB of memory. Looking at capacity trends over a 9-week period, we noticed
that 19 GB of the 32 GB was consistently used. With 92 VMs in the cluster, each VM had been blindly
assigned a memory Limit of 4 GB in order to provide protection in the oversubscribed
environment. In one instance, the Limit on memory was insufficient for a key application that
required more than 4GB during a heavy processing period and resulted in excessive disk
swapping. End-user performance was degraded for over three hours each week as a result of
this Limit.
If a Limit is not specified, then the Configured Size is the Limit.

From the resource management concepts described above, you can see that, compared to the
traditional capacity management of non-virtualized server environments, VMware capacity
management is more complex. In a non-virtualized server environment, the limits of CPU and
memory are established by the physical representation of those resources (e.g., 4 processors = the capacity
of 4 processors). In the VMware environment, new settings, metrics, and concepts determine the
total, used, and available capacity of a VM (e.g., 2 vCPUs may equate to 20% of a physical CPU).

Distributed Resource Scheduling

Distributed Resource Scheduling (DRS) keeps a watchful eye over VMs in clustered environments.
With the intention of providing each VM its required resources, DRS carefully tracks workload activity on
each Host within a cluster. When unsatisfactory conditions are observed for a VM within one host, DRS
assesses other host locations within the cluster where conditions may be more attractive. If DRS finds a
suitable location, it then facilitates the VM move, known as a VM migration.

DRS has three goals (all with respect to a DRS-enabled cluster):

• Load balancing (cluster balance factor)

• QoS enforcement (shares, reservations, and limits)

• Policy enforcement (admin roles, power management, access control, maintenance mode, etc.)

Figure 2. DRS manages load balancing of VMs in order to maintain expected quality of service levels. Source: VMware.

DRS uses two primary mechanisms to achieve these goals:

• Periodic Invocation
o Load Balancing
o Host evacuation (used in High Availability configurations, or in conjunction with dynamic power management policies)
o Reservation balancing
o Affinity/Anti-affinity (rules to keep certain VMs together or apart)
• VM Initial Placement
o VM Power On
A common approach is to use the “Automatic” setting of DRS for each cluster and to set the migration
threshold to “Moderately Aggressive”. At first these settings may trigger VM migrations, but migrations
tend to settle down after a while if workload demands are similar for the VMs in the cluster. However, since
most clusters contain a variety of workload profiles and demand patterns, there tend to be more
migrations than is optimal over time. In order to get a better understanding of how DRS impacts capacity
management, let’s take a closer look at how it works.

Using DRS, VMware Administrators can specify an initial automatic or manual placement of a VM in a
cluster. Placement has a large impact on capacity since resources are shared. DRS obeys all rules, such as
affinity/anti-affinity and reservations, and then selects the lightest-loaded host for the new VM.
Although it is logical for DRS to place a VM on the lightest-loaded host, that may not always be
the best choice – especially when considering whether certain VMs should be placed
on the same host (note: VMware advises using affinity/anti-affinity policies only sparingly).
lightest load policy does not consider how well a new VM’s workload will fit with existing VM workloads.
Nor does it consider strategies like Distributed Power Management (DPM) where you may want to
minimize the load on a host so it can be evacuated and powered off at low demand periods.

On a periodic basis, DRS invocation occurs and does the following:

• DRS computes an Imbalance Metric using a formula that effectively compares the variance in all
hosts’ ability to deliver on their entitlements (a simplified sketch follows this list).
• If the Imbalance Metric is greater than a threshold (set by the migration aggressiveness factor),
then migration from highly loaded to less loaded hosts begins.
• If any host is in violation of its Reservation policy (more reservations than capacity) or in
violation of affinity or anti-affinity rules, this causes migrations as well.
• DRS also calculates which migrations need to be made and then looks ahead to see whether the
desired migration will trigger additional movements in order to fully optimize the environment.
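
The sketch below illustrates, in Python, the kind of imbalance calculation described in the first bullet. It is a simplified stand-in for DRS, not VMware’s actual formula: per-host load is taken as the sum of VM entitlements divided by host capacity, and the imbalance metric is the standard deviation of those loads compared against a threshold.

# Hedged sketch of the kind of imbalance calculation described above (not VMware's
# actual algorithm): normalized load per host is the sum of VM entitlements over
# host capacity, and the imbalance metric is the standard deviation across hosts.
from statistics import pstdev

def host_load(vm_entitlements, host_capacity):
    return sum(vm_entitlements) / host_capacity

def cluster_imbalance(hosts):
    """hosts: list of (vm_entitlements, capacity) tuples for each host in the cluster."""
    loads = [host_load(ents, cap) for ents, cap in hosts]
    return pstdev(loads)

hosts = [
    ([1500, 1200, 900], 5200),   # heavily loaded host (~0.69 normalized load)
    ([600, 400], 5200),          # lightly loaded host (~0.19 normalized load)
]
threshold = 0.2  # stands in for the migration-aggressiveness setting
if cluster_imbalance(hosts) > threshold:
    print("Imbalance above threshold -- DRS would consider migrations")
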
DRS invocation allows companies to maximize the utilization and thus capacity of all resources in a
cluster (if the VMs and DRS have been configured correctly). We should also note that VMware
recommends an “Auto” setting for DRS but also recommends that Administrators override “Auto”
settings at the VM level for important application workloads.

As you can see, VMware’s DRS is designed to deliver the best capacity for a given set of hardware and
workloads and can do this job well. However, its assessment of capacity takes a short-term focus,
initiating migrations as needed. DRS does not consider the longer-term view of capacity, taking into
account historical trends or projecting future growth that may impact VM or cluster performance. In
order to reduce costs through optimization of the VMware environment, it will be critical to combine
both the short-term perspectives of DRS with the longer-term perspective and detailed analysis offered
by automated capacity management solutions. Where automated capacity management solutions help
to provide sustainable, high-performance environments (especially through periods of rapid growth),
DRS is best suited to handling occasional shifts in workload that are difficult to plan for.

Impact on Capacity

Now that we have reviewed the fundamentals of VMware resource management and Distributed
Resource Scheduling, we can further examine how these concepts impact capacity management. We
will use three examples to highlight VMware configurations, their common uses and their impact on
capacity.

• Default: no Reservation, no Limit, normal Shares

• Conservative: high Reservations, Share priority matches business priority

• Optimized: Reservation matched to historical baseline/trends, Share priority matches business priority

Default Settings

Using Default settings is the most common approach to VM configuration, especially in the early
stages of VMware adoption. The Default approach uses the standard settings for VM resource
management: no Reservation, no Limit, and normal Shares.

In early stages of VMware deployments, experienced Administrators with a history in physical server
environments hesitate to tweak the default VMware settings too much, as they strive to become more
comfortable with operations of the new virtualized environment.

For this example, we will assume that all VMs have been assigned single vCPU and 1GB configurations.
In the Default environment, a VM can always be added to the host and Powered On because
Reservations will never exceed the capacity of the host (n x 0 = 0); technically, if you pushed this
configuration to its limits, you could stack VMs on the host until its storage filled up. Resources will be
allocated to a VM when needed, providing every VM as much resource as it needs. When workload
demand for resources begins to exceed available capacity then allocation under contention policies
begin.

As noted earlier, Shares determine resource allocation under contention. Since all Shares are the same
and the configurations are the same, VMware’s fair share allocation scheme will allocate available
resources to each VM equally, and entitled resources will be the same for each VM. With no overriding
allocation policy, each new VM reduces every existing VM’s allocation, so each of n VMs receives 1/n of
the host’s resources. For example, if 40 VMs are located on a host, each VM will receive 1/40 of the CPU
and memory capacity available to the host. The host is not in Reservation violation, but given enough
demand it will quickly become overcommitted on required resources.

Without an ability to clearly quantify VM resource utilization trends today, Administrators and
Architects have stacked fewer VMs per host in order to avoid contention for vCPU and Memory. This
type of low-density configuration helps to maintain service levels, but results in higher VMware costs
than environments that are properly balanced for capacity requirements. With Default settings, capacity
and flexibility have been prioritized over stability.

For those organizations striving to reduce costs and optimize the performance of their VMware
environments, this approach is not sustainable.

Conservative Settings

On the opposite end of the spectrum, we can consider the case where every VM is given an overly high
Reservation setting, there is a mix of configurations and each VM is given a Share property that matches
its priority to the business (or at least to the application). This is a very conservative approach, and is
the second most common technique we have seen – especially in organizations that are still “getting
their feet wet” with VMware in production environments.

In this scenario, each VM will run quite smoothly, even above its normal load, since it is guaranteed a
very large amount of resources. The host will stop allowing new VMs via Admission Control when its
total Reservations would exceed available capacity. Since the VMs are oversized, the host will be limited
in how many new VMs can be started. Although quality of service is optimized for the few VMs running
in this environment, resource sharing is not optimized for the applications running on the VMs.

Additionally, due to the bloated size of the VMs in this configuration, the load balancing mechanism of
DRS will be limited in effectiveness. As DRS searches for open space within a cluster, the high
Reservation requirements of the VMs will limit the potential destination hosts.

Although density of the data center may have increased with the Conservative configuration, Architects
will not be able to achieve the VM density and resource utilization objectives that were initially
promised by VMware. Capacity and flexibility have been traded for stability.

For those organizations striving to reduce costs and optimize capacity as the footprint of VMware usage
expands within their IT organization, this approach is also not sustainable.

Optimized Settings

In our third scenario, every VM is given an accurate Reservation setting, there is a mix of configurations,
and each VM is given a Share property that matches its priority to the business (or at least to the
application).

Taking the Optimized approach, the host will stop allowing new VMs via Admission Control when a new
VM would push its total Reservations beyond capacity. Thus the host cannot be oversubscribed at the
Reservation level, and because the VM sizing is accurate, minimal resources will be wasted.

Resources will be allocated to VMs as they are required until their Reservation is met. Beyond that, if
VMs require more resources when demand spikes, the host will strive to provide each VM with its
Entitled resources (the VMs contend for these extra resources based on Share priority).
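
A minimal Python sketch of this Optimized allocation behavior follows, assuming a single resource pool and CPU measured in MHz; it is an approximation of the policy described above, not VMware’s scheduler, and the VM names and numbers are hypothetical.

# Simplified sketch of the Optimized allocation described above: each VM first
# receives up to its Reservation, then contends for what is left according to Shares.
def optimized_allocation(vms, host_capacity):
    """vms: dicts with 'name', 'demand', 'reservation', 'shares' (CPU in MHz)."""
    alloc = {v["name"]: min(v["demand"], v["reservation"]) for v in vms}
    remaining = host_capacity - sum(alloc.values())
    # Unsatisfied demand contends for the remainder in proportion to Shares.
    contenders = [v for v in vms if v["demand"] > alloc[v["name"]]]
    total_shares = sum(v["shares"] for v in contenders) or 1
    for v in contenders:
        extra = remaining * v["shares"] / total_shares
        alloc[v["name"]] += min(extra, v["demand"] - alloc[v["name"]])
    return alloc

vms = [
    {"name": "web", "demand": 1800, "reservation": 1000, "shares": 2000},
    {"name": "batch", "demand": 1500, "reservation": 500, "shares": 1000},
]
print(optimized_allocation(vms, host_capacity=2600))
# web gets its 1000 MHz Reservation plus 2/3 of the remainder; batch gets 500 MHz plus 1/3.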

Although this approach may seem like the most logical configuration choice, it is not often pursued due
to a lack of understanding of the historical capacity requirements of the applications being placed on
VMs in the data center. For example, to set an accurate Reservation of 0.5 vCPUs, VMware Architects
must first have established a baseline for capacity requirements during normal operations as well as
during periods of peak demand. Baselines, peaks, and performance trends can be assessed easily with
automated capacity management solutions. Following the capacity assessment, VM demand for these
resources could then be balanced across the hosts and clusters in the data center.

This approach is less common among Architects planning enterprise-wide deployments of VMware but
is essential to keeping costs under control as the environment expands. In addition to reducing the cost
of hardware required to support the VMware environment, it will also help reduce server
administration and operational costs. For those organizations targeting enterprise-wide expansion, the
Optimized approach is preferred.

Recommendations for Policy Setting

In environments where IT organizations are still “getting their feet wet” with VMware and growth is not
significant, Default or Conservative settings within DRS and its VM policies are sufficient.

In situations where IT organizations have decided to dramatically expand their VMware environments
across the enterprise, and cost considerations are a constant concern, it is very important to set the
Reservation and Shares for each VM carefully – avoiding risky default settings. For
organizations looking to reduce costs while managing expanding VMware environments, we offer the
following guidelines:

• Accepting the default Reservation setting of 0 is not a good practice. It is imperative to set
reasonably accurate Reservations for VMs in order for VMware resource management and DRS to
work effectively.

• Setting the Reservation for a VM too small allows good resource sharing but can trigger resource
contention as more VMs are added – even during periods of normal demand.

• Setting Reservations too high can prevent the Host from accepting a new VM when excess
capacity is often available. Setting Reservations too high also allows a VM to acquire too many
resources easily when demand spikes, thus impacting other workloads unfairly.

• Thus the goal is to size the Reservation to handle the “normal” workload with guaranteed
resources and then let the VM contend for resources to handle peaks based on its priority. For now,
one technique is to set the Reservation around the 50th percentile of historical resource utilization,
as sketched below. To be more aggressive, drop below that; to be more conservative, go above it.
This works best for sustained workloads – we will look at peaky and other workload types in an
upcoming paper.
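
For illustration, the following Python sketch sizes a Reservation from a percentile of historical CPU samples, per the guideline above. The sample data and helper name are hypothetical.

# Illustrative sketch of the percentile guideline above: size the Reservation near
# the 50th percentile of historical utilization (lower = more aggressive sharing,
# higher = more conservative). Sample data and names are hypothetical.
import statistics

def reservation_from_history(samples_mhz, percentile=50):
    """Return a Reservation sized at the given percentile of historical CPU samples."""
    ordered = sorted(samples_mhz)
    # quantiles with n=100 gives cut points at the 1st..99th percentiles
    cuts = statistics.quantiles(ordered, n=100)
    return cuts[percentile - 1]

history = [310, 450, 520, 480, 700, 640, 390, 560, 830, 610]  # MHz samples over time
print(reservation_from_history(history, percentile=50))  # ~median of the samples
print(reservation_from_history(history, percentile=70))  # a more conservative Reservation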

Summary
In this white paper, we have learned that capacity management is very different for a VMware
DRS-enabled cluster. We have seen the capacity impact of the sizing concepts that VMware uses for
resource allocation like Reservations, Shares, and Limits. Finally, we have discussed what DRS can and
cannot do to manage capacity.

We can conclude this introductory white paper with some basic ideas that will help you make the best
capacity management decisions possible at this point:

• Remember that it is imperative that you set the Reservation property for your VMs. Relying on
the default settings can adversely impact a number of VMware capabilities by:

o Taking away some of the effectiveness of Admission Control, allowing any and all VMs
to be added to a host without regard to available capacity.

o Requiring a VM to compete for resources at all levels of demand instead of being
guaranteed the resources it needs to handle “normal” demand.

o Weakening the effectiveness of DRS Load Balancing.

• Do not expect DRS to completely solve your capacity management issues. DRS is very good at
balancing load across the hosts in a cluster and enforcing policies.

• DRS works best when paired with automated capacity management analysis, reporting and
proactive actions.

• As Administrators and Architects first begin to work with VMware, Default and Conservative
resource management settings and policies are acceptable because optimization is not typically an
early goal in the technology adoption cycle.

• As Administrators and Architects aim to expand their VMware environments while keeping costs
and quality of service under control, Optimized settings and policies are required. Without
Optimized settings, cost reduction objectives will not be achievable.

In the next white paper in this series, we will offer some new definitions of “Effective Capacity” for a
VMware cluster and discuss concepts for optimal management in VMware High Availability (HA)
environments. We will also begin to explore the concept of workload profiling and how it can be
invaluable in setting the Reservation property and placing workloads proactively to minimize migrations.

About Systar

Systar is a leading worldwide provider of performance management software. Systar’s OmniVision
product suite enables customers to achieve the optimal alignment between IT resources and business
requirements in both distributed and virtualized server environments. Systar’s proven capacity
management solutions deliver the full benefits of virtualization by enabling customers to gain visibility
into these complex environments, tune for optimal capacity, and move business-critical applications into
production with full confidence.

United States
8618 Westwood Center Dr., Suite 240
Vienna, VA 22182
Tel. +1 703-556-8400
Fax +1 703-556-8430
info@systar.com

France
171 bureaux de la Colline
92213 Saint-Cloud Cedex
Tel. +33 (0) 1 49 11 45 00
Fax +33 (0) 1 49 11 45 45
info-fr@systar.com

United Kingdom
Systar Ltd
Ground Floor Left, 3 Dyer’s Buildings
London EC1N 2JT
Tel. +44 2072 692 799
Fax +44 2072 429 400
info-uk@systar.com

Germany
Mergenthallerallee 79-81
D-65760 Eschborn
Tel. +49 211 598 8520
info-de@systar.com

Spain
Centro de Negocios Eisenhower
C/ Cañada Real de las Merinas, 17
Edificio 5 - 1º D
28042 Madrid
Tel. +34 91 747 88 64
Fax +34 91 747 54 35
info-es@systar.com

Systar, BusinessBridge, OmniVision, BusinessVision, ServiceVision, WideVision and Systar’s logo are
registered trademarks of Systar. VMware, ESX Server, and all other brand names, product names and
trademarks are the property of their respective owners.
