HA Failover Capacity

We get an enormous number of questions about VMware’s HA (High Availability), especially when users see a message stating that there are insufficient resources to satisfy HA failover. We have already discussed the mechanism that HA uses to provide high availability here. Now we need to understand capacity calculations. In current versions of ESX (3.02) and earlier, the following calculation applies for failover capacity.


Failover Capacity is determined using a slot size value that is calculated on the cluster. Slots are calculated by a combination of the total CPU and Memory that are in the physical hosts. The calculation for failover capacity works as follows:

Let’s say you have 4 ESX servers in your VMware HA cluster and Configured Failover capacity on the cluster is set to 1.

Physical memory in the hosts is as follows:

ESX1 = 16 GB
ESX2 = 24 GB
ESX3 = 32 GB
ESX4 = 32 GB

In the cluster, you have 24 VMs, all configured and running. Of the 24 running VMs, find the one with the highest “configured memory”. For this example, let’s say this is 2 GB. All other VMs are configured with 2 GB or less.

With this information we can now do the calculation:
1. Pick the ESX host which has the least amount of RAM. In this case, it is ESX1, with 16 GB.

2. Divide the value found in step 1 by the largest configured memory of any VM. In this example, this gives us 8 (16 divided by 2). This means we have 8 slots available per ESX host in the cluster.

3. Since we have 4 hosts and the configured failover capacity for the cluster is 1, we are left with 3 hosts in a failure situation. Hence the total number of VMs that can be powered on across these 3 servers is 24 (8 slots multiplied by 3 hosts).

4. If the total number of VMs in the cluster exceeds 24, we will get the “Insufficient resources to satisfy HA failover” message and the current failover capacity will be shown as 0. If the number is 24 or fewer, we should not get this message.
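The four steps above can be sketched in a few lines of Python. This is only an illustration of the simplified, memory-only arithmetic in this example, not how VirtualCenter implements it. The function name and parameters are made up for this sketch, and rounding down when the division is not exact is an assumption.

```python
def ha_failover_check(host_ram_gb, vm_ram_gb, configured_failover_hosts):
    """Illustrative, memory-only version of the calculation described above.

    host_ram_gb: physical RAM of each ESX host in the cluster (GB)
    vm_ram_gb: configured memory of each running VM (GB)
    configured_failover_hosts: configured failover capacity of the cluster
    """
    # Step 1: the host with the least RAM sets the baseline.
    smallest_host = min(host_ram_gb)
    # Step 2: slot size is the largest configured VM memory.
    # Rounding down on an inexact division is an assumption of this sketch.
    slots_per_host = smallest_host // max(vm_ram_gb)
    # Step 3: only the hosts left after the configured failures count.
    surviving_hosts = len(host_ram_gb) - configured_failover_hosts
    max_vms = slots_per_host * surviving_hosts
    # Step 4: the warning appears when the VM count exceeds the total slots.
    within_capacity = len(vm_ram_gb) <= max_vms
    return slots_per_host, max_vms, within_capacity


hosts = [16, 24, 32, 32]      # ESX1..ESX4 from the example
vms = [2] + [1] * 23          # 24 VMs, the largest configured with 2 GB
print(ha_failover_check(hosts, vms, 1))        # (8, 24, True)
print(ha_failover_check(hosts, vms + [1], 1))  # (8, 24, False): a 25th VM exceeds capacity
```

Note that the sketch ignores CPU and memory reservations, which the real calculation also takes into account.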

Note: If you are still seeing the message and you have fewer VMs running than the calculation allows for, check the CPU and memory reservations on both VMs and resource pools, as these can skew the calculation. You should avoid unnecessary memory or CPU reservations on VMs, because we have to ensure that any reserved resource is available, and this can cause these types of errors.

There are multiple ways to fix or get around this calculation. The most common are as follows:

  • Set the “Allow Virtual Machines to be powered on even if they violate availability constraints” option in the configuration of the cluster. In this case, HA ignores the above calculation and will try to power on as many VMs as possible in case of HA failover. If you choose this option, you can also set the restart priority in the ‘Virtual Machine Options’ section of the cluster configuration. This way, high priority VMs are powered on first, followed by the lower priority VMs, up to the point where no further VMs can be powered on.
  • If you have one VM which is configured with a very high amount of memory, you can either lower its configured memory or take it out of the cluster and run it on a standalone ESX host. This will increase the number of slots available with the current hardware.
  • Increase the amount of RAM on the servers so that there are more slots available with the current RAM reservations.
  • Remove any CPU reservations on any VM(s) that are greater than the max speed of the processors in the hosts. For example, if the CPU Usage on the Summary tab of your ESX Server shows a processor speed of 2793 MHz, you will see the error message pop up if you have a CPU reservation greater than 2793 MHz on a VM.
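As a rough illustration of the last check, the snippet below flags VMs whose CPU reservation is larger than the host processor speed. The VM names and reservation values are invented for the example, and the function is a hypothetical helper, not part of any VMware tool. In practice you would read these values from the VM and host settings.

```python
def oversized_cpu_reservations(vm_reservations_mhz, host_cpu_mhz):
    """Return the VMs whose CPU reservation exceeds the host processor speed."""
    return {
        vm: mhz
        for vm, mhz in vm_reservations_mhz.items()
        if mhz > host_cpu_mhz
    }


# Hypothetical reservations in MHz; 2793 MHz is the processor speed from the example.
reservations = {"vm-a": 0, "vm-b": 2793, "vm-c": 3000}
print(oversized_cpu_reservations(reservations, 2793))  # {'vm-c': 3000}
```

A reservation equal to the processor speed is not flagged here, since the text only mentions reservations greater than 2793 MHz.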


Note: The above calculation method is very limited and is going to be revised in future releases of VirtualCenter to improve calculations for HA failover.

5 thoughts on “HA Failover Capacity”

  1. I have read your article on HA memory requirement and I didn’t think this would be the case because they allow for memory to be over-committed and have page sharing so why limit the ability to power on if you violate? So in ESX 3.5 has this changed?

    Thank you

  2. Remember, HA is written by Legato, VMware’s sibling under the EMC umbrella.
    It does not follow the same rules as the rest of the product.

    I’ll say this info is most likely outdated now as this seems to change in some regard with every release. Talk to VMware for the latest info.

  3. FYI, there is an excellent software out there that eliminates the HA failover called Melio FS. I believe it is made by a company called Sanbolic.

  4. Very interesting article and it seems that there are not too many other places that have this information. I take it that the number arrived at by this calculation is in addition to the already powered on guests on the remaining hosts, (spread out over the 3 remaining hosts)?

    Regards
    Dale
