VMotion in 3.5 DRS enabled Cluster causes Guest CPU to rise Dramatically

Here is the KB article on a problem where the CPU usage of a virtual machine can increase significantly after VMotion migrates it within a DRS-enabled cluster. As a result, the performance of the virtual machine may be degraded.

“Starting with ESX Server 3.5 and VirtualCenter 2.5, DRS applies a cap to the memory overhead of virtual machines to control the growth rate of this memory. This cap is reset to a virtual machine-specific computed value after VMotion migrates the virtual machine. Afterwards, if the virtual machine monitor indicates that the virtual machine requires more overhead memory, DRS raises this cap at a controlled rate (1MB per minute, by default) to grant the required memory until the virtual machine overhead memory reaches a steady-state and as long as there are sufficient resources.

For VirtualCenter 2.5, this cap is not increased to satisfy the virtual machine’s steady-state demand as expected. Thus, the virtual machine operates with an overhead memory that is less than its desired size, which in turn may lead to higher observed virtual machine CPU usage and lower virtual machine performance in a DRS-enabled cluster.”
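To get a feel for what a controlled rate of 1MB per minute means, here is a rough back-of-the-envelope sketch. It models only the intended behavior described in the quote (the cap rising steadily until it meets demand), not the VirtualCenter 2.5 bug. The function name and the example figures are my own assumptions for illustration, not values from VMware.

```python
import math


def minutes_to_reach(demand_mb: float, cap_mb: float,
                     rate_mb_per_min: float = 1.0) -> int:
    """Estimate the minutes needed for the overhead cap to grow to the demand.

    Toy model: the cap rises linearly at rate_mb_per_min and resources
    are assumed to be sufficient the whole time.
    """
    if rate_mb_per_min <= 0:
        raise ValueError("rate_mb_per_min must be positive")
    shortfall = max(0.0, demand_mb - cap_mb)  # nothing to grow if cap >= demand
    return math.ceil(shortfall / rate_mb_per_min)


# Hypothetical numbers: the cap is reset to 100MB after VMotion,
# while the virtual machine wants 150MB of overhead memory.
print(minutes_to_reach(150, 100))
```

Under these made-up numbers the virtual machine would run short of overhead memory for the better part of an hour even when the mechanism works as designed, which gives some idea of why a cap that never rises hurts so much.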

If you think you are running into this issue, read https://kb.vmware.com/kb/1003638

I read over this knowledge base article and tried these settings myself. Interestingly, every time I set the Mem.VMOverheadGrowthLimit parameter to 5 and closed the window, the setting was reset to 0. I could even see it being reset in the vmkernel logs.

I can only assume VirtualCenter is doing this. I think a much better approach is the second method given in the knowledge base article:

To fix multiple ESX Server hosts

If this parameter needs to be changed on several hosts (or if the workaround fails for the individual host) use the following procedure to implement the workaround instead of changing every server individually:

   1. Log on to the VirtualCenter Server Console as an administrator.

   2. Make a backup copy of the vpxd.cfg file (typically it is located in C:\Documents and Settings\All Users\Application Data\VMware\VMware VirtualCenter\vpxd.cfg).

   3. In the vpxd.cfg file, add the following configuration after the <vpxd> tag:

      <cluster>
              <VMOverheadGrowthLimit>5</VMOverheadGrowthLimit>
      </cluster>

      This configuration provides an initial growth margin, in MB, for virtual machine overhead memory. You can increase this amount to larger values if doing so further improves virtual machine performance.

   4. Restart the VMware VirtualCenter Server Service.

      Note: You will need to restart the VMware VirtualCenter Server Service, after which the new value for the overhead limit should be pushed down to all the clusters in VirtualCenter.
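If you would rather script steps 2 and 3 than edit the file by hand, something like the following could work. It is only a sketch under some assumptions: the function names are mine, it looks for a plain <vpxd> tag with no attributes, and it does nothing if a VMOverheadGrowthLimit entry is already present. You would still restart the VirtualCenter Server Service yourself afterwards.

```python
import shutil
from pathlib import Path

# Block to insert, matching the snippet from the KB article.
SNIPPET = (
    "\n  <cluster>\n"
    "    <VMOverheadGrowthLimit>{limit}</VMOverheadGrowthLimit>\n"
    "  </cluster>"
)


def add_growth_limit(cfg_text: str, limit: int = 5) -> str:
    """Return cfg_text with the <cluster> block inserted right after <vpxd>."""
    if "<VMOverheadGrowthLimit>" in cfg_text:
        return cfg_text  # already configured; leave the text untouched
    tag = "<vpxd>"  # assumption: the tag appears exactly like this
    pos = cfg_text.find(tag)
    if pos == -1:
        raise ValueError("no <vpxd> tag found in the configuration")
    insert_at = pos + len(tag)
    return cfg_text[:insert_at] + SNIPPET.format(limit=limit) + cfg_text[insert_at:]


def patch_file(path: Path, limit: int = 5) -> Path:
    """Back up the file next to itself, then write the patched configuration."""
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # step 2: keep a backup copy
    path.write_text(add_growth_limit(path.read_text(), limit))  # step 3
    return backup
```

To use it, call patch_file(Path(...)) with the location of your own vpxd.cfg from step 2. The text-based insert is deliberate: it leaves the rest of the file byte-for-byte as it was, where a full XML parse and rewrite might reorder or drop things.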

2 thoughts on “VMotion in 3.5 DRS enabled Cluster causes Guest CPU to rise Dramatically”

  1. This is an Update 2 problem where the VM Overhead Growth Limit was set to zero (0) instead of minus 1 (-1). It was set back to minus 1 (-1) in Update 3. With Update 2 and the fix, if you watch Syslog carefully, you see the value get set to zero, and after a while, the VC setting comes along and it gets set to 5.

    I don’t believe that VMware KB. In everything else, minus 1 (-1) means as-needed, unlimited, or the maximum value, so why should this be different? 1MB per minute seems ridiculously constrained, and I don’t think the developers are trying to choke the VM.
