VMotion fails at 10 percent

A common problem I see is VMotion failing at the 10% mark with a timeout. You might see a dialog pop up that says:

Operation timed out
Tasks: A general system error occurred:
Failed waiting for data. Error 16. Invalid argument

The VMware Knowledge Base writers have assembled a great list of possible causes for this problem, and I keep a printed copy on my cubicle wall (i.e., I didn’t write the list below).

Each step provides instructions or a link to a document with further details on how to eliminate possible causes and take corrective action to resolve the timeout. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution.

  1. Verify that restarting the VMware Management Agents does not resolve the issue. For more information, see Restarting the Management agents on an ESX Server (1003490).
  2. Verify that VMkernel networking configuration is valid. For more information, see Unable to set VMkernel gateway as there are no VMkernel interfaces on the same network (1002662).
  3. Verify that VMkernel network connectivity exists using vmkping. For more information, see Testing vmkernel network connectivity with the vmkping command (1003728).
  4. Verify that Console OS network connectivity exists. For more information, see Testing network connectivity with the Ping command (1003486).
  5. Verify that Name Resolution is valid on ESX. For more information, see Identifying issues with and setting up name resolution on ESX Server (1003735).
  6. Verify that time is synchronized across the environment. For more information, see Verifying time synchronization across environment (1003736).
  7. Verify that valid limits are set for the VM being VMotioned. For more information, see VMware VMotion fails if target host does not meet reservation requirements (1003791).
  8. Verify that hostd is not spiking the console. For more information, see Checking for resource starvation of the ESX Server service console (1003496).
  9. Verify that VM is not configured to use a device that is not valid on the target host. For more information, see Troubleshooting migration compatibility error: Device is a connected device with a remote backing (1003780).
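
Step 5 (name resolution) is easy to script. Here is a minimal sketch in Python 3, not something from the KB article. It assumes you run it from a machine that uses the same DNS servers as your ESX hosts, so it will not catch a bad /etc/hosts entry on a host itself. The function name `check_name_resolution` and the host names in the usage comment are hypothetical.

```python
import socket


def check_name_resolution(hostnames,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Return a list of (hostname, problem) tuples.

    An empty list means every name resolved to an IP address and that
    address resolved back to the same short host name.
    """
    problems = []
    for name in hostnames:
        try:
            ip = forward(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append((name, "forward lookup failed: %s" % exc))
            continue
        try:
            # gethostbyaddr returns (hostname, aliases, addresses)
            reverse_name = reverse(ip)[0]
        except OSError as exc:  # socket.herror is a subclass of OSError
            problems.append((name, "reverse lookup of %s failed: %s" % (ip, exc)))
            continue
        # Compare only the short names, ignoring case and domain suffix.
        if reverse_name.split(".")[0].lower() != name.split(".")[0].lower():
            problems.append((name, "%s resolves back to %s" % (ip, reverse_name)))
    return problems


# Example usage (hypothetical host names, replace with your own):
#   for host, problem in check_name_resolution(["esx01.example.com",
#                                               "esx02.example.com"]):
#       print(host, "->", problem)
```

The lookup functions are passed in as parameters only so the check can be exercised without a live DNS server. In normal use you would leave the defaults alone.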

UPDATE (2/12/2010): There is now a great video on YouTube showing how to fix this problem.

Note: If VMotion continues to fail at 10 percent after trying the steps in this article, open a case with tech support.

10 thoughts on “VMotion fails at 10 percent”

  1. Hi,

    Our ESX servers (3.5U1) have some major VMotion issues. Can you send me valid links to the documents? The published links are broken.

    thanks,
    Irene

  2. Hi,

    this can occur also when the “receiving” ESX server has I/O errors.
    A rescan of the storage can help in this case.

    Regards,

    Bart

  3. Another thing to try would be reconfiguring (flipping) your Vmotion NIC settings between 1000/Full and Auto-Negotiate (or whatever fits your network’s speed). This resolved our ‘VMotion failing at 10%’ issue. I believe we have a port auto-sensing problem with our switches.

  4. I’ve just resolved an issue which was causing one of my VMs to fail a VMotion at 10% too. Other VMs were fine; it was just this one. Suggested additional entry:

    11) Check the VM’s vmware.log for error messages.

    Somehow the workingDir= entry in the .vmx file had lost its last character, and the log file reported “This virtual machine cannot be powered on because its working directory is not valid”. VC only reported a timeout error, so only the vmware.log file showed the actual error.

  5. We had the 10% issue on a customer’s cluster, too. Problem: another network device (HP ILO) had the same IP as the kernel interface of one of the cluster nodes. The VMs did not report that in any log. After changing the kernel IP to an unused address, the 10% issue was resolved immediately.

  6. One more possibility that we recently ran into. All the guests on a host would VMotion except one. It would get to 10% and then fail. Tried all the suggestions above but it still wouldn’t budge. Finally found the issue: for some reason HA had gone berserk a few weeks earlier and caused the host to create nearly 5000 log files in the VM’s folder on the SAN. After deleting all but the current log file, the guest VMotioned without any issues.

  7. Hi,

    Just in case it could be helpful to anyone else: the VMotion will time out when you modify the datastores of your cluster (for example, removing one VMFS and replacing it with three new ones).

    You’ll then need to stop the VM and use the “migrate” action. You’ll be prompted to tell ESX where to store the disk (the “keep in the same location” option is greyed out, but the advanced settings give you back the same choice).

    Cheers.

  8. Awesome! I am having storage vMotion fail at 10% with a timeout. I have suspected some of these as the cause, it is very nice to have this list as a checklist. This should keep me busy for a day or two.
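
Two of the suggestions in the comments, reading the VM’s vmware.log (comment 4) and looking for a runaway pile of log files in the VM’s folder (comment 6), can be roughed out in a few lines of Python 3. This is only a sketch under my own assumptions: the keyword list and the threshold of 100 files are arbitrary guesses rather than anything VMware documents, and both function names are made up for illustration. It only reports; it never deletes anything.

```python
from pathlib import Path


def find_suspect_lines(log_lines, keywords=("not valid", "error", "failed")):
    """Return (line_number, text) pairs for log lines containing any keyword.

    The default keywords are a guess; "not valid" is there because of the
    "working directory is not valid" message quoted in comment 4.
    """
    hits = []
    for number, line in enumerate(log_lines, start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


def list_vm_logs(vm_folder):
    """Return the vmware*.log files in a VM's folder, sorted by name."""
    return sorted(p for p in Path(vm_folder).glob("vmware*.log") if p.is_file())


def too_many_logs(vm_folder, threshold=100):
    """True if the folder holds more log files than the (arbitrary) threshold."""
    return len(list_vm_logs(vm_folder)) > threshold


# Example usage (the folder path is hypothetical):
#   folder = "/vmfs/volumes/datastore1/myvm"
#   with open(folder + "/vmware.log", errors="replace") as handle:
#       for number, text in find_suspect_lines(handle):
#           print(number, text)
#   if too_many_logs(folder):
#       print(len(list_vm_logs(folder)), "log files in", folder)
```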
