cdrom_pc_intr: The drive appears confused

Had an interesting case today where the customer reported the following error message in /var/log/messages every ten seconds on an ESX server:

Dec  5 09:42:36 esx3 kernel: hda: lost interrupt
Dec  5 09:42:36 esx3 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x 1)
Dec  5 09:42:46 esx3 kernel: hda: lost interrupt
Dec  5 09:42:46 esx3 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x 1)
Dec  5 09:42:56 esx3 kernel: hda: lost interrupt
Dec  5 09:42:56 esx3 kernel: hda: cdrom_pc_intr: The drive appears confused (ireason = 0x 1)
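To get a feel for how often the message is being logged, a quick count over the log file can help. This is only a minimal sketch: count_lost_interrupts is a made-up helper name (not an ESX tool), and it assumes the syslog line format shown above.

```shell
#!/bin/sh
# count_lost_interrupts is a hypothetical helper, not part of ESX.
# It counts "lost interrupt" kernel lines for one device in a syslog-style file.
count_lost_interrupts() {
    logfile=$1   # e.g. /var/log/messages
    device=$2    # e.g. hda
    # grep -c prints the number of matching lines (0 if there are none)
    grep -c "kernel: ${device}: lost interrupt" "$logfile"
}
```

On the affected host this would be called as `count_lost_interrupts /var/log/messages hda`.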

The weird thing is, we checked all of the VMs and none of them were configured to use any CD-ROM devices!

I saw something like this once after a VMware Tools install. Currently, the system ejects the CD at the end of the Tools installation without checking the locks held by the guest operating system. After this occurs, the guest operating system might not recognize the state of the drive.

If the guest operating system doesn’t recognize the CD-ROM state (it shows old CD-ROM contents or returns I/O errors), you can resolve the issue by rebooting the guest operating system after the Tools installation. The trouble is, this VM had some ’issues’ from the get-go, so the customer had tried to delete it.

The clue came when he tried to delete the files from the VM.

rm: cannot remove : Device or resource busy

This sounded like remnants of that VM were still around. If you log in as root and run this:

# esxtop

This will display a list of running VMs (and other stuff). Now,

# ps -ef | grep -ir <name of VM>

where <name of VM> is your defunct VM. We got something like this:

root     12097     1  0 Dec04 ?        00:00:02 /usr/lib/vmware/bin/vmkload_app /usr/lib/vmware/bin/vmware-vmx -ssched.group=host/user -@ pipe=/tmp/vmhsdaemon-0/vmx7b99e7aa583e4b1b;vm=7b99e7aa583e4b1b /vmfs/volumes/45b5eb1a-808343db-ecab-00114335854b/dfsd/dfsd.vmx

root     28974 28770  0 13:12 pts/4    00:00:00 grep -ir dfsd
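If you would rather pull out just the PID than read it off the listing by eye, something like the following could work. It is a rough sketch: vmx_pids is a hypothetical helper name, and it assumes the standard `ps -ef` column layout, where the PID is the second field.

```shell
#!/bin/sh
# vmx_pids is a hypothetical helper: it reads `ps -ef` output on stdin and
# prints the PID (second column) of every line that mentions both the
# vmware-vmx binary and the given VM name (case-insensitive on the name).
# The search string is written as "vmware-" "vmx" (awk joins the two pieces)
# so that this awk process's own command line never contains the joined
# string and therefore cannot match itself in the listing.
vmx_pids() {
    awk -v name="$1" '
        index($0, "vmware-" "vmx") && index(tolower($0), tolower(name)) { print $2 }
    '
}
```

Usage would be `ps -ef | vmx_pids dfsd`. The grep line from the listing is skipped too, since it does not mention the vmx binary.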

What we needed to do was kill this VM. Do this by executing:

# kill -9 12097   <-- substitute the PID from your output.

***** These steps are not for the faint of heart or the Linux newbie. If in doubt, call tech support!

Sure enough, there was a PID for that VM that could not be killed, a sure sign that something had gone very wrong with it.
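To check whether a process really shrugged off the kill, you can probe it with signal 0, which delivers nothing and only reports whether the PID still exists. Here is a minimal sketch of that check. kill_and_verify is a hypothetical helper name, and the two-second waits are arbitrary.

```shell
#!/bin/sh
# kill_and_verify is a hypothetical helper, not an ESX tool.
# It asks the process to exit (SIGTERM), escalates to SIGKILL if it is
# still there, and then reports whether the PID still exists.
kill_and_verify() {
    pid=$1
    kill -TERM "$pid" 2>/dev/null
    sleep 2
    # kill -0 sends no signal; it succeeds only if the PID still exists
    if kill -0 "$pid" 2>/dev/null; then
        kill -KILL "$pid" 2>/dev/null
        sleep 2
    fi
    if kill -0 "$pid" 2>/dev/null; then
        echo "PID $pid is still present after SIGKILL"
        return 1
    fi
    echo "PID $pid is gone"
    return 0
}
```

A non-zero return here corresponds to the situation we hit: the PID was still listed even after kill -9.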

Rebooting the ESX host is the only recourse once you have a process that can’t be killed.

The drive appears confused…. indeed!