Watch Your Guest Maintenance When Using VM Monitoring in HA, or “Why does my VM keep rebooting?!”
- Matt Liebowitz
I’m taking the vSphere Design Workshop class this week, and during class one of my classmates mentioned a scenario he encountered that I hadn’t heard of. The situation involves using VM Monitoring as part of VMware HA and performing guest maintenance.
VM Monitoring is a feature of HA that will automatically restart virtual machines when the VMware Tools heartbeat is lost for a certain period of time. Normally the heartbeat isn’t lost unless something bad has happened to the guest, like a blue screen or other crash. In that case HA will restart the VM in an attempt to get it back online.
The scenario that was brought up in class involved this feature combined with guest maintenance. Suppose you need to reboot your VM and leave it powered on, but not booted into its guest OS, for an extended period of time. Booting from an ISO to use imaging software, booting from a GParted ISO to resize a partition, or even sitting in a VM’s BIOS are all valid examples of when you might do this.
As you might expect, when the VM is booted from the ISO there are no VMware Tools heartbeats and so HA detects this as a failure and restarts the VM. I confirmed this behavior in my lab with a test VM by booting with a Windows 2008 R2 ISO and letting it sit at the install screen. Sure enough after 30 seconds or so the VM rebooted and I saw the following event for this VM (the screenshot is a nice touch):
This virtual machine reset by HA. Reason: VMware Tools heartbeat failure. A screenshot is saved at /vmfs/volumes/493973c5-a2392745-74e6-001d0-97282e4/TestVM/TestVM-screenshot-.png
It seems like an obvious thing, but it might take some people by surprise. At best it’s just an annoying distraction; at worst the reboot interrupts some kind of system repair partway through and damages the guest. HA has configurable values for how long a VM can go without sending heartbeats before it is restarted, as well as for how many times HA will restart a VM that has no heartbeats. So this won’t affect everyone, but it could still be an issue if you’re not careful.
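To get a feel for how those two values interact, here is a rough Python sketch. It is a toy model, not HA's actual algorithm: the names `MonitoringPolicy`, `failure_interval_s`, `max_resets` and `simulate_resets` are made up for illustration, the 30-second default simply mirrors what I saw in my lab, and the limit of 3 resets is an assumption. It also ignores anything else HA may take into account before resetting a VM.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MonitoringPolicy:
    # Hypothetical names; the defaults are illustrative assumptions.
    failure_interval_s: int = 30  # seconds without a heartbeat before a reset
    max_resets: int = 3           # resets allowed before the model gives up


def simulate_resets(seconds_without_tools: int, policy: MonitoringPolicy) -> List[int]:
    """Return the times (in seconds) at which this toy model resets the VM.

    Assumes the VM never sends a heartbeat during the maintenance window
    and that every reset restarts the heartbeat timer from zero.
    """
    resets: List[int] = []
    elapsed = 0
    while len(resets) < policy.max_resets:
        elapsed += policy.failure_interval_s
        if elapsed > seconds_without_tools:
            break  # maintenance finished before the next timeout
        resets.append(elapsed)
    return resets


# 100 seconds sitting at an installer screen with the assumed defaults
print(simulate_resets(100, MonitoringPolicy()))
```

In this model a longer failure interval buys you more time before the first reset, and a lower reset limit caps how often your maintenance gets interrupted, but neither one prevents the first reset if the work takes longer than the interval.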
The moral of the story: if you’re using VM Monitoring with HA, remember to temporarily disable it before doing guest maintenance that keeps the VM running outside its guest OS, and so without VMware Tools heartbeats, for an extended period of time.
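If you script your maintenance, the part that is easy to forget is turning monitoring back on afterwards. Here is a minimal Python sketch of that pattern. The `settings` dictionary and its `vm_monitoring` key are stand-ins I made up for illustration; they are not a real vSphere API, and in practice you would replace them with whatever tool you use to change the HA setting.

```python
from contextlib import contextmanager


@contextmanager
def monitoring_paused(settings: dict):
    """Turn off the stand-in 'vm_monitoring' flag, then restore it on exit."""
    previous = settings.get("vm_monitoring", True)
    settings["vm_monitoring"] = False
    try:
        yield settings
    finally:
        # Runs even if the maintenance step raises an exception.
        settings["vm_monitoring"] = previous


settings = {"vm_monitoring": True}  # hypothetical per-VM setting
with monitoring_paused(settings):
    # Boot from the ISO, resize the partition, and so on.
    print("monitoring enabled during maintenance:", settings["vm_monitoring"])
print("monitoring enabled afterwards:", settings["vm_monitoring"])
```

The `try`/`finally` is the point here: the original setting comes back even if the maintenance step fails halfway through.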
5 thoughts on “Watch Your Guest Maintenance When Using VM Monitoring in HA, or “Why does my VM keep rebooting?!””
Thanks for this insight. I like the screenshot feature, but not sure we would want to use HA monitoring since we have other monitors in place such as SolarWinds Orion…
Hey Paul, thanks for the comment. I agree and haven’t seen anyone utilize this feature much in production. I think the logic in using the feature is that by the time you receive the alert from SolarWinds the VM would more than likely already be restarted by HA. Sounds great, but haven’t seen many folks actually use it.
Like the blog, love the title! Another option would be to change the host monitoring to Leave powered on for your VMs. Obviously, that carries its own caveats but it’s given me something to think about. Thanks for posting.
Thanks for the comment and recognizing what the name of the blog actually means!
I’m not sure the setting “leave powered on” would change anything. What you’re talking about is the host isolation response, which is how HA handles things when the host becomes isolated from other hosts. This article talks about HA at the VM level so that setting likely wouldn’t have any effect.
I haven’t seen many folks use this feature but as it matures I’m sure it will become more popular. Do you see folks using this feature that much?
Thanks again for reading and the comment!