This Saturday, Microsoft published a new white paper entitled “Best Practices for Virtualizing Exchange Server 2010 with Windows Server 2008 R2 Hyper-V” that provides a lot of great info, much of which is applicable to VMware as well. One of the most important things in the entire document is a change in policy: Microsoft now supports virtualized Exchange 2010 with Database Availability Groups (DAGs) in combination with hypervisor high availability and live migration. Previously Microsoft did not support that combination, even on its own Hyper-V platform. In the VMware world this of course means HA and vMotion. The white paper states the following:
“Exchange server virtual machines, including Exchange Mailbox virtual machines that are part of a Database Availability Group (DAG), can be combined with host-based failover clustering and migration technology as long as the virtual machines are configured such that they will not save and restore state on disk when moved or taken offline. All failover activity must result in a cold start when the virtual machine is activated on the target node. All planned migration must either result in shut down and a cold start or an online migration that utilizes a technology such as Hyper-V live migration.”
This new document, as well as a post on the MS Exchange Team blog, confirms the new support stance. The TechNet page has been updated as well. Note that you must be running Exchange 2010 SP1 for these configurations to be supported.
Folks may know that I’m a big proponent of virtualizing mission-critical, tier-1 applications like Exchange 2010. I’ve written about it here, touched on it here, and commented to TechTarget on the subject here and most recently here. It’s clearly an important subject to me, and I applaud Microsoft for introducing this change. I think it will help convince organizations that it is safe to virtualize Exchange 2010 on all hypervisors.
But... there must always be a but.
Remember that Microsoft officially supporting something doesn’t actually change anything in terms of functionality. Did VMware HA and vMotion work properly in combination with Exchange 2010 and DAGs before this policy change? VMware HA – sure, it just wasn’t officially supported. VMware vMotion – umm, hang on a minute there.
The DAG ultimately relies on Windows Failover Clustering (WFC) to work, and WFC is notoriously finicky about even brief drops in network connectivity and loss of heartbeat. When performing a live migration using vMotion there is usually at least one ping dropped, and in my experience that single drop is often enough to cause databases to fail over to other nodes in the DAG.
Does this mean you still can’t use vMotion even though Microsoft now supports it? Of course not, but it does require a slight change in your design: increase the cluster heartbeat timeout value to allow for the brief network interruption.
The values that need to change are the following:
SameSubnetDelay: The interval (in milliseconds) between cluster heartbeats. By default, this value is 1,000 milliseconds.
SameSubnetThreshold: The number of missed heartbeats that will be tolerated before a failover event occurs. By default this value is 5, so combined with the above value, 5 seconds of lost heartbeats will result in a cluster failover.
Five seconds seems like enough time for a vMotion to complete, but in practice I’ve seen databases fail over at multiple clients when using the default heartbeat values. Luckily you can change these values very easily by using PowerShell. The Microsoft white paper shows how to raise the timeout to 10 seconds (the Microsoft-recommended maximum) from the default of 5.
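As a rough sketch of what that change can look like (my own reconstruction, not a verbatim copy of the white paper's commands), the snippet below keeps SameSubnetThreshold at its default of 5 and doubles SameSubnetDelay to 2,000 milliseconds, which works out to a 10 second timeout. It assumes you run it on a DAG member with the FailoverClusters PowerShell module available; the 2,000/5 split is just one combination that reaches 10 seconds, so treat the numbers as an assumption to validate in your own environment.

```powershell
# Load the failover clustering cmdlets (installed with the Failover Clustering feature).
Import-Module FailoverClusters

# Review the current heartbeat settings before changing anything.
# Get-Cluster with no arguments targets the cluster the local node belongs to.
Get-Cluster | Format-List Name, SameSubnetDelay, SameSubnetThreshold

# Assumption: leave the threshold at 5 missed heartbeats and raise the
# heartbeat interval from 1,000 ms to 2,000 ms (2,000 ms x 5 = 10 seconds).
(Get-Cluster).SameSubnetThreshold = 5
(Get-Cluster).SameSubnetDelay = 2000

# Read the values back and report the resulting timeout in seconds.
$cluster = Get-Cluster
$timeoutSeconds = ($cluster.SameSubnetDelay * $cluster.SameSubnetThreshold) / 1000
"Heartbeat timeout is now $timeoutSeconds seconds"
```

These are cluster-wide properties, so setting them once from any node applies to the whole DAG. Reading the values back afterwards is a cheap way to confirm the change took effect before you test a vMotion.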
Depending on your environment you may not need to make this change, so always test before implementing any cluster-wide change like this. Make sure you have enough bandwidth on your hosts to account for migrating an Exchange VM that may have 32GB of RAM or more. And of course always stick with configurations that are supported by Microsoft.
I’m happy that Microsoft made this change, and hope that it signals a trend towards more virtualization-friendly licensing in the future.