Exploring the performance benefits of VAAI
- Matt Liebowitz
Over the long Thanksgiving weekend I decided to do some testing of one of the coolest new features in vSphere 4.1 – vStorage APIs for Array Integration (VAAI). My original thought was to see if the performance benefits of VAAI would justify using the eagerzeroedthick VMDK format more heavily, thanks to the faster deployment times. I’ll get to the results of that testing in a second, but first some background.
VAAI is a technology that allows the ESX/ESXi host to offload certain storage functions directly to the storage array rather than processing the data itself. A typical operation such as deploying a VM from a template requires the ESX/ESXi host to read the data from the template over whatever storage protocol is in use (Fibre Channel, iSCSI, etc.) and then write that data back to the storage when cloning the VM. That isn’t the most efficient use of resources, and the problem is compounded when cloning multiple VMs at once, as those read/write operations become redundant.
By leveraging VAAI, those operations are offloaded to the storage array, eliminating many of those redundant reads and writes. As a result, these operations complete much faster and with less CPU overhead on the host. To use VAAI you’ll need both a vSphere Enterprise license and a storage array that supports it. The number of supported arrays is small today, but it will almost certainly grow.
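To make the offload idea concrete, here is a toy model (my own illustration, not VMware’s implementation) of how much data crosses the storage network for a clone done by the host versus one offloaded to the array via VAAI’s Full Copy primitive. All names and the command-overhead figure are illustrative assumptions:

```python
# Conceptual sketch only: host-driven copy vs. array-offloaded copy.

def host_copy(size_bytes):
    """Host reads every block from the array, then writes it back."""
    read = size_bytes        # template blocks pulled over the fabric
    written = size_bytes     # the same blocks pushed back to the array
    return read + written    # total bytes crossing the storage network

def vaai_full_copy(size_bytes):
    """Host sends a copy command; the array moves the blocks internally."""
    command_overhead = 512   # rough, assumed size of the offload request
    return command_overhead  # almost nothing crosses the storage network

fifty_gb = 50 * 1024**3
print(host_copy(fifty_gb))       # ~107 billion bytes over the wire without VAAI
print(vaai_full_copy(fifty_gb))  # a few hundred bytes with VAAI
```

The point of the sketch: without offload the fabric carries the template twice (once out, once back in), which is exactly the redundancy VAAI removes.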
For my testing I used a Dell EqualLogic PS5000E running the 5.0.2 firmware, which fully supports VAAI. My plan was to see how much quicker deploying eagerzeroedthick VMDKs would be with VAAI than without it. Using eagerzeroedthick disks helps VM performance by zeroing out all of the blocks in advance rather than when they are first written. This format is required for VMware Fault Tolerance and is recommended for high-I/O servers such as Exchange and SQL Server.
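Here is a toy model (again my own illustration, not ESX internals) of why eager zeroing helps: a lazily zeroed thick disk pays a zero-on-first-write cost inside the guest I/O path, while an eagerly zeroed disk pays all of that cost up front at creation time:

```python
# Toy simulation of lazy vs. eager block zeroing on a thick disk.

class ThickDisk:
    def __init__(self, blocks, eager=False):
        # With eager zeroing, every block is zeroed at creation time.
        self.zeroed = set(range(blocks)) if eager else set()
        self.first_write_penalties = 0

    def write(self, block):
        if block not in self.zeroed:
            self.zeroed.add(block)           # zero the block before writing
            self.first_write_penalties += 1  # extra work in the guest I/O path

lazy = ThickDisk(blocks=1000)
eager = ThickDisk(blocks=1000, eager=True)
for b in range(1000):
    lazy.write(b)
    eager.write(b)

print(lazy.first_write_penalties)   # 1000 -- every first write pays the cost
print(eager.first_write_penalties)  # 0 -- all blocks were zeroed at creation
```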
To the results:
Time to create a 50GB eagerzeroedthick VMDK without VAAI: 10 minutes, generating approximately 750 write IOPS on the array
Time to create a 50GB eagerzeroedthick VMDK with VAAI: 1 minute 30 seconds; I could not measure IOPS (more on that later)
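A bit of quick arithmetic on those numbers (treating 50GB as 50 GiB, which is an assumption on my part) shows what those timings mean in throughput terms:

```python
# Effective zeroing rate for the 50GB eagerzeroedthick VMDK creation.

size_mib = 50 * 1024       # 50 GiB expressed in MiB (assumed unit)
without_vaai_s = 10 * 60   # 10 minutes
with_vaai_s = 90           # 1 minute 30 seconds

print(round(size_mib / without_vaai_s, 1))     # ~85.3 MiB/s written via the host
print(round(size_mib / with_vaai_s, 1))        # ~568.9 MiB/s effective with offload
print(round(without_vaai_s / with_vaai_s, 1))  # ~6.7x faster

# Sanity check on the ~750 write IOPS observed without VAAI:
# 85.3 MiB/s across 750 writes/s works out to large sequential writes.
print(round(size_mib * 1024 / without_vaai_s / 750))  # ~117 KiB average write
```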
Clearly there is a significant difference in creating the blank eagerzeroedthick VMDK. How about when Windows Server 2008 R2 is installed on that VMDK and it is then converted to a template? How fast can we deploy that template?
Deploying the 50GB eagerzeroedthick template without VAAI: 19 minutes, generating between 1,200 and 1,600 IOPS (half reads, half writes, which makes sense since the host has to read from and write to the same array)
Deploying the 50GB eagerzeroedthick template with VAAI: 6 minutes (again, I couldn’t measure IOPS)
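The same arithmetic applies to the template deploy (again assuming 50GB means 50 GiB). Note that without VAAI the host both reads and writes the full image, so the array services roughly double the data, which lines up with the mixed read/write IOPS above:

```python
# Effective copy rates for the 50GB template deployment.

size_mib = 50 * 1024     # 50 GiB in MiB (assumed unit)
without_vaai_s = 19 * 60
with_vaai_s = 6 * 60

print(round(without_vaai_s / with_vaai_s, 2))   # ~3.17x faster with VAAI
print(round(size_mib / without_vaai_s, 1))      # ~44.9 MiB/s copied via the host
print(round(2 * size_mib / without_vaai_s, 1))  # ~89.8 MiB/s total array traffic
print(round(size_mib / with_vaai_s, 1))         # ~142.2 MiB/s effective offloaded copy
```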
Again we see a huge difference in the time to deploy this VM from template. Six minutes is longer than it takes to deploy a VM that isn’t using eagerzeroedthick, but in my mind it’s worth it for the performance benefits of the eagerzeroedthick format. Using this format does prevent you from taking advantage of thin provisioning in vSphere, which may be a trade-off for some, but most storage arrays also offer thin provisioning themselves, which can help offset that.
One thing I noticed during this testing is that when creating or deploying a VM using VAAI, I was unable to monitor the IOPS from the array. Dell’s monitoring software for EqualLogic, SAN HeadQuarters, shows no measurable IOPS information while VAAI is in use. Whether this is a bug in the software or actually representative of the IOPS required during deployment (which I doubt), it makes it tougher to monitor actual storage utilization.
Is six minutes too long to wait to deploy a virtual machine using a disk format that offers great performance? Not for me. I’m expecting to see more regular use of the eagerzeroedthick format now that VAAI is here.