Exploring the performance benefits of VAAI
- Matt Liebowitz
Over the long Thanksgiving weekend I decided to do some testing of one of the coolest new features in vSphere 4.1 – vStorage APIs for Array Integration (VAAI). My original thought was to see whether the faster deployment times that VAAI provides would justify heavier use of the eagerzeroedthick VMDK format. I’ll get to the results of that testing in a second, but first some background.
VAAI is a technology that allows the ESX/ESXi host to offload certain storage functions directly to the storage array rather than processing the data itself. A typical operation such as deploying a VM from template requires the ESX/ESXi host to read the data from the template via whatever storage protocol is in use (Fibre Channel, iSCSI, etc.) and then write that data back to the storage when cloning the VM. That isn’t the most efficient use of resources, and the problem is compounded when cloning multiple VMs at once, because the same template data is read and rewritten through the host for every clone.
By leveraging VAAI, those operations are offloaded to the storage array, which eliminates many of those redundant reads and writes. As a result these operations complete much faster and with less CPU overhead on the host. In order to use VAAI you’ll need both vSphere Enterprise and a storage array that supports it. Although the number of supported arrays is small, that number will almost certainly grow.
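To make the redundancy concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch under simplifying assumptions: a host-based clone moves every block across the storage link twice (one read, one write), and an offloaded clone sends only small commands, which are treated as zero data. The function name `fabric_traffic_gb` is made up for illustration and is not part of any VMware tool.

```python
def fabric_traffic_gb(template_gb, clones, offloaded):
    """Approximate data volume crossing the host-to-array link for a batch of clones.

    Assumption: a host-based copy reads every block and writes it back,
    while an offloaded copy sends only commands (counted as 0 GB here).
    """
    if offloaded:
        return 0.0
    # One full read plus one full write of the template per clone.
    return 2.0 * template_gb * clones


for clones in (1, 4):
    host_based = fabric_traffic_gb(50, clones, offloaded=False)
    with_offload = fabric_traffic_gb(50, clones, offloaded=True)
    print(f"{clones} clone(s): host-based {host_based} GB, offloaded {with_offload} GB")
```

In this simplified model, traffic through the host grows linearly with the number of clones, which is why cloning several VMs at once is where offloading should help the most.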
For my testing I used a Dell EqualLogic PS5000E running the 5.0.2 firmware which fully supports VAAI. My original thought was to see how much quicker deploying eagerzeroedthick VMDKs was with VAAI compared to without VAAI. Using eagerzeroedthick disks helps with performance of the VM by zeroing out all of the blocks in advance instead of when they are first accessed. This format is required for VMware Fault Tolerance and is recommended for high I/O servers such as Exchange and SQL.
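The difference between zeroing in advance and zeroing on first access can be illustrated with a toy model. This is a sketch only: the `ToyDisk` class is hypothetical, it counts one write per block, and it ignores the real block sizes and metadata handling in VMFS.

```python
class ToyDisk:
    """Toy virtual disk that counts writes sent to the array, including zeroing."""

    def __init__(self, blocks, eager):
        # An eager disk has every block zeroed before the VM ever runs.
        self.zeroed = [eager] * blocks
        self.create_writes = blocks if eager else 0
        self.runtime_writes = 0

    def write(self, block):
        if not self.zeroed[block]:
            # Lazy format: zero the block on first touch, then write the data.
            self.zeroed[block] = True
            self.runtime_writes += 1
        self.runtime_writes += 1


lazy = ToyDisk(blocks=8, eager=False)
eager = ToyDisk(blocks=8, eager=True)
for disk in (lazy, eager):
    for block in (0, 1, 2, 0):  # three first-time writes, one rewrite
        disk.write(block)

print("lazy: ", lazy.create_writes, "creation writes,", lazy.runtime_writes, "runtime writes")
print("eager:", eager.create_writes, "creation writes,", eager.runtime_writes, "runtime writes")
```

The eager disk pays its full zeroing cost at creation time and nothing extra at runtime, while the lazy disk is cheap to create but pays an extra write the first time each block is touched. That up-front creation cost is the part the tests here are measuring.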
To the results:
Time to create a 50GB eagerzeroedthick VMDK without VAAI: 10 minutes generating approximately 750 write IOPS on the array
Time to create a 50GB eagerzeroedthick VMDK with VAAI: 1 minute 30 seconds, could not measure IOPS (more on that later)
Clearly there is a significant difference in creating the blank eagerzeroedthick VMDK. How about when Windows 2008 R2 is installed on that VMDK and then converted to a template? How fast can we deploy that template?
Deploying 50GB eagerzeroedthick template without VAAI: 19 minutes generating between 1,200-1,600 IOPS (half read/write, which makes sense since it has to read from and write to the same array)
Deploying 50GB eagerzeroedthick template with VAAI: 6 minutes (again, couldn’t measure IOPS)
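For anyone who wants the numbers above as ratios, the arithmetic can be sketched in a few lines of Python. The helper names are made up for illustration, and the calculation assumes that "50GB" means 50 GiB and that the full disk size is written during creation. The average write size is an estimate derived from the reported figures, not something that was measured.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def speedup(seconds_without, seconds_with):
    """How many times faster the operation finished with VAAI."""
    return seconds_without / seconds_with


def throughput_mib_s(size_gib, seconds):
    """Effective throughput, assuming the full disk size was written."""
    return size_gib * GIB / seconds / MIB


def avg_write_kib(size_gib, iops, seconds):
    """Average size of each write implied by a steady IOPS rate."""
    return size_gib * GIB / (iops * seconds) / 1024


create_speedup = speedup(10 * 60, 90)       # blank VMDK: 10 min vs 1 min 30 s
deploy_speedup = speedup(19 * 60, 6 * 60)   # template deploy: 19 min vs 6 min

print(f"create speedup: {create_speedup:.2f}x")
print(f"deploy speedup: {deploy_speedup:.2f}x")
print(f"create throughput without VAAI: {throughput_mib_s(50, 600):.1f} MiB/s")
print(f"create throughput with VAAI:    {throughput_mib_s(50, 90):.1f} MiB/s")
print(f"implied write size without VAAI: {avg_write_kib(50, 750, 600):.1f} KiB")
```

Under those assumptions, creating the blank disk is roughly 6.7 times faster with VAAI and deploying from template is roughly 3.2 times faster.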
Again we see a huge difference in the time to deploy this VM from template. Six minutes is longer than it takes to deploy a VM that isn’t using eagerzeroedthick, but in my mind it’s worth it for the performance benefits of the eagerzeroedthick format. Using this format prevents you from taking advantage of thin provisioning in vSphere, which may be a trade-off for some, but most storage arrays also offer thin provisioning at the array level, which can help offset that.
One thing I noticed during this testing is that when either creating or deploying a VM using VAAI, I was unable to monitor the IOPS from the array. Dell’s monitoring software for EqualLogic, SAN HeadQuarters, shows no measurable IOPS information when VAAI is in use. Whether this is a bug in the software or actually representative of the IOPS required during deployment (which I doubt), it makes it tougher to monitor actual storage utilization.
Is six minutes too long to wait to deploy a virtual machine using a disk format that offers great performance? Not for me. I’m expecting to see more regular use of the eagerzeroedthick format now that VAAI is here.
14 thoughts on “Exploring the performance benefits of VAAI”
Great post!
Ran the same test on a single ISE without VAAI. Took 2 mins 5 seconds for a 50 GB eagerzeroedthick VMDK. 3400 IOPS (rounded average over the 125 seconds).
Rob – thanks for the comment.
I wanted to point out that this wasn’t meant to be a vendor comparison or talk about whether one particular storage vendor is better than the other. It was simply meant to show the benefit of using VAAI on the array that I have (EqualLogic) that supports VAAI. I was less interested in the actual numbers and was more interested in the overall improvement once VAAI was introduced.
Thanks for sharing your experience! Your post is very clear and to the point.
Cheers
Mallik
Using eagerzeroedthick disks has a negligible performance increase over thin disks: http://www.vmware.com/pdf/vsp_4_thinprov_perf.pdf
Not that there aren’t other considerations with thin disks (space monitoring) and not that VAAI isn’t cool, but I’m not sure that VAAI makes using eagerzeroedthick disks significantly more desirable.
That depends on your use case. For database applications, we can measure a 10x throughput improvement for insert/update activity using eagerzeroedthick vs. thin (and yes, we have VAAI enabled). Most folks would consider that significant. Reducing Oracle ETL times by 75% or more was a biggie for us…
Good stuff Matt, it definitely shows the value of VAAI for the Virtualization Community. Added your article to mine as a reference http://www.yellow-bricks.com/2010/11/23/vstorage-apis-for-array-integration-aka-vaai
@Rob: That is insane, however you cannot really compare it to the EQL. Still, great numbers.
Duncan – thanks for including this post on yours!
Jeff – I agree that we’re not talking about an enormous performance benefit with using eagerzeroedthick disks, and that there are other things to consider when using them. My thought was that when designing for performance, all of the little things that make a small difference can add up to real differences. Things like using eagerzeroedthick disks for write-intensive applications, ensuring large pages are in use, ensuring NUMA locality, etc., can all add up to making the virtual machine perform better.