This week VMware released a blog post and several knowledgebase articles about Transparent Page Sharing (TPS). In the next patch for ESXi, TPS will be disabled by default between virtual machines. TPS is the technology that allows virtual machines running on the same ESXi host to share identical memory pages, allowing ESXi to overcommit memory. This technology has been part of ESX/ESXi for years and is one of the (many) differentiating factors between it and other hypervisors.
VMware made this decision because of independent research showing that, in a highly controlled environment, the contents of those shared pages could be read. Though VMware admits it’s highly unlikely that this would be possible in a production environment, they chose to be security conscious and disable the feature by default. It can be re-enabled if your environment could still benefit (see the KB articles mentioned above for instructions).
There have been a number of blog posts on this topic already and I don’t want to add to the noise, but I’ve been working with ESX/ESXi since 2002, so I’ve seen how TPS has been used over the years and can hopefully add some useful perspective.
Will This Affect Me?
That’s likely the key question you’re asking yourself. One thing I’ve seen over the years is that many folks don’t really understand how TPS works with modern processors. On modern processors, ESXi backs all guest memory with large (2MB) pages to improve performance, even if the guest OS itself doesn’t support large pages. ESXi doesn’t attempt to share large pages, so in the vast majority of cases TPS isn’t actually being used. Only when an ESXi host comes under extreme memory pressure does it break large pages into small (4KB) pages and begin to share them.
I’ll state that again to be clear: unless you consistently overcommit memory on your ESXi hosts, TPS is not actually in use. So does this really affect you? Chances are it doesn’t, unless you rely heavily on overcommitting memory.
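To see why large pages defeat sharing, consider a toy model of content-based page sharing: pages are hashed, a hash match is confirmed with a full byte comparison, and only byte-identical pages can share one physical copy. This is a simplified sketch under those assumptions, not VMware’s implementation, and `make_guest_memory` and `count_shareable` are hypothetical helpers. Two 2MB memory images that differ by a single byte share nothing at 2MB granularity, yet 511 of their 512 4KB pages are identical.

```python
"""Toy model of content-based page sharing at two page sizes (illustrative only)."""
import hashlib

SMALL_PAGE = 4 * 1024          # 4KB small page
LARGE_PAGE = 2 * 1024 * 1024   # 2MB large page


def make_guest_memory(n_pages: int) -> bytes:
    """Build a memory image of n_pages 4KB pages, each with unique contents."""
    return b"".join(
        hashlib.sha256(str(i).encode()).digest() * (SMALL_PAGE // 32)
        for i in range(n_pages)
    )


def split_pages(memory: bytes, page_size: int) -> list[bytes]:
    """Cut a memory image into fixed-size pages."""
    return [memory[i:i + page_size] for i in range(0, len(memory), page_size)]


def count_shareable(vm_images: list[bytes], page_size: int) -> int:
    """Count pages that could be backed by an already-seen identical page."""
    seen: dict[bytes, list[bytes]] = {}  # content hash -> distinct pages with that hash
    shareable = 0
    for image in vm_images:
        for page in split_pages(image, page_size):
            candidates = seen.setdefault(hashlib.sha1(page).digest(), [])
            # A hash match is only a hint; confirm with a full comparison.
            if any(page == c for c in candidates):
                shareable += 1
            else:
                candidates.append(page)
    return shareable


if __name__ == "__main__":
    vm_a = make_guest_memory(512)   # exactly one 2MB large page
    changed = bytearray(vm_a)
    changed[-1] ^= 0xFF             # flip one byte in the last 4KB page
    vm_b = bytes(changed)
    print("2MB pages shareable:", count_shareable([vm_a, vm_b], LARGE_PAGE))
    print("4KB pages shareable:", count_shareable([vm_a, vm_b], SMALL_PAGE))
```

Breaking a large page into 512 small pages is what exposes those matches, which is why sharing only shows up once ESXi starts breaking large pages under memory pressure.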
What About Zero Pages?
There’s one area where TPS kicks in whether or not the ESXi host is overcommitted on memory: zero pages. When a VM powers on, most modern operating systems zero out all of the allocated memory (that is, write zeros to every memory page). ESXi shares those zero pages immediately after the VM powers on. It doesn’t matter whether memory is overcommitted, and it doesn’t wait for the normal 60-minute cycle in which TPS scans for identical pages.
As an example, see this screenshot from esxtop and pay close attention to the ZERO and SHRD columns. I had just powered on NJMGMT01, and the highlighted columns show that 3.2GB of its allocated 4.0GB is currently being shared (see the SHRD column). All of those shared pages are zero pages (see the ZERO column).
If that host were already overcommitted on memory, sharing the zero pages (which the guest OS then consumes gradually) could give ballooning time to kick in and reclaim memory from other guests. Without TPS in this scenario, ESXi would likely be forced to swap or compress memory before ballooning could reclaim unused memory from other guests.
Does this mean there’s a real risk to disabling TPS? Below I go through a number of scenarios where it may or may not make sense to re-enable TPS, but I don’t think zero pages on their own are one of them. If you already make use of memory overcommitment, you’ll want to re-enable it either way. If not, as long as you plan your environment properly, sharing zero pages shouldn’t matter, except possibly in the event of an ESXi host failure.
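A minimal sketch of the zero-page behavior, assuming a simplified host that maps every all-zero guest page to a single shared zero page the moment it is written and breaks sharing (copy-on-write) when the guest writes real data. `ToyHost` and its methods are hypothetical and do not correspond to any ESXi interface; the VM is scaled down to 1,000 pages so that using 20% of its memory leaves 80% shared, the same proportion as 3.2GB of 4.0GB in the screenshot.

```python
"""Toy model of zero-page sharing right after a VM powers on (illustrative only)."""

PAGE_SIZE = 4 * 1024
ZERO_PAGE = bytes(PAGE_SIZE)  # a 4KB page of all zeros


class ToyHost:
    """Hypothetical host that maps all-zero guest pages to one shared zero page."""

    def __init__(self) -> None:
        self.private: dict[tuple[str, int], bytes] = {}  # (vm, page) -> private copy
        self.zero_mapped: set[tuple[str, int]] = set()    # (vm, page) backed by ZERO_PAGE

    def guest_write(self, vm: str, page_no: int, data: bytes) -> None:
        key = (vm, page_no)
        if data == ZERO_PAGE:
            # Shared immediately: no periodic scan, no overcommit required.
            self.private.pop(key, None)
            self.zero_mapped.add(key)
        else:
            # Writing real data breaks sharing (copy-on-write) and needs a private page.
            self.zero_mapped.discard(key)
            self.private[key] = data

    def shared_pages(self, vm: str) -> int:
        return sum(1 for v, _ in self.zero_mapped if v == vm)

    def private_pages(self, vm: str) -> int:
        return sum(1 for v, _ in self.private if v == vm)


if __name__ == "__main__":
    host = ToyHost()
    total = 1000
    for page in range(total):            # guest OS zeroes its memory at boot
        host.guest_write("vm1", page, ZERO_PAGE)
    for page in range(total // 5):       # then starts using 20% of it
        host.guest_write("vm1", page, b"\x01" * PAGE_SIZE)
    print(f"shared: {host.shared_pages('vm1') / total:.0%}")
```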
Should I re-enable TPS?
The next question you might be asking yourself is: should I re-enable TPS once it’s disabled by default in the next update? As always it comes down to your individual requirements so there is no “one size fits all” answer. Let’s look at some scenarios to better illustrate the situation.
I’ve worked with a lot of customers over the years to help virtualize business critical applications, and one thing has remained consistent: for business critical applications (and most production workloads in general), customers do not overcommit memory. For the most critical workloads, customers will use memory reservations to ensure those workloads always have access to the memory they need.
These days, it’s not uncommon to see ESXi hosts with 256GB to 1TB of RAM so overcommitting memory is an unlikely scenario. I’d say in most cases it’s unnecessary to re-enable TPS for production workloads.
If there’s a scenario TPS seems to have been created for, it’s VDI. Consider a typical VDI environment using non-persistent desktops: there could be 50-100 identical virtual desktops running on each ESXi host. The potential memory savings in this scenario are huge, especially since high consolidation ratios help drive down the cost of VDI.
For VDI, I think it makes a lot of sense to re-enable TPS to take advantage of the huge memory savings potential.
In development environments, you’ll often want to increase consolidation ratios since performance is not typically the most important factor. Development environments also have the potential to have many similar workloads (web servers, database servers, etc.) as new versions of applications are tested.
For dev/test environments, I think it makes sense to re-enable TPS.
For home labs, I think this one is easy: most of us don’t have large budgets for our labs, so memory is tight. In my lab I frequently overcommit memory, so I will be re-enabling TPS after the update.
I only covered a small portion of the potential situations that could be out there. You’ll need to make a decision based on your own individual requirements. Think about what’s most important – maximum performance, consolidation ratios, security, etc.?
Remember – due to the way TPS works with large pages, in most scenarios you’re probably not even using it today. In the event that you need to overcommit memory – such as during an ESXi host failure or DR scenario – ESXi has other methods of memory reclamation such as ballooning and swapping. Swapping is generally the last resort and results in significant performance reduction, but you can offset that a bit by using an SSD and leveraging the swap to host cache feature of ESXi.
I admit I used a lot of words to basically say, “Don’t worry about it – this probably doesn’t affect you anyway” but I wanted to lay out the common scenarios I see at customers. I believe VMware made the right move here disabling TPS out of an abundance of caution even if the scenario to exploit this is unlikely to occur in production.
Are you re-enabling TPS? Feel free to leave a comment on why or why not.
As I type this, I see I haven't written a blog post in over 5 months. I wish I had a good excuse for that but I don't. The truth is it was really a number of different things:
1) Writing 3 books in one and a half years (from Nov. 2012 to April 2014) consumed a huge amount of time and made it hard to find time to blog. Once the books were done I was a bit burned out and needed some time off.
2) I took a new role at EMC at the beginning of 2014 which has taken a lot of my time. It's an exciting new role where I get to help EMC's Professional Services deliver the best virtualization services to our customers. That new role has left me very little time to blog.
3) I have young kids which happily take a lot of my free time.
Another aspect that has changed is that I've changed my virtualization focus as a result of this new role. Instead of focusing mostly on virtualizing business critical applications, my focus is now more on hybrid cloud and the technologies that accompany it. I've been getting up to speed on vCloud Automation Center, vCenter Orchestrator, and a little bit of Application Director. In this new role I cover all of virtualization, but I've been focusing on trying to learn these cloud/automation tools.
As a result I haven't had a lot to blog about. I may start blogging about my experience learning these applications. I suspect my "pivot" towards cloud automation will be somewhat common in our industry, so my journey may be relevant to others. I'll do my best, but I recognize I'll never be a prolific blogger who makes 6 or 7 posts per month.
To those that are still following this blog, thanks! I still love blogging and even writing this post has been fun. Hopefully I can get around to producing new content soon.
I’m very happy to announce that my new book, VMware vSphere Performance, written with co-authors Rynardt Spies and Christopher Kusek, has been released! It’s available right now in Kindle format, and the hard copy will be available on May 12th!
The book is focused on providing guidance for how to design vSphere to get the best performance out of your virtual machines. I wanted the book to be more than a paperback version of vSphere’s own performance best practices paper – we’ve also included tips, tricks, and stories of what we’ve seen in the field with real customers. We also include some tips for troubleshooting performance problems that you may encounter. We hope you’ll find it valuable across current and future versions of vSphere.
I first became involved with this book back in late 2011 as the book’s technical editor. There were numerous delays and issues that aren’t worth getting into, and the result was very little progress. I had been emailing back and forth with one of the authors of the book, Jonathon Fitch, trying to help where I could and give guidance whenever possible. It was during those email exchanges that he told me he was fighting cancer.
Fast forward to early 2013, and I’m in the middle of writing Virtualizing Microsoft Business Critical Applications on VMware vSphere. I had been emailing with Jonathon to see if he was making any progress when he told me that his disease had progressed to the point where he didn’t think he’d be able to finish the book. He tried to get into an experimental treatment program, but unfortunately he didn’t qualify, and his doctors told him he had just a few weeks left. The news hit me pretty hard even though I didn’t know him well.
I told Jonathon I would take over his chapters, making sure as much of his content remained in the book as possible, and that his name would remain on the cover. (Sybex originally agreed to that but later changed their minds, so his name is not printed on the cover. I'm not happy about it, but I understand their decision.) After not hearing from him for a little while and fearing the worst, I got the bad news that Jonathon had died on April 1, 2013. His wife told me that he was very proud of being a part of the book and that it was important to him.
I didn’t know Jonathon before becoming involved in this book and mostly communicated with him through email. He wasn’t active on Twitter or other social media, so he wasn’t well known. Even so, he was a member of our virtualization community, and it’s sad to lose one of our own. I tried to keep as much of his original content in the book as possible so that his hard work would survive. We dedicate this book to Jonathon, and I sincerely hope that his family is proud of what he was able to accomplish while fighting his disease.
It was a long road to finishing this book. I was still writing the VBCA book and was also involved in Mastering VMware vSphere 5.5, so taking on another book was not high on my list, but I had made a commitment to Jonathon, so I stuck with it. It wasn’t without its challenges, to say the least. I wanted to keep as much of his content in the book as possible, but his chapters were still largely based on what he’d written in 2011, some of it going as far back as vSphere 4.1. What do you do when you don’t fully understand what he meant in a sentence or paragraph and you can’t ask him to clarify? Or how do you keep the original author’s words intact when he was writing about a version of vSphere that’s 3 revisions older than the current one?
Much of the time I felt like Dante from Clerks when he would say, “I’m not even supposed to be here today!” If you’re familiar with Clerks then you’ll know what I mean. Or, see below.
Thankfully I was able to push through the issues and challenges and produce something I think Jonathon would be proud of. I did what I could to keep his words intact while updating the content to be relevant to the most recent version of vSphere. I wish we could have kept his name on the cover, but his hard work paid off and we have all dedicated the book to him. I really hope it brings his family some comfort.
Rather than end on a sad note, I’d like to share some pictures of my adorable kids holding a copy of the book.
We’ve all seen those ads on the Internet that promise to teach us “one weird trick” that “<someone> hates” to do something easily. Make money fast? Yep. Lose belly fat? Check. Grow “other” parts of your body? Easy.
I couldn’t resist stealing the headline for something that I’ve seen come up time and time again when virtualizing large applications or monster VMs.
Raise your virtual hand if you’ve heard, or said, something like this in the past:
All VMs should start out with only 1 vCPU.
Most VMs don’t need more than 1 vCPU.
Giving a VM more than 1 vCPU could hurt performance.
I hear this one all the time. I even used to say something like it myself. When I think back to my early days with virtualization I used to say things like, “If a server needs more than 3.6GB of RAM, it probably isn’t a good candidate for virtualization.” (3.6GB was the max RAM supported in a guest OS in ESX 2.x) But is that the right way to approach sizing all virtual machines?
These days, most of the time I hear something like this as a result of a software vendor’s sizing guidance for their application.
“Why should I give that VM 4 vCPUs and 16GB of RAM? There is no way it really needs it!”
The thing to remember here is this: while starting small when sizing VMs is a good practice, the application’s actual requirements should dictate the sizing. That’s the “one weird trick”: understand the application’s requirements before rushing to judge the sizing. Just because you don’t like the idea of giving a VM a lot of resources doesn’t mean it doesn’t actually need them. Trust me, I’m a big believer in starting small and sizing VMs with fewer resources when the resource requirements are not well known or defined. But that doesn’t apply to all VMs.
I hear it too often – folks who automatically rush to judge the sizing of a virtual machine just because they think it’s “too big.” We need to get past the mentality of “anything larger than 1 vCPU is bad.”
Before you head to the comments to say something like, “Vendor-supplied requirements are BS. Vendor x told me that their application needed y requirements and it doesn’t use even half of that,” hear me out. Is that true sometimes? Definitely. But should that be your default answer? Definitely not. If the application you’re trying to virtualize is currently physical, a good way to check whether the sizing is accurate is to measure the existing load. But don’t forget: current utilization alone is not enough to size the VM. You need the application’s actual requirements. I’ll illustrate that point with a story.
I worked with a customer last year to help them virtualize an important Microsoft SharePoint FAST Search implementation. We ran capacity assessments on the existing physical servers with both VMware Capacity Planner and Microsoft Assessment and Planning Toolkit. Both tools independently came to the same conclusion: based on the current utilization, these servers likely need 2 vCPUs and 8GB of RAM.
When I worked with the FAST architects, they wanted their servers sized with 8 vCPUs and 24GB of RAM. Naturally there was confusion as to why the current physical servers were only using a quarter of those resources. It came down to one thing: requirements.
The customer had a specific requirement to support a certain number of QPS (queries per second) at a certain level of latency. To meet those requirements we needed to size the servers with 8 vCPUs and 24GB of RAM. We validated that design by running FAST performance tests at different CPU/memory sizes on the FAST VMs. Every time, we were unable to meet the QPS requirements unless the VMs were sized with 8 vCPUs and 24GB of RAM.
What’s the moral of this story? Don’t be afraid of giving a virtual machine the resources it actually requires. Remember – there are tools like vCenter Operations that can tell you, over time, how a VM is using its resources. This provides you the opportunity to right-size the workload if the VM truly doesn’t need the resources.
Understand the vendor stated requirements as well as those of your customer/application owner. If you get a request for a SQL Server (or similar) and your immediate response is, “I’m not giving it that many vCPUs…” stop and make sure you truly understand the application’s requirements. Validate the sizing with performance testing and adjust if necessary. You’ll have happier application owners and better performing applications.
It’s been too long since I’ve blogged anything on a regular basis. Sorry about that. I would love to blame it on the three books I’ve written in the last year but that isn’t the only reason. Hopefully I can turn it around and start writing on a regular basis again.
I’ve been sitting on this one for a while but finally had some time to get it down. When it comes to virtualizing business critical applications, the perception that virtualized applications perform poorly becomes reality whether it is true or not. It might be because organizations tried to virtualize performance-intensive applications years ago, when the ESX/ESXi platform was less mature, and that memory lingers. It’s up to us to help educate those organizations that VMware has improved vSphere considerably and that it is now capable of scaling to meet the demands of just about any application.
Fighting perception often means fighting nonsensical things that vendors say. Before I get into this I wanted to say that I consider Microsoft to be very virtualization friendly. They provide excellent guidance for virtualizing many of their products. Unfortunately not all of their guidance is so great.
Take a look at the following guidance on virtualizing FAST Search Server (component of SharePoint 2010) – bold emphasis is mine:
FAST Search Server 2010 for SharePoint supports virtualization, but for larger deployments, we recommend that you only use virtualization for your test and development environments- not the production environment. Our rationale is as follows:
FAST Search Server 2010 for SharePoint is a heavy user of CPU and IO
Virtual machines only have access to a limited number of CPU cores (Hyper-V = 4, VM ware = 8)
Virtual machines will give 30-40% decrease in overall performance
Yes, it’s bad enough that I’m actually ignoring that they spelled VMware as “VM ware” and focusing instead on the claim that virtual machines will give a 30-40% decrease in overall performance. I’m not sure where they’re getting this data (perhaps from older versions of vSphere or Hyper-V), but it isn’t true today. Having been personally involved in the performance tests of production FAST Search Server deployments, I can say definitively that this is not the case as long as the VMs are sized correctly (like any other app).
My advice to you: If you see a vendor make a claim like this, ask for the data to back it up. Perform your own testing and validate whether it’s true. Make them prove it.
If we all fight this perception maybe vendors will notice and finally stop saying things like this. We all benefit when that happens.
I have to admit - Exchange virtualization is one of my favorite topics. I love talking about it with customers, colleagues, or anyone who will listen. It has a very well known workload profile which makes sizing consistent, it’s virtualization friendly in that it supports vSphere features like vMotion and HA, and I like talking about the challenges and creative solutions that are sometimes required to virtualize it. Yep, Exchange virtualization is my pal.
I’m also very familiar with its support limitations, as I’ve written about them in the past (see here). One of the support policies has to do with CPU over-subscription, that is, assigning more vCPUs than there are physical CPUs (pCPUs) in the host. From Microsoft’s support policy page for Exchange 2013 virtualization (though this applies to Exchange 2010 also):
Many hardware virtualization products allow you to specify the number of virtual processors that should be allocated to each guest virtual machine. The virtual processors located in the guest virtual machine share a fixed number of logical processors in the physical system. Exchange supports a virtual processor-to-logical processor ratio no greater than 2:1, although we recommend a ratio of 1:1. For example, a dual processor system using quad core processors contains a total of 8 logical processors in the host system. On a system with this configuration, don't allocate more than a total of 16 virtual processors to all guest virtual machines combined.
Note the section I bolded above: it says that you cannot go beyond a 2:1 vCPU-to-pCPU ratio, and that 1:1 is recommended. Not only are we used to much larger consolidation ratios, but once you factor in DRS automatically moving VMs to balance out utilization, the ratio becomes almost impossible to enforce.
Combine this with the disparity between the amount of RAM in a host and its number of CPUs. We have customers that have deployed ESXi hosts with 16-24 physical CPUs but 1TB of RAM. If you want to virtualize Exchange on those servers, you’re left with only two possible outcomes:
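To make the arithmetic in Microsoft’s example concrete, here is a minimal sketch that checks a host’s total vCPU allocation against the 2:1 supported and 1:1 recommended ratios. The `Host` class, `max_vcpus`, and `check_host` are hypothetical helpers, and counting one logical processor per core (no hyperthreading) is an assumption that mirrors the quoted dual quad-core example. Note that the check sums every VM on the host, not just the Exchange VMs, and it is a static snapshot: it knows nothing about DRS moving VMs around, which is exactly why the policy is hard to enforce.

```python
"""Check a host's vCPU allocation against Exchange's vCPU:logical processor guidance."""
from dataclasses import dataclass

SUPPORTED_MAX_RATIO = 2.0  # "no greater than 2:1"
RECOMMENDED_RATIO = 1.0    # "we recommend a ratio of 1:1"


@dataclass
class Host:
    sockets: int
    cores_per_socket: int
    threads_per_core: int = 1  # assumption: no hyperthreading, as in Microsoft's example

    @property
    def logical_processors(self) -> int:
        return self.sockets * self.cores_per_socket * self.threads_per_core


def max_vcpus(host: Host, ratio: float = SUPPORTED_MAX_RATIO) -> int:
    """Largest total vCPU count across all VMs that stays within the ratio."""
    return int(host.logical_processors * ratio)


def check_host(host: Host, vm_vcpus: list[int]) -> str:
    """Classify the combined vCPU allocation of every VM on the host."""
    ratio = sum(vm_vcpus) / host.logical_processors
    if ratio <= RECOMMENDED_RATIO:
        return "recommended"
    if ratio <= SUPPORTED_MAX_RATIO:
        return "supported"
    return "unsupported"


if __name__ == "__main__":
    host = Host(sockets=2, cores_per_socket=4)   # the dual quad-core example
    print(host.logical_processors, max_vcpus(host))
    print(check_host(host, [4, 4, 4, 4]))
```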
1) You ignore Microsoft’s support restriction and exceed the 2:1 vCPU/pCPU ratio.
2) You don’t use anywhere near the amount of RAM you have in the host.
The good news is there’s a third option that allows you to work around this problem: CPU reservations. CPU reservations in vSphere allow you to guarantee that important virtual machines will have access to the CPU resources they need even if CPU resources are constrained. This works great for virtualizing Exchange for a few reasons:
1) You can effectively exceed the 2:1 vCPU:pCPU ratio for workloads other than Exchange while guaranteeing Exchange VMs access to those CPU resources. Doing so does not violate Microsoft’s support policies – see this great TechEd presentation on virtualizing Exchange where Microsoft’s Jeff Mealiffe discusses this as an option.
2) Hosts that have a disproportionate amount of RAM to CPUs can still be utilized without wasting resources.
The other nice thing about CPU reservations is that when resources are not over-subscribed, CPU resources can be shared evenly with VMs that do not have a CPU reservation. In this way they work differently than memory reservations, which do not allow reserved memory resources to be shared with other VMs even if memory is plentiful. With CPU reservations, when resources are available then all VMs get equal access to the CPU. Once CPU resources become over-subscribed, the VMs with a reservation get guaranteed access to their reserved CPU resources while VMs without CPU reservations do not.
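The behavior described above can be illustrated with a simplified allocator. This sketch is a toy under stated assumptions, not ESXi’s CPU scheduler: it honors each VM’s reservation first (but never beyond what the VM actually demands), then splits any leftover capacity evenly among VMs that still want CPU. Real vSphere also weighs shares and limits, which are ignored here, and the `VM` and `allocate` names and all MHz figures are hypothetical.

```python
"""Simplified model of CPU reservations under contention (not ESXi's scheduler)."""
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    demand_mhz: float
    reservation_mhz: float = 0.0


def allocate(capacity_mhz: float, vms: list[VM]) -> dict[str, float]:
    """Grant reservations first, then split leftover CPU evenly among VMs still wanting more.

    Assumes admission control already ensured total reservations <= capacity.
    """
    # Step 1: a reservation is only a floor; a VM never gets more than it demands.
    alloc = {vm.name: min(vm.demand_mhz, vm.reservation_mhz) for vm in vms}
    remaining = capacity_mhz - sum(alloc.values())
    hungry = [vm for vm in vms if vm.demand_mhz > alloc[vm.name]]

    # Step 2: equal slices of what's left, redistributing any unused slice.
    while remaining > 1e-9 and hungry:
        slice_mhz = remaining / len(hungry)
        still_hungry = []
        for vm in hungry:
            grant = min(vm.demand_mhz - alloc[vm.name], slice_mhz)
            alloc[vm.name] += grant
            remaining -= grant
            if vm.demand_mhz - alloc[vm.name] > 1e-9:
                still_hungry.append(vm)
        hungry = still_hungry
    return alloc


if __name__ == "__main__":
    vms = [VM("exchange", 8000, reservation_mhz=6000), VM("web1", 4000), VM("web2", 4000)]
    print(allocate(20000, vms))  # uncontended: everyone gets their demand
    print(allocate(10000, vms))  # contended: exchange keeps at least 6000 MHz
```

With capacity to spare, every VM gets its full demand; under contention, the reserved VM keeps at least its reservation; and reserved capacity a VM isn’t using flows to the other VMs, unlike a memory reservation.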
It’s worth mentioning that this post came about mostly as a result of a Twitter conversation that Chris Wahl (@ChrisWahl) brought me into recently. This is one of the reasons why I love Twitter – great conversation that led to sharing of information.
I hope this helps those that are looking to virtualize Exchange and are concerned about the CPU over-subscription limitations. Using CPU reservations in vSphere will let you stay compliant with Exchange support while allowing you to over-subscribe CPU resources for non-Exchange workloads. Everybody wins!
Looking at this blog over the past 12 months kind of makes me sick. If you ignore posts that have to do with books and a guest blog post by my brother, you’ll see I’ve only published maybe two or three posts all year. That’s very bad, though the explanation for why can be found in those posts.
I’ve made it very obvious that I wrote a book about virtualizing business critical applications. I’ve been less outgoing about the fact that I’ve actually been involved in three book projects. Yep, three books in one year. In addition to the VBCA book, there’s also Mastering VMware vSphere 5.5 and another called VMware vSphere Performance that will be out next year (I’ll write a separate blog post on the Performance book, as that’s a bit of a unique situation). I’m planning on writing a blog post about writing a technology book in general, but suffice it to say that between the effort that goes into writing and having a young family (a 4-year-old and a 1-year-old), I’ve had very little time.
The third book is nearly complete and that should free up a lot of my time. I have a task list with several blog topics to write about and more in my head so I’m hoping to once again fill this site with useful virtualization information.
To those who still visit the site and still have me in your RSS feeds – thanks for sticking with me. Hopefully over the coming weeks you’ll see a lot more from me.
Amazon has the Kindle edition of Virtualizing Microsoft Business Critical Applications on VMware vSphere at a great price. It's currently only $22.99, compared to $38 for the paperback version.
You can grab the Kindle edition here: http://www.amazon.com/Virtualizing-Microsoft-Applications-Technology-ebook/dp/B00E8HSWQU/
I'm a little late in announcing this so hopefully the price stays at this level for a little while. If you already have the paperback and want the Kindle edition for on the go reading, now is a great time to pick up a copy.
To those that have already purchased the book - thanks so much for your support and we hope you enjoy!
Now that VMware has officially announced vSphere 5.5, we’re very happy to announce that the new version of the best selling Mastering vSphere series will be released as well! Mastering VMware vSphere 5.5 will be out a little bit later this year and brings with it all that you need to know about the new version of vSphere.
Nick Marshall deserves a ton of credit for taking the reins on this book and helping to drive it. He did a ton of work to keep up with all of the beta releases and incorporate all of the new features into the book. Scott Lowe, a veteran of this book series, was excellent as usual.
I was thrilled to be involved in this project and think you’ll really enjoy the new version this year. Along with me was Forbes Guthrie of vReference fame. Forbes has contributed to this book in previous versions and is also the author of two editions of the great book vSphere Design. We also had Josh Atwell, automation master, adding his awesome content as well.
There are lots of new features and capabilities in vSphere 5.5, and we’re confident that this book will help you understand everything you need to know. Looking forward to the release!
Interested in getting your copy of a VMware Press book signed by the author(s) while at VMworld this year? VMware Press will be holding a Meet the Author event at VMworld to give you the opportunity to do just that!
Come to the bookstore at VMworld on Wednesday, August 28th from 12:30-1:30 for a chance to meet the authors, chat about anything you want, and get your book signed. I’ll be there, along with co-author Alex Fontana, signing our new book and other VMware Press authors will be there too. We really hope to see you there.
No need to sign up or anything – just show up and say hi!