As we’ve gone through the vSphere 5 beta program, Storage DRS is the one piece that has both fascinated and scared us. As you all know, I’m a huge fan of transparency, and I want to be crystal clear with this post.
I won’t speak for EMC, HP, and Dell/EQL, but if you’re running your vSphere environment on a shared storage array from any of us, you’re likely not going to have a pleasurable experience with Storage DRS, unless you just don’t like taking advantage of our features. I’ll assume, for this post, that you do.
Over the past few years, NetApp has brought to light storage efficiencies such as thin provisioning, deduplication, and snapshot-based backups, among others. These constructs rely heavily on the storage array being able to manage the storage and these efficiencies at a granular level, most of the time invisibly to the end user.
So what happens when you move a .vmdk from one volume to another? What’s all the fuss about?
Let’s talk about Deduplication first. As we’ve all known for some time, a Best Practice of ours is to stuff as many VMs as you can into a single Datastore [volume] in order to get the highest returns on deduplication. Users can see 80% or more of their space returned (I know, I was one of them) by placing their VMs in the same container, with no performance penalty for doing so. Moving data [i.e. VMDKs] between volumes will “un-dedupe” the data being moved, and you will have to re-run the deduplication scan on the destination volume in order to recoup those savings after the move. While this is not the end of the world, it is a nuisance, and you should be aware of it as a caveat to Storage DRS, or to moving VMDKs between datastores in general.
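To make the “un-dedupe” effect concrete, here is a toy model of per-volume deduplication. It is only a sketch: the 4 KB block size, the SHA-256 fingerprints, and the `unique_blocks` helper are assumptions made for illustration, not a description of how the array actually implements deduplication.

```python
import hashlib

BLOCK = 4096  # illustrative block size, an assumption for this sketch


def unique_blocks(vmdks):
    """Count the distinct blocks stored in one volume holding these VMDKs."""
    seen = set()
    for data in vmdks:
        for offset in range(0, len(data), BLOCK):
            seen.add(hashlib.sha256(data[offset:offset + BLOCK]).digest())
    return len(seen)


def block(tag):
    """Build one block filled with a single byte value."""
    return bytes([tag]) * BLOCK


# Three VMs cloned from the same OS image (8 shared blocks), 1 unique block each.
os_image = b"".join(block(t) for t in range(1, 9))
vm_a = os_image + block(100)
vm_b = os_image + block(101)
vm_c = os_image + block(102)

# All three VMs in one volume: the OS blocks are stored once.
together = unique_blocks([vm_a, vm_b, vm_c])

# vm_c moved to a second volume: its OS blocks must be stored again there.
after_move = unique_blocks([vm_a, vm_b]) + unique_blocks([vm_c])

print(together, after_move)  # 11 19
```

In this toy case the same three VMs go from 11 stored blocks to 19 once one of them lands in a different volume, because the shared blocks can only be collapsed within a single volume.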
Thin Provisioning. Nothing really “breaks” with thin provisioning, but you’re allowing an outside construct to control placement of data on what it “thinks” is a 1TB volume that you’ve thinly provisioned, when in actuality there is only 100GB truly available to write to. What happens when Storage DRS moves something there? Boom. I think most of you know what happens when a thin provisioned LUN exceeds its available space.
Side note: This is another brilliant feature of NetApp that often goes unnoticed, or untalked about. We have certain settings you can turn on that autogrow both Volumes and the LUNs inside them.
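The thin provisioning trap from the 1TB/100GB example can be sketched as a simple check. This is a hypothetical illustration: `ThinDatastore`, `move_fits`, and the 10% headroom figure are made-up names and values, not part of any VMware or NetApp API. The point is that the check has to use the space the array can really write to, not the size the hypervisor sees.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class ThinDatastore:
    name: str
    provisioned_bytes: int    # size the hypervisor believes it has
    physical_free_bytes: int  # space the array can actually write to


def move_fits(ds, vmdk_bytes, headroom=0.10):
    """Return True only if the VMDK fits in real free space, keeping some headroom."""
    usable = ds.physical_free_bytes * (1 - headroom)
    return vmdk_bytes <= usable


# The scenario from the text: looks like 1TB, only 100GB is really writable.
ds = ThinDatastore("Datastore1", provisioned_bytes=1024 * GB, physical_free_bytes=100 * GB)

print(move_fits(ds, 200 * GB))  # False
print(move_fits(ds, 50 * GB))   # True
```

A 200GB VMDK looks like an easy fit against the provisioned 1TB, yet it would run the volume out of space, which is exactly the information Storage DRS does not have.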
Snapshots. This one is pretty straightforward. When we snapshot volumes, or the data inside, we store the initial tier of snaps within that same volume. What happens when you [or some outside construct] begin to move the items inside the volume to another volume? Those snaps (i.e. your backups) become invalid.
Look, bottom line, at the end of the day, the promise of SDRS is a good one… mitigate capacity and performance issues in storage. But… it’s not sophisticated enough yet to know about all the backend array-specific value-add, and take that into account in the DRS Recommendation algorithm, and you end up causing further issues that may even be more impactful to you than the one you were trying to address in the first place.
We’ll call this segment Unicorns & Rainbows.
In a perfect world, what would happen is that VAAI would come back into play with some additional primitives. vCenter needs to be less of the do-er, and more of the middle-man traffic director. From what I understand from VMware, this is the next evolutionary step of Storage DRS, as well as other things.
I envision the conversation going something like this:
vCenter: “Hey NetApp!”
NetApp: “Hey vCenter! What’s up?”
vCenter: “You know that VM you’ve got in Datastore1? It seems to be growing pretty rapidly and is chewing up a lot of disk I/O. Think you could provision a new datastore and move this VM to it?”
NetApp: “No problem!”
- NetApp plugin VSC provisions a new datastore and mounts all hosts in the cluster
- DataMotion snapshots and moves the VM’s files over to the new volume via array-based CopyOffload.
NetApp: “OK, vCenter, all done!”
vCenter: “Wow that was super fast! Thanks! Looking much better now!”
Technically, this is the essence of VAAI; these types of “conversations.” And there’s no reason that we shouldn’t see Storage DRS become a big part of it in the future. I’m not privy to any special information, and I’m not saying that it WILL be done. I’m just an ex-admin, like a lot of you still are, and I’m really hoping it’s one of those things that gets developed.
I’ll wrap up this post with what we’re likely going to publish as our Best Practice for you to use… a Good/Better/Best sort of configuration scenario. None of this is set in stone yet, but due to the volume of questions I have received, I didn’t want to leave them unaddressed.
Basically, it all comes down to the DRS Recommendations, and how you set the slider bar to throttle the application of those recommendations.
Good: “manual” Storage vMotion is a good solution to migrate data
Better: Storage DRS is a better solution than having to move things manually
Best: At NetApp, we are going to recommend that you forego the use of Storage vMotion, and instead utilize our DataMotion (remember I like to call it “nMotion”) to move your volumes around on the backend storage. While this does not address individual VM-related moves, it does address performance bottlenecks, as well as offloading the act of the move to the storage controller, and maintaining all of your storage efficiencies that we talked about above.
I’ll leave you with a few final recommendations to keep in mind…
1) Set SDRS to manual mode and review the recommendations before accepting them.
2) All datastores in the cluster should use the same type of Storage (SAS, SATA etc.) and have the same replication and protection settings.
3) Understand that SDRS will move VMDKs between datastores and any space savings from NetApp cloning or deduplication will be lost when the VMDK is moved. Customers can rerun deduplication to regain these savings.
4) After SDRS moves VMDKs, it is recommended to rebaseline Snapshots on the destination datastore.
5) It is highly recommended not to use SDRS on thinly provisioned VMFS datastores due to the risk of running out of space.
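If you are reviewing recommendations by hand in manual mode, the checks in the list above could be captured in a small script like the one below. This is purely a sketch: the `Datastore` record and the `review` function are hypothetical, and you would have to populate the fields yourself from your own inventory, since nothing here talks to vCenter or to the array.

```python
from dataclasses import dataclass


@dataclass
class Datastore:
    name: str
    disk_type: str          # e.g. "SAS" or "SATA"
    replicated: bool
    thin_provisioned: bool
    deduplicated: bool
    has_snapshots: bool


def review(src, dst):
    """Return warnings to consider before accepting a move from src to dst."""
    warnings = []
    if src.disk_type != dst.disk_type or src.replicated != dst.replicated:
        warnings.append("storage type or replication settings differ")
    if dst.thin_provisioned:
        warnings.append("destination is thin provisioned: check real free space")
    if src.deduplicated or dst.deduplicated:
        warnings.append("rerun deduplication on the destination after the move")
    if src.has_snapshots:
        warnings.append("rebaseline Snapshots on the destination after the move")
    return warnings


src = Datastore("ds_sas", "SAS", True, False, True, True)
dst = Datastore("ds_sata", "SATA", True, True, False, False)

for warning in review(src, dst):
    print(warning)
```

An empty list does not mean the move is safe; it only means none of the caveats listed above were triggered by the fields you filled in.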
If you have any further questions, reach out to me on Twitter, or leave a comment below and we can further the discussion.