The other day, I saw a tweet from someone that claimed the following:
NetApp Snapshots are using up to 60% of your storage!
Where did they get that number?
Is that even possible?
Well, kind of… but not really. If your production storage system space is 60% allocated by Snapshots, YOU ARE DOING IT WRONG.
Snapshots aren’t a concept unique to NetApp, but NetApp’s approach is different from the typical copy-on-write style. Oh, and we do also hold the registered trademark on the name. Lots of other vendors have their own version of Snapshots – Microsoft, VMware, CommVault, to name a few. NetApp Snapshots are fast to create and fast to restore from. They don’t hurt system performance, and they don’t eat up a lot of space. Allow me to explain…
How snapshots work in ONTAP
To understand why Snapshots grow in ONTAP, you first need to understand how they work. A Snapshot is a point-in-time copy of a filesystem (such as a FlexVol), similar to a photo taken with a Polaroid camera. In ONTAP, it’s essentially a set of pointers to the actual blocks stored in WAFL. It doesn’t contain any actual data – it just preserves the blocks referenced by the inode map until the Snapshot is deleted.
However, as data is deleted from the active filesystem, the Snapshot will begin to balloon – the blocks aren’t physically moved anywhere, they are simply reclassified from “active” space to “snapshot” space. New data does not add to the Snapshot, because the Snapshot only knows about the data that existed at the moment it was taken. WAFL also will not permanently delete or overwrite [read: reclaim] any block that is still referenced by a Snapshot, even if that data has been deleted from the active filesystem.
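This behavior can be sketched with a toy block-reference model (purely illustrative – the block names and accounting here are my own invention, not WAFL internals):

```python
# Toy model of pointer-based snapshots. A snapshot is just a frozen set
# of block references; it consumes *extra* space only for blocks that
# have since left the active filesystem.

def snapshot_only_blocks(snapshot, active):
    """Blocks held solely because a snapshot still references them."""
    return snapshot - active

active = {"b1", "b2", "b3"}        # blocks in the active filesystem
snap = frozenset(active)           # take a snapshot: copy pointers, no data

active |= {"b4", "b5"}             # write new data: the snapshot is unaffected
print(len(snapshot_only_blocks(snap, active)))   # 0 -> no extra space used

active -= {"b2", "b3"}             # delete old data: b2 and b3 cannot be
                                   # reclaimed while the snapshot exists
print(len(snapshot_only_blocks(snap, active)))   # 2 -> snapshot has "ballooned"

snap = frozenset()                 # delete the snapshot: b2, b3 reclaimable
```

Note that writing `b4` and `b5` cost the snapshot nothing; only deleting blocks the snapshot still points to made it grow.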
Snapshots (and Polaroids) are NOT Archives!
If I were wanting to preserve my family memories forever, I would not trust them to a Polaroid instant camera picture. This isn’t because they’re not good – on the contrary, Polaroid photos last a very long time if you properly care for them. But what happens if I spill water on them? Or lose them? Or my house burns down? And they *do* eventually fade…
Similarly, I don’t use Snapshots as archive or DR. Snapshots are intended as short-term backup and restore points on production systems. This way, you can quickly back up and restore, but you don’t leave Snapshots lying around to bloat and eat up all your space. Products such as SnapDrive, SnapManager, SnapProtect, OnCommand, etc., can help manage these Snapshot-based backups, especially for block storage/database applications. ONTAP itself provides easy-to-manage Snapshot scheduling for volumes and aggregates whose data doesn’t need to be quiesced for LUNs or applications. But in the end, you simply don’t keep Snapshots around on production systems indefinitely.
But what if I want to archive with Snapshots?
Well, the good news is… you can!
“Wait… you just told me I can’t!?”
Right! While it’s not a good idea to use Snapshots on a production system as an archive or for disaster recovery (DR), it’s possible. What you really want to do is leverage Snapshots plus DR solutions like SnapMirror and SnapVault. With SnapMirror and SnapVault, you can copy data and Snapshots to a remote location (or locations). This removes the “burning house” scenario, and having multiple sites removes the “spilling water” scenario. SnapMirror provides DR functionality, so if your primary site ever went down, you could quickly bring the DR site up, ready to use. SnapVault provides archival functionality for long-term retention. Generally, you’d do that on less beefy systems with larger, slower SATA disks, while your primary site runs on super-speedy flash drives and your DR site runs on Fibre Channel or SAS drives to limit the performance hit in the event of a disaster.
What’s the difference?
I remember a time when backups meant kicking your tape drive on a daily basis to get it to work. Back then, you worried about schedules, backup types (incremental, full, differential), and who to use to ship tapes off-site for safe keeping. Now, backups are much more diverse and efficient.
With NetApp Snapshots, there are three main concepts to keep in mind: backup, disaster recovery, and archive.
Backup
This is your “Sally from accounting just deleted her favorite cat picture” backup plan. Retention of backups may vary, but generally ranges from one week to a month. Backup admins tell everyone that this is the case, to set expectations for restore times. Snapshot space should not be an issue here, because retention policies will be more aggressive, and older Snapshots will roll off and free up any space that was deleted from the active filesystem.
Disaster Recovery (DR)
This is your “a meteor just hit our primary datacenter, how do we stay online” contingency plan. Ideally, this will be an exact replica of production, all the way down to the hardware. But we all know that’s expensive, so you live with as much degraded performance as you can tolerate until you can dig the production site out of the rubble. However, keep in mind that your DR site might just need to become the new production site… Snapshot space won’t be an issue here, because data is not being deleted on the DR site (it’s all read-only until a SnapMirror relationship is broken) and the replication schedules will roll off older Snapshots.
Archive
This is the “I’m never going to look at this data again. But if I ever delete it, it’s guaranteed that my CEO will want to retrieve something the very next day” plan. This is useful for a variety of things, including HIPAA retention requirements, film/video game production, seismic data rendering, etc. Essentially, any data that takes up a lot of space and isn’t actively being used. In these cases, Snapshot space isn’t really an issue because the data doesn’t get deleted – nearly all of your data stays in the active filesystem. We also have additional solutions, such as SnapLock, a WORM (Write-Once-Read-Many) solution providing DoD-certified lockdown, where data within long-term archives can never be deleted.
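The roll-off behavior that keeps backup Snapshots from eating your space can be sketched like this (the 14-day window and daily schedule below are made-up example values, not an actual ONTAP policy or default):

```python
from datetime import date, timedelta

# Toy retention model: keep daily snapshots inside a fixed window and
# let anything older "roll off" (get deleted), which releases any
# snapshot-only blocks those old copies were holding.
RETENTION = timedelta(days=14)   # hypothetical retention window

def roll_off(snapshots, today):
    """Return only the snapshots still inside the retention window."""
    return [s for s in snapshots if today - s <= RETENTION]

# Thirty days of daily snapshots on a hypothetical volume...
today = date(2014, 1, 31)
snapshots = [today - timedelta(days=n) for n in range(30)]

kept = roll_off(snapshots, today)
print(len(kept))   # 15 -> everything older than the window rolled off
```

The more aggressive the retention policy, the faster deleted blocks come back to the active filesystem – which is why short-term backup Snapshots never need to approach anything like 60% of your storage.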
Oh, and did I mention that you can now backup and archive to the cloud?!?
In fact, Nick Howell blogged about that very thing here: