Do you test your backups?
So you’ve worked out how often to take your backups, what sort to take, and where to store them. You’ve created your jobs and have them running regularly, now what? Now you need to test that they’re working, and not just now, but forever into the future. There is only 1 way to know that your backup worked, and that is to restore it.
When disaster strikes, it is essential to have working tools to fix the problem. In the case of SQL Server, your backup is one of your most useful tools – it can be used to help you recover from corruption, accidental deletes, server crashes, UFO abductions, pretty much everything. But it’s only useful if you can actually restore the data.
Database backups can break for a number of reasons, it could be that the data was already corrupt when you took the backup (checksum verification can help with this), the data was corrupted when creating the backup, a disk corruption occurred after the backup was created, and so on. I once had a case where copying the backup file to a different server caused corruption due to a faulty network switch.
How do you test that your backups worked? My favourite way – and I’d argue the only way – is to restore them. After your backup has finished, have a job that restores the data somewhere. It doesn’t have to be your primary system, it doesn’t have to be a production system at all, the storage doesn’t have to be any good – connect a USB drive if you have to, your server doesn’t have to have much memory or CPU, all you need is a server that you can schedule a restore on.
What should you restore, and how often should you restore it? That comes back to the question of how much data can you afford to lose? If you can’t afford to lose more than a days worth of data, then test your daily backups. If you can’t afford to lose more than 5 minutes worth of data then test you log backups. How do you do that?
At a number of companies I have worked at, they have a pre-production or staging environment where all deployments are tested before they go live. These environments are used to confirm that the steps we’re about to run in production will not break the system. Every night, these databases are refreshed with production data (with security treated like a production system). How do these companies know that there backup last night worked? Because the restore job completed every night, we could be confident that the backup worked and be reasonably confident that it’d work again (there’s still a chance of disk corruption after the backup was taken)
That works for nightly, but what about Log backups? When data is that critical that you can’t afford to lose a log backup, you normally have some sort of HA/DR system in place. Log Shipping is one of the simplest to setup, and will also validate all of your log backups. How can you be sure that it worked? Because you restored your log file onto another server. Another reason why Log Shipping is still great today.
If you are not restoring your backups, you don’t know if you have a backup until it’s too late. Find a server, find a USB drive if you have to, and start restoring your backups.