
Database Backup Retention

We’ve seen much written on database backups: how often to run them, whether to include transaction logs, how often to perform full backups, and how to test them.

We see almost nothing about database backup retention. Without understanding this topic, you could find you are not able to restore what you need. Or you could be wasting time and money keeping backups for too long.

Need a quick answer? You should keep your database backups for a minimum of 2 weeks.

Let’s see how we got to that number.

The Cost (and let’s not forget “secret” costs)

Retention is entirely a financial discussion. If retention cost nothing, you would never get rid of old backups; you would keep them forever.

Remember that retention has very real costs: the cost to make and store backups and the cost to keep them organized so you can find what you need.
What about the secret cost: the cost of migration? Let’s say you’ve decided to keep five years of backups, and during that time you switched backup hardware twice and did a major backup application upgrade. If you need the backup from five years ago and didn’t migrate your backups with each upgrade, you may not be able to read it. When you set retention, you commit to migrating backups.

Ok, Back to Step One

Always start your discussion of retention with a rules and regulations review. If the database is used in a regulated operation, the regulating entity may have rules that dictate retention. Rules and regulations about data backup pop up everywhere:

Healthcare providers (HIPAA)
Financial Services (SEC 17a)
Government Services

Your organization could also have special rules if:

it’s publicly traded (Sarbanes-Oxley)
the organization handles hazardous materials (OSHA MSDS)
or if the organization accepts credit cards (PCI-DSS)

Database backup retention also requires understanding three issues that are unique to databases.

Be familiar with the application creating the database records. While the database server stores and manages the database infrastructure (catalogs, tables, views, indices, etc.), database content is managed by another application. It is this application that dictates precisely what is stored in the database, how it’s stored, and whether users can delete records, all of which materially affects backup retention decisions.
Determine whether the application does any data pruning. Say, for example, you have a database of patient records. Does the application ever purge these records, or are they kept forever? If the database keeps the records forever, each backup is a complete copy of all the patient details. But if the database is purged and you need to restore transactions from the past, you must retain the database backups from the required period.
Remember that a database restore is often an all-or-nothing affair. When you restore from backup, the entire database is reset to a point in time in the past. If that point is too far in the past, it could mean a tremendous amount of duplicate data entry to bring the system back up to the present.

What Can Go Wrong

When designing a retention plan, you should consider under what circumstances you might need to do a restore. These are the general reasons you might want to perform a database restore:

Site failure
Server hardware failure
Database corruption
Audit compliance
Backup testing

Let’s look at how each affects retention.

Site Failure

We’re talking about bad stuff here. Fire, flood, hurricanes, earthquakes, zombie apocalypse…you get the picture. If the disaster is big enough, you may have to confront a total loss of infrastructure and reconstruct everything from scratch. And there are likely to be several days when public safety restrictions prevent you from accessing the site.

Protecting database backups from site failures requires that you store them at a secure offsite location, such as a bank safety deposit box or an environmentally secure datacenter.
The offsite copy needs to be refreshed regularly so that you can bring the organization back to a useful operating point.

The best protection from site failure is of course to use an Offsite Backup Provider that automatically makes backups and then transmits them to a remote data facility. In this case, the discussion of retention is strictly a matter of budgets.

Server Hardware Failure

All servers will fail; it’s just a question of when.

Server hardware failure is the most common reason to restore from a database backup.
Once the server is repaired, you restore the database content from the last good backup.

In the case of server failure, the best place to keep the restorable data is within the same network as the server being restored. If you need to pull the data over the Internet, there could be a long delay as you download the content, and because the restore cannot start until the download completes, the whole recovery waits on that download. The best architecture is a hybrid backup solution that keeps copies both onsite and offsite. This way, restoring for the most common case will be fast, with minimal delay.
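To get a feel for that delay, here is a rough sketch in Python. The function name `download_hours`, the backup size, the link speeds, and the 80% efficiency factor are all illustrative assumptions, not measurements; substitute your own numbers.

```python
def download_hours(backup_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to copy a backup before a restore can start.

    backup_gb  -- backup size in decimal gigabytes
    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed fraction of the nominal speed actually achieved
    """
    if backup_gb < 0 or link_mbps <= 0 or not (0 < efficiency <= 1):
        raise ValueError("size must be >= 0, speed > 0, efficiency in (0, 1]")
    megabits = backup_gb * 1000 * 8  # 1 GB = 1000 MB = 8000 megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical comparison: a 100 GB backup over a 100 Mbps Internet link
# versus a 1 Gbps local network.
print(f"Internet: {download_hours(100, 100):.1f} hours")  # Internet: 2.8 hours
print(f"Local:    {download_hours(100, 1000):.1f} hours")  # Local:    0.3 hours
```

Under these assumptions the Internet download alone takes close to three hours, while the local copy takes under twenty minutes. That gap is the case for keeping an onsite copy.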

Database Corruption

It’s never pleasant to consider, but database corruption can happen.

Corruption is when the server is still operational, but the content is no longer consistent.
Limited corruption could mean slow performance while an index is rebuilt, but if the corruption hits the data store, it could mean lost data.

Typically, bad hardware causes corruption. Faulty power supplies can quietly damage RAM and CPU, leading to silent corruption. Corruption can also be caused by:

power outages
a server crash
or anything that prevents an orderly shutdown of the database service

Audit Compliance

Not a favorite topic for system administrators, but if your organization is subject to rules and regulations, sooner or later you’ll be confronted with an audit. Any data audit includes questions about backups. And if you’re unlucky, the auditor may ask you to demonstrate your ability to restore.

Keep in mind that in some industries, failure to respond successfully to audits could put you in hot water. In some cases, if the audit is the result of an investigation (such as HIPAA violations), it could even result in stiff fines. At a minimum, failure to comply will result in some very uncomfortable meetings where your career could be in jeopardy.

Backup Testing

Here’s a simple fact: if you don’t test your backups ahead of time, your first test is your first disaster. People who value their blood pressure (and sanity) will make time to test their backups regularly.

Testing doesn’t have to be complicated or time-consuming. Most people find quarterly or annual testing to be an acceptable trade-off.

The best way to test is to perform a real restore and keep a record of the procedure used to restore and recreate the environment. These experiences will be invaluable if you need to restore after an actual disaster.
If your organization has an active software development team, another option is to coordinate the restore testing with the developers. Developers often like to have a development environment reloaded with realistic data. If your testing refreshes their copy, it kills two birds with one stone.

Finally, if you keep written notes documenting your testing activity, they may serve as a suitable response to auditors. If they ask you to demonstrate your ability to restore, give them a copy of the notes from your most recent restore test. If your notes meet with their approval, you’ve got a solid plan.

Bring It Together

So now that we know why we back up, let’s think about retention. Of the five reasons to restore, three of them have the same basic retention needs: site failure, server failure, and backup testing. In each of these cases, retention is short and sweet, and the restore will be of the most recent good backup. This is your basic 1-Day Retention.

Audit Compliance recovery is driven by two factors:

the rules and regulations governing the organization
the Database Application’s pruning activity

If the Database Application does not prune data records, each backup is a complete record of all historical records. If it can be demonstrated that the Application preserves all data, in most cases the nightly backup, and therefore the basic 1-Day Retention, should satisfy the Auditor’s requests.

If you’re in the unenviable position of supporting a Database Application that prunes records or does not protect regulated data from user deletion, retention is more complicated. In this case, if the rules and regulations require three years of data retention, you could be confronted with storing every database backup for 36 months. One way to confront this issue is to ask for advice from the auditors assigned to your organization. Often they will provide reasonable guidance.

Retention for database corruption is the most complicated case. You need to estimate how long it might take staff to recognize that database corruption has occurred, and then pick a retention period that exceeds this number of days. We’ll look at this below.

Minimum Database Retention

Since database corruption is the factor that drives most retention decisions, this is what we need to protect against. In the absence of other minimums, a good rule of thumb for most modern database systems is to keep 2 weeks of database backups.

You can sometimes get by with less, but doing so presents the risk that subtle corruption could go undiagnosed if it coincides with company holidays or staff vacations.
If the corruption period exceeds your backup retention, you may lose the ability to restore from a known good backup.
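The rule of thumb above can be sketched as a small calculation. This is only an illustration: the function name and parameters are hypothetical, and the inputs (how long corruption might go unnoticed, how long key staff might be away) are estimates you would supply yourself.

```python
def minimum_retention_days(detection_days: int,
                           longest_absence_days: int = 0,
                           floor_days: int = 14) -> int:
    """Suggest a retention period that exceeds the corruption-detection window.

    detection_days       -- estimated days for staff to notice corruption
    longest_absence_days -- longest holiday or vacation gap to allow for
    floor_days           -- rule-of-thumb minimum (2 weeks in this article)
    """
    if min(detection_days, longest_absence_days, floor_days) < 0:
        raise ValueError("all inputs must be non-negative")
    # Retention must exceed, not merely equal, the worst-case detection time.
    worst_case = detection_days + longest_absence_days
    return max(worst_case + 1, floor_days)


print(minimum_retention_days(3, longest_absence_days=7))    # 14
print(minimum_retention_days(10, longest_absence_days=10))  # 21
```

In the first case the 2-week floor already covers the worst case. In the second, a slow detection time combined with a long absence pushes the requirement past two weeks.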

Keep in mind that even a modest database backup may exceed 100 GB of storage.

If you choose to keep 2 weeks of daily full backups at that size, that translates to 1.4 TB of storage (14 × 100 GB).
This can add up quickly, so know your cost per GB when deciding how much to retain.
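The arithmetic behind that figure, and the cost that follows from it, can be sketched in a few lines of Python. The function names are hypothetical, the calculation assumes one full backup per day with no compression or deduplication, and the price per GB is a placeholder rather than a real quote.

```python
def retention_storage_gb(backup_gb: float, retention_days: int,
                         backups_per_day: int = 1) -> float:
    """Total storage needed to hold every full backup in the retention window."""
    if backup_gb < 0 or retention_days < 0 or backups_per_day < 0:
        raise ValueError("inputs must be non-negative")
    return backup_gb * retention_days * backups_per_day


def monthly_storage_cost(storage_gb: float, cost_per_gb_month: float) -> float:
    """Monthly cost of keeping storage_gb at a given price per GB per month."""
    return storage_gb * cost_per_gb_month


storage = retention_storage_gb(backup_gb=100, retention_days=14)
print(f"{storage / 1000:.1f} TB")  # 1.4 TB, matching the figure above

PLACEHOLDER_PRICE = 0.02  # assumed price per GB per month; use your own rate
print(f"{monthly_storage_cost(storage, PLACEHOLDER_PRICE):.2f} per month")
```

The same function shows why long regulatory retention is expensive: keeping every nightly 100 GB backup for 36 months (about 1,095 backups) would need roughly 110 TB.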