A few weeks ago I kicked off a survey about disaster recovery testing and how the plan copes with human factors. You can see the original survey here.
This Dilbert cartoon is a pretty accurate portrayal of most clients' disaster recovery plans when I first start working with them: http://dilbert.com/strips/comic/2000-08-15/.
Here are the survey results:
The "other" responses are:
- 6 x As a company we haven't done it in decades + but I do test my backups every day.
- 2 x We have a dr strategy but know it's broken.
- 1 x 4x per year, but no fixed schedule.
I'm pretty disheartened by these results – only 41% of respondents test their DR plan at least once a year, but at least 80% of respondents actually *have* a DR plan.
Apart from the obvious reason to test a DR plan initially (so you know it works), it's very important to test it regularly, because assumptions made when the DR plan was written are very often no longer valid. For instance, if the database has grown, it's going to take longer to restore, and so may break the RTO (recovery time objective) you agreed with your management. What if someone made a change to the backup procedures and now your restore sequence is broken? What if you're not monitoring database mirroring correctly and the REDO queue on the mirror has grown so large that a failover takes longer than the RTO? What if the backup generator is broken? The list goes on and on.
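To make the RTO point concrete, here is a rough Python sketch that compares two time estimates against an RTO. It assumes, as a simplification, that restore time scales linearly with backup size at the throughput seen in your last real restore test, and that a mirroring failover has to replay the whole REDO queue at the current redo rate before the database comes online. The function names and all the numbers are hypothetical illustrations, not measurements; the only trustworthy figure is the one you get from actually running the test.

```python
from datetime import timedelta

# Hypothetical helpers: a rough sanity check, not a substitute for a real restore test.


def estimated_restore_time(backup_size_gb: float, restore_gb_per_min: float) -> timedelta:
    """Assumes restore time grows linearly with backup size at the throughput
    measured during the last actual restore test."""
    return timedelta(minutes=backup_size_gb / restore_gb_per_min)


def estimated_failover_time(redo_queue_kb: float, redo_rate_kb_per_sec: float) -> timedelta:
    """Assumes the mirror must replay its whole REDO queue at the current
    redo rate before the database can come online after a failover."""
    return timedelta(seconds=redo_queue_kb / redo_rate_kb_per_sec)


def within_rto(estimate: timedelta, rto: timedelta) -> bool:
    """True if the estimated recovery time fits inside the agreed RTO."""
    return estimate <= rto


rto = timedelta(hours=1)

# Illustrative numbers only: same restore throughput, before and after the database doubled.
when_plan_written = estimated_restore_time(200, 5)  # 40 minutes
today = estimated_restore_time(400, 5)              # 80 minutes
print(within_rto(when_plan_written, rto), within_rto(today, rto))

# A 2 GB REDO queue (in KB) replayed at 1 MB/s (in KB/s).
print(within_rto(estimated_failover_time(2 * 1024 * 1024, 1024), rto))
```

With a one-hour RTO, the 200 GB database restores in an estimated 40 minutes and passes, while the same database at 400 GB needs about 80 minutes and fails, even though nothing in the written plan changed. The 2 GB REDO queue works out to 2,048 seconds, a little over 34 minutes, which is inside this RTO but would not be if the queue kept growing unmonitored.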
I really didn't expect anyone to pick answer #4 – I'm shocked. How can sane management justify preventing the technologists from testing the plan that could save the company if a disaster occurs?
One line I like to use as a consultant when talking to senior executives is: would you rather find out that your disaster recovery plan is broken through a controlled test, when all the senior folks are standing by to put things right, or when an actual disaster happens in the middle of the night on a public holiday, when only the most junior folk are on duty and the potential monetary losses are significant?
(Don't get me wrong – junior does not equal incompetent in any way in my book, but that's the kind of reasoning I've found to work with senior executives in corporations who are far removed from the technological coal-face.)
So how does human nature factor in here? Well, it's human nature to not be worried about disaster recovery – until a disaster happens. It's kind of the "out-of-sight, out-of-mind" mentality. There's also the possibility that people know the DR plan sucks, but no-one wants to confront that fact and have to go fix it – this is sheer irresponsibility on someone's part (maybe not the DBA's, if they're not given the time to go fix it). There's also the "it won't ever happen to me" mentality. How many of you reading this post have walked around your house with a video camera making note of your belongings in case your house is destroyed? I know I haven't gotten around to it yet – it keeps getting pushed down the to-do list. It takes an effort of willpower to make these things bubble to the top of the to-do list and stop procrastinating.
Go test your disaster recovery plan – you'll be amazed at what you'll find is broken. I wrote a blog post about this back in 2009 after conducting a survey of what people discovered when testing their DR plan – see here.
The "other" responses are:
- 1 x I am so not coming to work on that day.
- 1 x Our DR site is 1200 miles away, but assumes compliance by the DR site folks. A nationwide disaster would be tough to overcome.
These results are not surprising at all. The majority of companies do not consider human nature during a disaster. That said, I think a distinction should be made between countries that are highly disaster-prepared and disaster-conscious, like Japan, and countries that in general aren't, like the US (go read this blog post that discusses Japanese disaster preparedness if you think I'm wrong here).
I think it comes down to what I said above: "it won't happen to us". Most DR plans that I've seen assume that the disaster being recovered from is one that's only affecting that company and isn't affecting the personal lives of those responsible for doing the disaster recovery. But in a widespread disaster most people are going to be focusing on themselves and their family, not thinking about whether the production database is still available. Does your company realize that?
Time to rethink your disaster recovery plan? No-one else is going to do it for you… that's human nature.
8 thoughts on “Human nature is a significant hurdle to successful disaster recovery”
Hi
I can’t see the point of having a DR plan for a worst-case scenario where people will abandon everything and run to save their lives and families, if for instance a giant tsunami washes all of North America into a desert. Let’s say I have an internet-based shop that sells fishing equipment. When an event like that hits the US, there are no customers left and I have no equipment to sell, so why should I bother to have a well-functioning website that can accept orders, let’s say, 10 hours after the event occurred? What’s the point?
If I’m the manager of a hospital, then I need to make sure that in case of a natural catastrophe, the hospital can function without IT. We as humans need to adjust to cell phones, ATMs, etc. not working. It will be chaos, but you cannot do anything to prevent this, unless you build a redundant USA on the other side of the world and transport the entire nation there in case of an event.
"I can’t see the point of having a DR plan for a worst-case scenario where people will abandon everything and run to save their lives and families, if for instance a giant tsunami washes all of North America into a desert."
This view is flawed for several reasons.
First of all, it’s called "recovery". What do you do once the disaster has passed? How do you cope? Short of a disaster that destroys the entire Earth, there are always ways of making sure you aren’t completely empty-handed at the end of it all. Do you have a remote secondary site? If that’s gone, do you have international emergency funds available? Who are your contacts for rebuilding? Even if, in the course of drawing up such a plan, you find out that there is no cost-effective way to recover and you will probably have to give up and start over, writing that down can save a great deal of grief and worry at a time when you can’t deal with it. Of course a plan for Hyperwidgets International is going to be more comprehensive and cover more scenarios than that of Joe’s Ice Cream Stand, but even Joe can plan for the future.
Second, the plan is not just for saving your precious hardware. A good plan will deal with the people as well. When disaster strikes, you are responsible for the people working at your company. Do you say "everyone for themselves, please take some backup disks on your way out" or do you think about what you could do to ensure you still have employees left in the end? For major disasters, you need to look beyond the scope of just your business, even if what the plan deals with *is* just your business. After family and friends, colleagues are an important part of most people’s lives. A company can help keep those links intact even in the face of disaster. In fact, in time of a major disaster that’s probably the company’s primary responsibility: making sure there are people left to run it.
"We as humans need to adjust to cell phones, ATMs, etc. not working. It will be chaos, but you cannot do anything to prevent this"
It’s not about prevention, it’s about coping. Eventually someone will be able to use a phone again. What’s the first call they should place after taking care of family matters? You should train people now — they’re not going to think about it later.
I view #4 as a legitimate response (though it’s not my response). Testing a DR plan is like paying a premium on an insurance policy. The premium is to reduce your risk. The company doesn’t make any money by paying the premium.
Sometimes the cost of the premium isn’t worth the insurance. When the manager doesn’t want to test the DR plan, that’s all it is. It’s not that they don’t care.
We all want to follow best practices all the time, but time and money are limited resources and you have to spend them wisely.
I have to disagree with you, Clay – I don’t think it’s wise not to test a DR plan, given the very high percentage of failures I’ve seen when a DR plan is put into effect for real. Systems are so complex these days that it’s almost inevitable that something has been forgotten in a plan that is not tested. I believe that if a manager truly wants to protect the company when an unplanned disaster occurs, they can’t be sure the DR plan will work unless it’s tested. Unless of course they have 100% trust in the capabilities of their staff to create and implement a completely comprehensive plan.
I hear what you’re saying, but I’ve seen failure far too many times to agree with not testing.
Cheers
"Sometimes the cost of the premium isn’t worth the insurance."
Ask a manager who says they don’t want to test whether they’ve done a cost-benefit analysis. I expect you’d get blank stares. "Testing the plan costs money and not testing it doesn’t" is probably the extent of their analysis, and that’s deeply flawed.
Reducing risk is a measurable benefit. If there’s a probability X that your plan fails, with associated cost Y, your projected loss is X × Y. Testing the plan reduces the chance of failure to below X, and so reduces your projected loss. If that reduction exceeds the cost of testing the plan (which it almost certainly will), testing is a net benefit. There are diminishing returns from testing often, but never testing seems absurd to me; that can’t possibly be sound cost-wise.
Certainly something as banal as testing whether your backups are actually functional cannot possibly take so much time that it’s cheaper to run the risk of having no backups when you need them. Sure, the more comprehensive a plan is, the more costly it is to test, but the costs associated with the plan failing likewise skyrocket. It’s actually not so much a question of "do we test this" but "how often should we pay to test this". There won’t be many plans that are still useful where the economically appropriate answer is "never".
OK point taken.
Paul, I get emails when we post comments but not when you do. I’d like to know about your responses. I’ll likely never read your response in this case because I rarely reread posts.
Hey Clay – there’s a checkbox below the comment – did you check that? Maybe it’s different for the blog admin. Anyway – we’re moving to WordPress soon.
I did. I received the email for the AC comment from the Netherlands, but not yours.
Doesn’t matter anyway since you’re moving to WordPress. Glad to hear the website is getting updated. :)