It Could Happen To You!

When people in our industry mention disaster recovery, almost all of us envision the same thing. Whether it's an explosion, fire, tornado, earthquake, flood or whatever, the images conjured up are of charred or melted computer equipment, halon dumps clouding the computer room, and (always) fire trucks. But let's step back from the shock-value descriptions and take a closer look at the disaster recovery and contingency plans most of our installations have created, because what I want to talk about this time is a disaster recovery plan that no one has bothered to consider.

Over the years, I've had occasion not only to review disaster recovery plans but also to help formulate some. The main focus of most recovery blueprints is the rapid re-establishment of a hardware platform (including communications) and the ability to rebuild or restore the software platform of operating systems, applications, and databases to get the business back "on the air". Disaster recovery rehearsals, while interesting, are not very much fun. Though they are somewhat more serious than a fire drill, the only challenger you're competing against is the clock. Sure, there may be problems with the hardware or with unpacking some tapes, but each of these things will be addressed and fixed and, if the script has been written well, a reasonable (if not full) recovery will be achieved. Now let's consider the human factor.

There are only two choices that can be made by the folks calling the shots during a disaster recovery rehearsal. Either the people involved (the operators, programmers, hardware installers, systems engineers, managers, and let's not forget the vendors' CEs and SEs) have all been forewarned about the rehearsal, or they haven't! The former makes for a much easier and more tolerable scenario. In fact, many people actually get together and rehearse for the rehearsal. On the other hand, the unexpected telephone call at 9:15pm on a Thursday night not only puts an immediate kink in your evening plans but also adds an air of urgent realism to the drill. After those first few "cc's" of adrenaline have hit the ticker, the truth rears its ugly head, and you discern from the phone call you've just received (and the one you will undoubtedly make to the data center) that terrorists have not detonated a nuclear device next to Prime CRAS, and that this is just a dry run. In either event, the troops will eventually gather and set about the task of recovery, and once again the flag of corporate leadership will fly high and proud.

Unfortunately, what very few people are willing to discuss or even consider is the potential for an actual catastrophic incident involving a sizable loss of life. Most data centers are in the same building as the people who support and maintain their systems. An actual disaster could very well destroy the "recovery personnel" along with the data center. Add to that the emotional pain of the survivors, and there really wouldn't be much opportunity for recovery.

So now, my friends, I come to the matter I would like you to consider. What would happen if we threw a small twist into the story? Once again it's the unexpected phone call at 9:15pm on a Thursday night. Only this time it's not the data center that's gone. It's the whole company. It doesn't matter if it's a major project that's been "suspended" or an airline forced out of business; the bottom line is that you're out of work! The economy stinks, the job market is soft, and now you and hundreds of what used to be your co-workers are unemployed and looking for jobs! We are talking about an unpretentious disaster of enormous proportions here. How do the people recover from this kind of catastrophe, and what role should the TPF industry play in the recovery process?

I would like to propose a new requirement for the TPF User Group and suggest the formation of an Industry Disaster Recovery sub-committee. Not just a task force concerned with hot-sites and back-ups, but a group of individuals prepared to deal with the human side of disaster recovery. By combining the technical and personnel resources of every member company, an effective industry response to a scenario that leaves hundreds of TPF people jobless would benefit everyone concerned. As specialized technicians we are small in terms of numbers, but very large in terms of skills and abilities. Many of us have already worked on "joint" projects where several companies made an investment and everyone reaped the benefits. This is yet another good reason to leave politics behind and work as one community, with one goal: recovery and survival.

Every outage hurts us all. Every screw-up costs us more. Every project failure reflects on each of us, and every business closure diminishes our strength. There are people out there who could make a contribution, but can't find a job. It's also likely that those ranks will swell before year end. If we choose not to take care of ourselves and our own, we face the prospect of courting a disaster from which we cannot recover. We may very well become extinct.

Alan Sadowsky