Relieving Operational Constraints: The 90 Second Database Backup

By Ian S Worthington

The Problem
Identifying the 'right' time of day to perform your ALCS (or TPF) system backup is not always straightforward. Even though most ALCS sites now use DFDSS rather than ZDATA DUMP, which allows the backup utility to be scheduled independently, it is still desirable, where possible, to choose a relatively quiet period in order to minimise the number of log tapes that would have to be rolled in for a restore.

At many sites, though, a 'quiet' period in terms of transaction load may be a peak time for offline processing, and tape drives may be a critical resource. And although DFDSS can be started while ALCS is running, an ALCS outage requires the DFDSS backup to be cancelled before ALCS can be restarted, which forces the backup to be rerun.

The Solution
All of these problems are easily eliminated if you happen to have created your database on RVA DASD, or on the newly announced ESS. These devices are already very attractive to ALCS sites for a number of reasons, for example:

Built-in data compression, which eliminates all that 'wasted space' we hate in our physical records
The ability to eliminate MVS UCB contention by defining the ALCS database one extent per volume without wasting all that extra volume space
The ability to choose sequential file block sizes to optimise performance, rather than having to worry about DASD utilisation.

In this case, however, we exploit a feature of the two-level internal structure of this subsystem, which allows us to take very rapid, zero-space-occupancy copies of datasets using the SNAPSHOT utility of the IXFP product. In practice this means we can copy a 23.5GB (2743 x 3390-3 cylinders) database from the production packs to the test system packs in only 90 seconds! We then back up these copies of our production database at our leisure, eliminating both the ALCS operational constraints and the constraints on tape drive utilisation.
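Why a SNAP can be both fast and free of extra space is easiest to see in a small toy model. The Python sketch below assumes, purely for illustration, that each virtual track is a pointer into a pool of physical blocks, that every write lands in a fresh block (a redirect-on-write, log-structured scheme), and that a block is freed once no track points at it. The class ToyVirtualArray, its methods and the volume names PROD01 and TEST01 are hypothetical; they do not describe the actual internals of IXFP or the RVA.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVirtualArray:
    """Toy two-level store: virtual tracks point at shared physical blocks.

    Illustrative assumption only; this is not the RVA's real internal design.
    """
    # physical block id -> stored data (the back-end level)
    physical: dict[int, bytes] = field(default_factory=dict)
    # physical block id -> number of virtual tracks pointing at it
    refcount: dict[int, int] = field(default_factory=dict)
    # volume name -> {virtual track number -> physical block id} (front-end level)
    volumes: dict[str, dict[int, int]] = field(default_factory=dict)
    next_block: int = 0

    def write(self, volume: str, track: int, data: bytes) -> None:
        """Write a track: new data always lands in a fresh physical block."""
        table = self.volumes.setdefault(volume, {})
        old = table.get(track)
        block = self.next_block
        self.next_block += 1
        self.physical[block] = data
        self.refcount[block] = 1
        table[track] = block
        if old is not None:
            self._release(old)  # another copy may still reference the old block

    def read(self, volume: str, track: int) -> bytes:
        return self.physical[self.volumes[volume][track]]

    def snapshot(self, source: str, target: str) -> None:
        """Copy only the pointer table; no track data is moved."""
        new_table = dict(self.volumes[source])
        for block in new_table.values():
            self.refcount[block] += 1
        for block in self.volumes.get(target, {}).values():
            self._release(block)  # discard whatever the target held before
        self.volumes[target] = new_table

    def _release(self, block: int) -> None:
        self.refcount[block] -= 1
        if self.refcount[block] == 0:  # no virtual track uses it any more
            del self.refcount[block]
            del self.physical[block]

    def physical_blocks_used(self) -> int:
        return len(self.physical)


array = ToyVirtualArray()
for t in range(1000):
    array.write("PROD01", t, b"record %d" % t)
array.snapshot("PROD01", "TEST01")
print(array.physical_blocks_used())  # 1000: the copy shares every block
array.write("PROD01", 7, b"updated")
print(array.physical_blocks_used())  # 1001: only the changed track diverged
```

In this model a SNAP duplicates only the mapping table, so its cost grows with the number of tracks rather than with the amount of data stored on them, which is consistent with a multi-gigabyte copy completing in about 90 seconds.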

Usually this database copy would be deleted automatically after a successful backup to tape (or two tapes, if on- and off-site copies are required), but it can be left on DASD if circumstances requiring an equally rapid (90 second) restore are anticipated. This provides a rapid fall-back mechanism during major system migrations and other potentially risky database changes.

A full restore from tape follows a similar two-step procedure:

Restore the tape backup to a non-production set of packs
SNAP it back to the production packs.

This allows a restore from tape to be started as a precaution while investigation, or an attempted repair, proceeds on the damaged live system, without committing to overwriting that system; the restored copy may even be used only for comparison. Once the restore has finished, the log tapes can be rolled in, and the restored database is SNAPed to the production packs only at the last moment.
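The restore sequence above can be sketched as three small steps in Python. This is a minimal sketch under stated assumptions: the database is modelled as a dictionary from record address to contents, each log entry is assumed to carry the after-image of one record, and the final SNAP is modelled simply as replacing production with the staging contents. The type aliases and function names (restore_to_staging, roll_in_logs, snap_to_production) are hypothetical and are not ALCS or IXFP facilities.

```python
# Hypothetical types: a database is record address -> contents, and each
# log entry is assumed to carry the after-image of one record.
Database = dict[str, bytes]
LogEntry = tuple[str, bytes]


def restore_to_staging(tape_backup: Database) -> Database:
    """Step 1: restore the tape backup to non-production packs."""
    return dict(tape_backup)  # a private copy; production is not touched


def roll_in_logs(staging: Database, log_tapes: list[list[LogEntry]]) -> None:
    """Apply log tapes oldest first; later after-images overwrite earlier ones."""
    for tape in log_tapes:
        for address, after_image in tape:
            staging[address] = after_image


def snap_to_production(staging: Database) -> Database:
    """Step 2: the last-moment commit, modelled as replacing production."""
    return dict(staging)


production: Database = {"A": b"damaged", "B": b"damaged"}
staging = restore_to_staging({"A": b"a0", "B": b"b0", "C": b"c0"})
roll_in_logs(staging, [[("A", b"a1")], [("B", b"b1"), ("A", b"a2")]])
# Investigation can continue on `production` right up to this point.
production = snap_to_production(staging)
print(production)  # {'A': b'a2', 'B': b'b1', 'C': b'c0'}
```

The point of the structure is that nothing touches the live copy until the final step, so the decision to overwrite it can be deferred until the rolled-forward copy has been checked.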

Test Systems and Live System Resolution
The same SNAPSHOT utility can, of course, be used to refresh the test system database quickly and easily on a regular basis (see footnote 1). More interestingly, though, it can help debug live system problems that cannot be:

Reproduced on the existing test system
Reproduced on a TDB over the live system, or
Debugged on the live system directly for operational reasons.

In these cases we can use SNAPSHOT to clone a private copy of the production database rapidly and debug on that copy, free of the above limitations. And if the debugging is not prolonged, the real space occupied by the copy is minimal, since only tracks that change after the SNAP take up new physical space.
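A rough estimate of what a debugging clone costs can be worked out from the tracks written since the SNAP. The Python sketch below assumes the same redirect-on-write behaviour as the toy model: a track costs one extra physical track once either copy rewrites it, and rewriting it again costs nothing further because the superseded version is freed. TRACK_BYTES uses the nominal 3390 track capacity and ignores compression, so on a compressing subsystem the real figure would be lower; the function name and the sample track lists are hypothetical.

```python
TRACK_BYTES = 56_664  # nominal 3390 track capacity, ignoring compression


def extra_tracks(prod_writes: list[int], clone_writes: list[int]) -> int:
    """Physical tracks a SNAP clone costs beyond the shared original.

    Assumes redirect-on-write: a track diverges (one extra physical track)
    as soon as either copy rewrites it; further rewrites of the same track
    cost nothing more because the old version is freed.
    """
    return len(set(prod_writes) | set(clone_writes))


prod = [10, 11, 12, 10, 10]  # live system keeps updating a few hot tracks
clone = [11, 500]            # debugging session touches two tracks
n = extra_tracks(prod, clone)
print(n, n * TRACK_BYTES)  # 4 226656
```

The cost depends on how many distinct tracks have diverged, not on how long the clone has existed as such, which is why a short debugging session leaves the space requirement small while a long-lived clone on a busy system gradually grows towards a full copy.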

Summary
The unique storage architecture of RVA (and the new ESS) DASD presents tremendous new opportunities for ALCS system management. The only cost is having to break old patterns of thought about how disk storage works!

(Footnote 1)
During system cutover we built our critical test system and system backups during a brief two-minute ALCS outage to ensure total database consistency. Now that we are in production, we refresh our test systems during normal operations. This lets us investigate the possibility of performing log-less restores, by comparing the level of data chain inconsistency in such a copy with that of a system restored with rolled-in logs.

Ian S. Worthington is an independent ALCS/MVS systems consultant currently retained by British Airways Speedwing on behalf of Avianca in Bogota, Colombia. He is always interested in working on new projects and can be contacted by email at IanWorthington@usa.net, or through his New York City virtual office by fax on +1.212.208.4463.