Lights Out At Federal Express
by Alan Sadowsky
Back in July I wrote an editorial entitled "Phooey On GUI" which talked about console automation for TPF. At that time I made the statement that the Console Automation Group at Federal Express was shooting for Lights Out in '96... total hands-off TPF operations by the end of the year. With a good deal of pride and accomplishment, I am very pleased to announce that we've achieved that goal. FedEx TPF operations has been running 99.5% or better since early December.
Before I get into the details, let me answer a few of the obvious questions some people are likely to have. Is there an Operator sitting at the console? Yes, there is. Aside from the political reasons, the Operator has two prime responsibilities: keep the tape stackers loaded (we're not using SILOs), and advise the On Call person when the system crashes. The SILO issue will be addressed when we upgrade to a current release of TPF (we're on 2.3 right now), and we're working on automating the On Call advice as we speak. One other function the Operator serves is to initiate on-demand jobs or utilities which by their nature cannot be included in our automation scheduler.
Federal Express is running the Automated Console Expert (ACE) product available from Diversified Data Resources (DDR). The automation software base consists of 300 XPROCS (ACE procedures), fewer than 20 C language, QuickBasic, and REXX .EXE modules, and an ACE database of just under 700 lines of targets and actions. In production since March of 1994, the automation has successfully issued over 1,750,000 commands to TPF, without any Operator intervention!
The easiest way to approach the automation itself is to break it up into three distinct areas of discussion: Normal Processing, Utility Processing, and Exception Processing. Under normal conditions, the user makes a request on the terminal and gets a response back. In these cases (which certainly comprise the majority of the transactions processed by TPF), the automation has little if anything to do except monitor the system for acceptable throughput, and resource utilization. Pretty dull stuff. Things do get a bit more interesting though when we talk about utilities.
Utility processing is undoubtedly the most interactive, Operator-intensive function on TPF, and the automation handles it just fine. We've automated all of the standard TPF utilities (Online Capture, RECOUP, PDU, JCD, etc.), all of the Applications based utilities (Nightly File Maintenance, specific database captures, Application database pilot loads, archive retrievals, etc.), and even routine maintenance functions (IPL of the Loader General File, loading of the Communications Pilots, creation of system TLD tapes, loading of OLD tapes, etc.). Once again, with the exception of on-demand requests (e.g. OLD loads), the Operator doesn't touch the keyboard. The scheduler queues the utility, and the automation initiates the run. On-demand requests do require the Operator to make the request (e.g. DO OLD), but from there they are handled by a simple interface we've designed which doesn't require any further Operator input.
Exception processing is extremely critical for us at Federal Express. As in any TPF shop, time is money, and unplanned outages or network interruptions (when they do occur) must be kept to a minimum. If TPF takes a catastrophic error, the automation will recover the system and get us back to NORM state in under 3 minutes (in most cases less than 2 minutes). The automation also goes through full network recovery in 60 seconds or less. The Operator does NOTHING! In fact, there have been times when the Operator has been away from the console, and wasn't even aware that an outage had occurred. Out of the multitude of operating system platforms running production systems at Federal Express, TPF consistently has the best uptime and availability in the entire complex... thanks to automation!
Now that we've got the generalities out of the way, I want to whet your appetite with some specifics. I think that once you've had a true taste of what automation can do for you, you'll be a bit less anxious about automating operations at your installation. I've chosen Online Capture as an example to get you started on the right path.
Capture is one of the most Operator-intensive utilities in TPF for several reasons. First, there are numerous operator commands that must be issued to allocate the necessary tape drives, stop logging, mount new RTX tapes, start logging, clear the Capture keypoint, set the Capture delay factor, alter the ECB level (for multi-channel captures), etc., etc., etc. Secondly, monitoring the Capture requires a fair amount of diligence. Not only must the Operator ensure continuous processing of Capture, but the Operator also has everything else in the system to keep track of at the same time. To add yet another level of "awareness" to the process, the logging of all Capture tapes, RTX tapes, and the KPC tape is usually a manual process which leaves no room for error. Now throw in the optional variables of frequency of Capture, the time of day Capture is run, and (just for giggles) Operator knowledge and experience. Suddenly the prospect of automation takes on a new light. Let me entertain you...
At the pre-designated time, a message is displayed on the console advising the Operator that Online Capture has entered the request queue. The Operator has the option to abort Capture if necessary. The Operator is reminded to ensure that the tape loaders are full, and then the automation "sleeps" for 5 minutes giving the Operator time to check the drives and loaders.
The automation stops logging and removes the active and standby RTX tapes.
The Capture Keypoint is cleared, and the Capture delay factor is set to the proper level.
The tape drives for Capture are automatically allocated from the available drive pool, and the Operator is advised which drives on each tape channel have been acquired for Capture.
The RTX active and standby tapes are mounted and RTX logging is re-activated.
The automation issues the Capture start command, alters the ECB level for dual-channel processing, adds the second tape drive for each channel, and displays the Capture status for the Operator.
When all online modules have been captured, the KPC tape is mounted and Keypoint capture is initiated.
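The sequence above is essentially a fixed runbook with one Operator checkpoint at the front. As a rough illustration only (this is Python, not ACE XPROC syntax, and the step strings are plain-English paraphrases rather than real TPF commands), it might be sketched like this. The names run_capture, send, abort_requested, and wait are all hypothetical.

```python
from typing import Callable, List

# The 5-minute pause that gives the Operator time to check drives and loaders.
OPERATOR_CHECK_SECONDS = 5 * 60

# Paraphrased from the article; these are descriptions, not TPF commands.
AUTOMATED_STEPS = [
    "stop logging and remove the active and standby RTX tapes",
    "clear the Capture keypoint and set the Capture delay factor",
    "allocate Capture drives from the available pool and advise the Operator",
    "mount RTX active and standby tapes and re-activate logging",
    "start Capture, alter the ECB level, add the second drive per channel",
    "mount the KPC tape and start Keypoint capture",
]


def run_capture(
    send: Callable[[str], None],          # hypothetical: issues one step to the console
    abort_requested: Callable[[], bool],  # hypothetical: did the Operator abort?
    wait: Callable[[int], None],          # hypothetical: sleeps for N seconds
) -> List[str]:
    """Announce Capture, pause for the Operator, then run the steps in order.

    Returns the steps that were issued; an empty list means Capture was aborted.
    """
    send("Online Capture has entered the request queue; check the tape loaders")
    wait(OPERATOR_CHECK_SECONDS)
    if abort_requested():
        return []
    issued = []
    for step in AUTOMATED_STEPS:
        send(step)
        issued.append(step)
    return issued
```

Passing the wait and abort checks in as functions keeps the sketch testable without an actual 5-minute sleep; a real console product would supply its own timers and message hooks.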
While Capture is running, the automation is logging all of the BX? capture tapes, all of the RTX tapes, and the KPC tape to a file on the console PC. When Capture is finished, the automation prints a complete and accurate CAPLOG report on a laser printer attached to the PC console. This report shows the date of the Capture, the start time for Capture, the end time for Capture, the tape drives utilized, all of the online modules captured, all of the tapes associated with each module, all of the RTX tapes used during Capture, the KPC tape used during Capture, and has a place for the Operator's signature. One copy of this report is filed in a CAPLOG binder, and another copy goes to the tape library so the proper tapes can be pulled and taken offsite to the vault.
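To make the contents of that report concrete, here is a minimal Python sketch of a record holding the fields listed above and rendering them as text. The class name, field names, and layout are assumptions made for illustration; the actual CAPLOG format is not shown in this article.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CaptureLog:
    """Hypothetical record of one Capture run, mirroring the report fields."""
    date: str
    start_time: str
    end_time: str = ""
    drives: List[str] = field(default_factory=list)
    module_tapes: Dict[str, List[str]] = field(default_factory=dict)
    rtx_tapes: List[str] = field(default_factory=list)
    kpc_tape: str = ""

    def record_module_tape(self, module: str, tape: str) -> None:
        # A module may span several tapes, so keep them in mount order.
        self.module_tapes.setdefault(module, []).append(tape)

    def render(self) -> str:
        lines = [
            "CAPLOG REPORT",
            f"Date: {self.date}",
            f"Start: {self.start_time}  End: {self.end_time}",
            "Drives: " + ", ".join(self.drives),
        ]
        for module, tapes in self.module_tapes.items():
            lines.append(f"Module {module}: " + ", ".join(tapes))
        lines.append("RTX tapes: " + ", ".join(self.rtx_tapes))
        lines.append(f"KPC tape: {self.kpc_tape}")
        lines.append("Operator signature: ____________________")
        return "\n".join(lines)
```

Each tape mount seen on the console would call record_module_tape (or append to rtx_tapes), and render would be called once when Capture completes.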
Of course, safeguards have been built into the automation to prevent things such as stealing an allocated Capture tape drive before Capture actually starts (e.g. if an RTA tape switches and a new standby must be mounted), but this takes us into the realm of tape automation, and that's worthy of a separate discussion entirely.
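One simple way to picture that kind of safeguard is a drive pool that records an owner for every reserved drive, so a later request can only draw from what is still free. The Python sketch below is purely illustrative; DrivePool and its methods are hypothetical and say nothing about how ACE or our XPROCS actually do it.

```python
from typing import Dict, List, Optional


class DrivePool:
    """Hypothetical pool that hands out tape drives and remembers who owns them."""

    def __init__(self, drives: List[str]) -> None:
        self._free: List[str] = list(drives)
        self._reserved: Dict[str, str] = {}  # drive -> owner

    def reserve(self, owner: str, count: int) -> List[str]:
        # Only free drives are eligible, so a reserved Capture drive
        # cannot be taken by a later standby-tape mount.
        if count > len(self._free):
            raise RuntimeError("not enough free drives")
        taken = self._free[:count]
        self._free = self._free[count:]
        for drive in taken:
            self._reserved[drive] = owner
        return taken

    def release(self, owner: str) -> List[str]:
        # Return every drive held by this owner to the free list.
        freed = [d for d, o in self._reserved.items() if o == owner]
        for drive in freed:
            del self._reserved[drive]
        self._free.extend(freed)
        return freed

    def owner_of(self, drive: str) -> Optional[str]:
        return self._reserved.get(drive)
```

In this sketch a standby mount that arrives after Capture has reserved its drives either gets a different free drive or fails loudly, rather than silently taking a drive Capture is counting on.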
What is important to note, however, is that during this entire process, the Operator hasn't touched the console keyboard once! The utility is scheduled automatically, initiated automatically, acquires the necessary resources automatically, executes the necessary TPF commands automatically, monitors the progress automatically, keeps the Operator advised automatically, and even prints out the report... automatically! If your brain cells haven't gone into overdrive quite yet, consider the fact that similar automation is also running our RECOUPs, our PDUs, our Data Collections, our system maintenance functions, and all of our restart and recovery procedures.
My opening statement in this article was (admittedly) bold, but it bears repeating because our accomplishments at FedEx can also be your accomplishments. The TPF operating system at Federal Express is run on a mainframe, and run by a PC. Lights-Out operations is not a dream. It is very much a reality that anyone has within their grasp.
Chris Haag and Al Sadowsky are the two Senior Systems Programmers responsible for the creation and maintenance of the TPF console automation at Federal Express in Memphis, Tennessee.