ONLINE POOLS - OFF THE DECK AND INTO THE WATER
The new Online Pools Package is currently running in production at four major installations. US Airways, Galileo, and British Airways are running the total Package, while Amadeus runs a portion of the Recoup code. The Package is now in the hands of IBM and is expected to be released to the entire user community sometime in the future. The following describes some of the experiences each of the current users has had since migrating to the new ONLINE POOLS REWRITE PACKAGE.
When Dave Glass and Steve Pietsch first sat down to design a Recoup Rewrite for US Airways in 1991, they had no idea it would escalate into the major package it is today. The original requirement was to eliminate as much human error as possible. To accomplish this, a design was put together to eliminate the need for tapes and general files in the Recoup environment. This design progressed into a total rewrite of how file pool was manipulated at US Airways. There was a lot of hard work by a lot of people* and the first design was loaded Online at US Airways over a period of several months in 1992. Coincidentally, a requirement for a total Pools Rewrite was working its way through the TPF User's Group at the same time. The entire industry agreed that a rewrite was needed, but, as usual, had little time and manpower to proceed forward with the requirement.
Finally, in September of 1993 a meeting, attended by many TPF users and chaired by Amadeus, was held in Poughkeepsie, NY to initiate work on the User's Group requirement. During this meeting Dave Glass and Steve Pietsch of US Airways presented their Online Pools Package. Though their package was a far cry from what is now Online Pools, it was agreed that the US Airways package would be used as the base for the project.
Over the next several meetings a Task Force was set up, a mission statement was published, a time line was established, basic requirements were identified and US Airways was established as the coordinator of the project. The Task Force consisted of 9 companies throughout the industry - Amadeus, British Airways, EDS, Galileo International, IBM, Japan Airlines, US Airways, Delta and Worldspan. Each Task Force member contributed greatly to the design, coding and implementation of the Package.
The mission statement was to create significantly enhanced POOL Utilities in the TPF environment, which satisfy the majority of TPF customer requirements. Those basic requirements were:
1) Elimination of tapes and general files.
2) Provide multi-processor, multi-istream capability.
3) Run all utilities, all phases in Norm State.
4) Increase performance and integrity.
5) Provide easy fallback capability.
6) Provide historical data.
7) Eliminate all Offline procedures.
By April of 1994 the technicians were hard at work; however, legal documents and contracts had to be worked out between all companies to provide for code sharing, code ownership, etc. This legal process was not complete until January of 1996! By then most of the design was complete and the Task Force was concentrating on the integration of all pieces into one complete package. All pieces of the package were complete by mid-1997.
Which brings us to the US Airways Online implementation. The current File Support staff of Steve Pietsch, Alex Renko, Charlie Crowe, Barbara Shannon and Gil Queipo spent several months developing, integrating and testing all features of the Package in anticipation of an August 2, 1997 production cutover. Many others** also contributed to this migration over the previous months, and even years, leading up to August 2. Online implementation did occur August 2, without a hitch, and our experience with the package since has been nothing short of exceptional.
The US Airways File Support team wishes to thank all Task Force members, as well as all participants in the project, for helping make our cutover smooth and uneventful. We especially want to thank British Airways and Amadeus, who played a major role in the testing of the final product, and Galileo International, who not only played a major role in testing but took over leadership of the project early in 1997.
* Participants in the original US Airways design and implementation include Dave Glass (project lead), Steve Pietsch (lead technician), Bill Cook, Ron Johnson, Ron Carlson (technicians) and Mick Mitchke (manager).
** Participants in the implementation of the TPFUG Online Pools Rewrite at US Airways include Dave Glass (project lead), Steve Pietsch (project lead and technician), Alex Renko (lead technician - Recoup), Charlie Crowe (lead technician - PDU), Barbara Shannon, Gil Queipo, Debbie Miller, Dave Skirzenski (technicians), Henry Meyer (legal contact), Pat Howard (director) and Mick Mitchke (manager).
The evening of August 2 was warm, and eerily quiet in Winston-Salem. Online problems prevented US Airways from running its first recoup until 3 days after the 4.1 cutover. At that point, available pool was down to about 8 hours. PDU's were not giving enough pool back. Finally the green light was given. Prior to entering the dreaded ZRECP PROCEED command, programmers and operators were sweating out a database problem with 1000's of broken chains due to a misbehaving application. It was 5AM, 1 hour to opening of the East Coast airports. 50% of all pool was about to be rolled in (these were lost addresses). Brand new code: will it work, or will we be flipping hamburgers?
US Airways was the first company to cut over to the new package in its entirety. This was done at the same time as the cutover to TPF 4.1. Up until that time much manpower was expended on completing the final task force product and extensively testing the integrated package. Says Alex Renko at US Airways: "We all put in a couple of solid years getting ready for cutover, and we ran into every possible problem, from design problems with the online chain chase and multi-processor chain chase (they were not designed to work with each other), to old-vanilla-code problems that did not perform properly, to testing and accounting for every pool address out of millions. We actually had a working package 1 year before cutover, and then redesigned and refined it again for better integrity and usability."
Well, US Airways' experience with this new package is nothing short of outstanding. Database integrity has been maintained throughout, while achieving all of the benefits expected. That is not to say everything has been perfect. Anytime 140,000 lines of code are installed and running there will no doubt be a bug here or there. In our case, all of the problems encountered were minor and no outages or database damage ever occurred. US Airways already had an online recoup package, but that package ran only on one processor, with one chain chase item at a time. US Airways still needed to get a significant recoup turn around improvement, a significant reduction in downtime for reallocations, and a pools package that would be supported by IBM. Recoup results are as follows:
In-use Time (not including coverage checkout time)
Phase III & IV
Phase V & VII
3.0 hrs UNI
US Airways prefers to run phases 5&7 in order to generate and keep printouts of lost and erroneously available addresses. Overall turn-around time for a recoup therefore went from 11-12 hours to about 3.5 hours, or 2 hours not counting phases 5&7. The results above are the most recent and reflect additional tuning, says Alex Renko:
- Running multiple L/C and T/C processors, some with a high number of ECB's (about 200 ECB's per CPU).
- Designating all ID's with long chains to run on the primary processor, and to be kicked off first.
- Designating the largest database to run on a secondary processor, that processor having the largest share of ECB's.
- Designating the largest TPFDF structure to start chain-chasing before other TPFDF structures start chain chasing.
As a result of this tuning, recoup will generally run "flat out" on all processors all the time. Because of this we had to increase timeout values for certain ID's. With so much parallel processing, the risk of timeouts increased. Because our DASD queues grew with 2 or more CPU's performing chain chase, we do not believe we can improve on our best phase 1 time of 57 minutes. A few of our databases were cross-chained, which showed up as "double counted" ID counts after phase 2 completes. We assigned those databases to run on the same processor and the problem disappeared. US Airways is also using the online deactivation code, and is very satisfied with it. We have not yet tried the online reallocation but expect the downtime requirement for a reallocation to go from 45 minutes to about 10 minutes.
US Airways wishes to thank all task force members including IBM for the hard work and dedication they put into the package. For the most part, there was little money in everyone's budget to fund this effort, and so it took a bit of time to complete. All the people who worked on this package at the various companies were able to put together a world-class package, working remotely, with partial support at best from their management. It was a pleasure to meet and work with them, and we really enjoyed their company, even in -15 degree Poughkeepsie weather.
At British Airways the new product had to measure up to a very high specification in-house Recoup known as Probe. The average of 36 meg in-use pool records took about seven hours to chain chase. With the offline MVS job followed by the online lost address report and subsequent analysis of the outputs, the overall turnaround of Recoup was approximately 24 hours.
The new package takes slightly longer during chain chase but saves a significant amount of time and operator intervention by removing the offline phase. The turnaround time has been halved. We hope to reduce this dramatically when we start to use the loosely coupled capabilities of the new Recoup. Although the dependency on an MVS system has been removed, BA has added a utility to send the Recoup reports offline, in order to continue using the analysis tools which are key to maintaining the integrity of the TPF databases.
The new Recoup introduced large-scale changes which are probably more extensive at BA than at our partners. Consequently testing had to be very extensive. With its heavy reliance on VFA for performance, the new Recoup makes high demands on resources for large scale test systems running under VM.
Implementing the new Recoup has been a substantial achievement and enhancement to our TPF systems which we will continue to build upon with further developments in the near future.
BA Implementation Team
After three years of work, hundreds of man-hours, and 140,000 lines of code, the new Recoup system has been turned over to IBM. Was it worth the effort? "Absolutely," said Judy Jesuroga, Manager of Host and Communication Systems. "This new system will save us time and open up a really tight utility schedule." Galileo participated in a three-year technical effort to rewrite Recoup, an important system utility that's part of the released base operating system for TPF. The project kicked into high gear in the last two years, when the programmers at Galileo were busy designing, coding and testing the new package, which was loaded to all TPF 4.1 systems during the third and fourth quarters of 1997. "This took a tremendous effort, not just from Galileo people but from people at US Airways and the other TPF User Group (TPFUG) Task Force members," said Judy. "Working together, we were able to accomplish so much."
Galileo had modified and rewritten Recoup over the years to be fast, but there was still a need to refine it further. Galileo modified it to run at its optimum speed both online and off-line, rewrote the DYOPM code in Assembler and cleaned up the Chain Chase package. Here are some of our old runtimes for both the online and the off-line portions:
|System||Online Chain Chase Time||Off-line Processing Time|
|Apollo||12 Hours 54 Minutes||1 Hour 39 Minutes|
|European System (PRE)||8 Hours 7 Minutes||40 Minutes|
|North American Fare Quote (FQS)||1 Hour 15 Minutes||12 Minutes|
|International Fare Quote (IFQ)||2 Hours 5 Minutes||15 Minutes|
The following table shows the results after we loaded the new Recoup package, which included Multiple Processors, Multiple Groups, and Tightly Coupled Chain Chase support.
|System||Online Chain Chase Time||New Online Phase 2/4 Process Time||# of Processors|
|Apollo||5 Hours 28 Minutes||24 Minutes||3|
|Apollo||4 Hours 10 Minutes||24 Minutes||4|
|European System (PRE)||5 Hours 46 Minutes||21 Minutes||1|
|European System (PRE)||3 Hours 39 Minutes||21 Minutes||4|
|North American Fare Quote (FQS)||44 Minutes||3 Minutes||1|
|International Fare Quote (IFQ)||20 Minutes||2 Minutes||1|
It should be noted that the two fare quote systems are part of a Multiple Data Base Facility (MDBF) complex. With this new version, we can run Recoup on the Fares system, the International Fare Quote system and the basic subsystem all at the same time, cutting overall runtime down even further.
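As a rough cross-check of the two tables above, the following Python sketch converts the reported times to minutes and compares old and new totals. It assumes, for illustration only, that the old total is online chain chase plus off-line processing, that the new total is chain chase plus the online Phase 2/4 time, and that the best (4-processor) figures are used for Apollo and PRE. The function and variable names are hypothetical.

```python
import re

def to_minutes(text):
    """Convert a string such as '12 Hours 54 Minutes' to whole minutes."""
    hours = re.search(r"(\d+)\s*Hours?", text)
    minutes = re.search(r"(\d+)\s*Minutes?", text)
    total = int(hours.group(1)) * 60 if hours else 0
    total += int(minutes.group(1)) if minutes else 0
    return total

# Per system: (old online chain chase, old off-line processing,
#              new online chain chase, new online Phase 2/4)
# Values copied from the tables; Apollo and PRE use the 4-processor rows.
RUNS = {
    "Apollo": ("12 Hours 54 Minutes", "1 Hour 39 Minutes",
               "4 Hours 10 Minutes", "24 Minutes"),
    "PRE": ("8 Hours 7 Minutes", "40 Minutes",
            "3 Hours 39 Minutes", "21 Minutes"),
    "FQS": ("1 Hour 15 Minutes", "12 Minutes",
            "44 Minutes", "3 Minutes"),
    "IFQ": ("2 Hours 5 Minutes", "15 Minutes",
            "20 Minutes", "2 Minutes"),
}

def totals(old_online, old_offline, new_chase, new_phase):
    """Return (old total, new total) in minutes under the stated assumption."""
    old_total = to_minutes(old_online) + to_minutes(old_offline)
    new_total = to_minutes(new_chase) + to_minutes(new_phase)
    return old_total, new_total

for system, times in RUNS.items():
    old_total, new_total = totals(*times)
    print(f"{system}: {old_total} min -> {new_total} min "
          f"({old_total / new_total:.2f}x faster)")
```

Under these assumptions the Apollo total drops from 873 minutes to 274 minutes. The comparison is only indicative, since the old and new phases do not map onto each other exactly.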
Our next phase is to start tuning the various parameters of Recoup. We will be increasing the number of Entry Control Blocks (ECB's) to find the optimum number to run with during Chain Chase. We currently run with only 10% of our allocated ECB's. We also will be ordering our Descriptors to have the longer running groups start sooner to get more benefit from the Multiple group support. We are hoping to be able to shorten our Chain Chase time by another 20 to 25 percent with these minor changes.
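The benefit of starting longer running groups sooner can be illustrated with a small scheduling sketch in Python. This is not Recoup code: the group names, runtimes and the simple "next free processor" model are hypothetical, chosen only to show why ordering matters when several groups run in parallel.

```python
import heapq

def schedule(groups, processors, longest_first=True):
    """Assign each group to whichever processor frees up first.

    groups: list of (name, estimated_minutes) pairs (hypothetical values).
    Returns (assignment, finish_time), where assignment maps a processor
    index to the list of group names it runs, in order.
    """
    if longest_first:
        # Start the longest running groups first.
        groups = sorted(groups, key=lambda g: g[1], reverse=True)
    # Min-heap of (time at which the processor becomes free, processor index).
    free_at = [(0, p) for p in range(processors)]
    heapq.heapify(free_at)
    assignment = {p: [] for p in range(processors)}
    for name, minutes in groups:
        busy_until, p = heapq.heappop(free_at)
        assignment[p].append(name)
        heapq.heappush(free_at, (busy_until + minutes, p))
    finish_time = max(t for t, _ in free_at)
    return assignment, finish_time

# Hypothetical descriptor groups, listed with the longest one last.
GROUPS = [("group_d", 30), ("group_e", 30), ("group_c", 60),
          ("group_b", 90), ("group_a", 120)]

_, as_listed = schedule(GROUPS, processors=2, longest_first=False)
_, reordered = schedule(GROUPS, processors=2, longest_first=True)
print(f"as listed: {as_listed} min, longest first: {reordered} min")
```

With these made-up numbers, running the groups as listed leaves the longest group to start last and finishes in 210 minutes, while starting the longest groups first finishes in 180 minutes. Real results would depend on DASD queues, ECB allocation and other factors this model ignores.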
"Overall, this has been an extremely rewarding project," said Ky Slickers, Consulting Engineer, TPF Systems. "We were able to speed up a long-running utility and had a chance to work with some very knowledgeable people in the TPF community. It was great to see the members of the TPFUG Task Force come together to build a product that we hope all TPF shops will benefit from someday."
Amadeus has been involved since the beginning of the Task Force and was fully committed to the development, test and implementation of the package that the Task Force planned to deliver as a replacement for the IBM product. However, in the last months of 1995 Recoup Phase I/II in our shop was taking around 26 hours of elapsed runtime. The Amadeus growth projected for 96/97 would result in almost doubling our Database in size, thus increasing the Recoup runtime once more to an unacceptable level.
While we committed to continue participating in the Recoup Rewrite Task Force, it was deemed that the enhancement provided by the Task Force would not be ready in time to support our growth. We then decided to design our own Online Recoup, focusing on eliminating Phase II offline.
In July 96 we retrofitted 40% of the Recoup task force code into our base, and we implemented the package in our Production System in January 97. The Recoup runtime dropped to 5 hours Phase I/II total, with 100 million Pool records. Currently our Online Recoup is at PUT03 level, with L/C and T/C support, and has an average runtime of 6 hours with a database size of 160 million Pool records. It should also be noted that we run Recoup on 4 processors and with 30% of our total ECB allocation, which we think can be further increased to reduce the total runtime.
Overall, and in spite of a long and painful Testing Phase, since we implemented Online Recoup the runtime reduction has fully met our objectives and production problems in Recoup are now almost non-existent. We believe that the Recoup Task Force has so far been the most successful task force in the TPFUG, being able to produce and deliver a product that fully meets most of the requirements of all TPF Users. It is now IBM's responsibility to release this product to all the users that can surely benefit from its functionality and quality.