Reduce I/O Rates

ALCS Tuning
by Geoff Lowry

What with Year 2000, new code share agreements, the Euro, the Asian crisis, and electronic commerce, everyone in IT is very busy. Shortage of skills means that non-essential work is delayed. Performance of the "Res" system is overlooked, or only worked on when response times reach critical thresholds. "If it’s not broke, don’t fix it!" is often used as an excuse not to spend time on this function. Even so, there are some very worthwhile reasons to focus on the performance of the ALCS system.

Many airlines in the Asian region were, and still are, under considerable pressure to improve their profitability. All budgets are scrutinized, all managers are made to look for savings, and purchases are delayed.

By tuning ALCS, and reducing the CPU cycles consumed, it is possible to cater for growth without the need to increase budgets that would otherwise have to allow for increased license costs from the higher software group rating on a larger CPU.

Reduce I/O
Each real I/O consumes CPU, channels and devices. This is one of the major consumers of CPU in the system. It is also a major contributor to transaction response times. Save an I/O and you save CPU and response time. Double whammy. Saved I/O also reduces channel activity, device activity, controller activity, enhancing I/O response times of the remaining I/O. Triple whammy!

VFA
The first "knob" to twiddle is VFA. This does provide a rapid payback, but…you have to have the memory! Hyperspace does help, but a "bigger bang for your buck" is obtained from a similar increase in VFA. With massive amounts of central storage available in the newer processors on the market, it is now possible to utilize larger VFA to reduce I/O.

The number of VFA buffers for each record size may be chosen by trying to maintain a similar "minimum residency time" for each record size, that is, how long a record would remain in VFA if referenced just once. This does not increase linearly with VFA increase, as the improvements in VFA hit rates positively influence this "residency time". By allocating available memory for VFA in this manner, a more balanced hit rate across record sizes is obtained. By increasing VFA it is quite possible to achieve read hit rates of 95% and write hit rates of better than 85%. Saving a write is always better than saving a read, as one write saved is two I/Os.
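
The "minimum residency time" idea can be pictured with the rough sketch below. It assumes, purely as a simplification and not as anything taken from ALCS itself, that a record referenced once stays in VFA for about (number of buffers) divided by (rate at which new records enter VFA for that size), and it ignores the hit-rate feedback just mentioned. The function name, record sizes, insertion rates and memory figure are all made-up illustrative values.

```python
def balance_vfa(memory_bytes, pools):
    """Split memory so every record size gets the same estimated residency time.

    pools maps a record size in bytes to the rate (records per second) at
    which new records enter VFA for that size. Both are hypothetical inputs.
    Simplifying assumption: residency_seconds = buffers / insert_rate.
    """
    # Memory consumed per second of residency, summed over all record sizes.
    bytes_per_second = sum(size * rate for size, rate in pools.items())
    residency = memory_bytes / bytes_per_second
    # Buffers needed per size to hold `residency` seconds' worth of new records.
    buffers = {size: int(rate * residency) for size, rate in pools.items()}
    return residency, buffers


# Illustrative figures only: 512 MB of VFA shared by three record sizes.
pools = {381: 400.0, 1055: 900.0, 4000: 150.0}
residency, buffers = balance_vfa(512 * 1024 * 1024, pools)
for size, count in buffers.items():
    print(f"size {size}: {count} buffers, ~{count / pools[size]:.0f}s residency")
```

In practice the insertion rates would have to come from your own VFA statistics, and the split would need rechecking after each change because better hit rates alter those rates.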

Increased record sizes
This is a longer-term approach to tuning. With TPFDF records it is a simple matter (sometimes) to increase the physical record size to accommodate more LRECs, so reducing real I/O. This has a double up effect when using block indexing support as more indexes in the larger prime record can reduce I/O even more.

New records defined to the system and new database records should always be defined to use the largest possible record size where this has the potential to reduce I/O.

Short-term pool errors
Short-term pool is a great way to save I/O. A record can be dispensed, filed, found, and finally released without as much as one I/O ever taking place. Neat, huh? When applications "forget" to release short-term pool records, the records ultimately have to be written out to preserve their contents, even though they may never be referenced again. When ALCS introduced long-term pool errors, some smart programmers "fixed" these by making certain record IDs short-term records. Did someone mention HY records? While this hid the pool usage problems, or should I say pool abuse problems, it did nothing to help the performance of the system.

Some installations experience more than a million short-term pool usage errors per day, mostly timed-out short-term records (not released after more than 24 hours). Each of these errors causes two unnecessary I/Os. Not only does this consume CPU cycles to perform the I/O, but it also reduces VFA efficiency with a "flushing" effect that adds to the I/O burden. While it is not possible to fix all short-term pool usage errors, it is the "big hitters" that need to be targeted. It should be feasible to reduce these errors by 60% to 80%.
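
To put the figures above in perspective, here is a small back-of-envelope calculation. It assumes the errors are spread evenly across the day, which is unlikely to be true in practice, so treat the result as an average rather than a peak figure. The function name is only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def wasted_io_per_second(errors_per_day, io_per_error=2):
    """Average unnecessary I/O rate caused by short-term pool usage errors."""
    return errors_per_day * io_per_error / SECONDS_PER_DAY


before = wasted_io_per_second(1_000_000)
print(f"1,000,000 errors/day -> {before:.1f} wasted I/O per second")
for reduction in (0.60, 0.80):
    after = before * (1 - reduction)
    print(f"{reduction:.0%} fewer errors -> {after:.1f} wasted I/O per second")
```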

Delayed file
This is always a good idea to enhance VFA performance. Typically all long-term pool defaults to "file immediate". Every time a long-term pool record is updated, it is written out, twice! Now, if we make these same records delayed file, then they may be updated several times before reaching the top of the age list when they will be written out. This has the potential for saving many I/O operations for records that may be updated frequently. For example, Resource Control Records (RCR) for printers may be updated many times a minute when the printer is busy. By making these delayed file a significant amount of I/O may be saved.

For this to work, you will also need sufficient VFA to hold the records long enough to avoid the multiple I/O from the multiple updates. This is where the "minimum residency time" theory comes to the fore. Just remember that when a record is marked as "delayed file", ALCS assumes that you don’t need to log this record. So if you need to log this record for backup reasons, then the log exit will also need to be coded to force logging for this record ID. Many records may be low integrity and really do not need to be logged, such as records that hold statistics, agent assembly areas (AAA), etc. Another thing to remember here is that if the forward chains are delayed file, then make the head of chain the same. This prevents some nasty chaining problems after an outage where VFA may be lost, which is a very rare occasion at most sites.
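
The saving from delayed file can be illustrated with a deliberately simple model. It assumes that the first update after a write-out starts a residency window of fixed length and that every further update inside that window is absorbed in VFA. Real VFA aging is more subtle than this. The function names, the update pattern and the 120-second residency are hypothetical.

```python
def io_file_immediate(update_count, io_per_write=2):
    """Every update is written out at once, and each write-out is two I/Os."""
    return update_count * io_per_write


def io_delayed_file(update_times, residency_seconds, io_per_write=2):
    """Count I/O when a record is written only when it ages out of VFA.

    Simplified model (an assumption): the first update after a write-out
    starts a residency window; updates inside the window cost no I/O.
    """
    flushes = 0
    window_end = None
    for t in sorted(update_times):
        if window_end is None or t >= window_end:
            flushes += 1
            window_end = t + residency_seconds
    return flushes * io_per_write


# Hypothetical busy printer RCR: updated every 5 seconds for one hour.
updates = list(range(0, 3600, 5))
print("file immediate:", io_file_immediate(len(updates)), "I/Os")
print("delayed file:  ", io_delayed_file(updates, residency_seconds=120), "I/Os")
```

The longer the residency time that VFA can sustain, the more updates each write-out absorbs, which is why the VFA sizing and the delayed file setting need to be looked at together.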

Enhance regular run utilities
Most ALCS installations will have routines that run periodically, perhaps initiated by some automated process (TFT). Sine-out police is one, host-to-host timeout is another, and there may be others. Reducing the work that these utilities have to do can save valuable CPU cycles. For example, a "sine-out police utility" may read every AAA to determine if each AAA needs to be signed out due to a certain period of inactivity. By placing a time stamp in the user data area of the comms table entry with each input message, much of this I/O can be avoided. The time stamp can be interrogated through a COMIC to quickly remove a AAA from being a candidate for sine-out, thus avoiding the I/O of reading in the AAA, and also avoiding the overhead of the FIND even if the record is in VFA. The sine-out police will run in less time and generate significantly fewer I/Os. Other utilities that run every 5 minutes to check for certain timeouts may be changed to run every 7 minutes, say, with very little change to the function of the utility, but saving more CPU and I/O by running less frequently.
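
The time stamp technique might look something like the sketch below. It is written in Python purely to show the logic; a real implementation would be in the installation's own language and would use COMIC to reach the comms table entry. The dictionary layout, the function names and the 30-minute limit are assumptions made for the example.

```python
import time

IDLE_LIMIT_SECONDS = 30 * 60  # hypothetical inactivity limit


def stamp_input(comms_entry, now=None):
    """Called for each input message: record the time in the user data area."""
    comms_entry["last_input"] = time.time() if now is None else now


def sine_out_candidates(comms_table, now, idle_limit=IDLE_LIMIT_SECONDS):
    """Return the terminals whose AAA is worth reading.

    Only the in-memory comms table entries are inspected here; the AAA read
    is needed only for the terminals returned. Entries that have never been
    stamped are treated as idle.
    """
    return [
        terminal
        for terminal, entry in comms_table.items()
        if now - entry.get("last_input", 0) >= idle_limit
    ]


# Three hypothetical terminals with their last input times in seconds.
comms_table = {"T001": {}, "T002": {}, "T003": {}}
stamp_input(comms_table["T001"], now=1000)
stamp_input(comms_table["T002"], now=2500)
stamp_input(comms_table["T003"], now=2790)
print(sine_out_candidates(comms_table, now=2800))
```

Only the terminals returned need their AAA read. As for run frequency, moving a utility from every 5 minutes to every 7 minutes cuts it from 288 runs a day to about 206.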

These all add up to saving CPU and ultimately money...

Reduce CPU
Reducing I/O also saves CPU cycles, but there are more ways than just I/O avoidance to save CPU cycles.

Dumps
How many dumps per hour, during peak times, do you experience? 100? 50? Just think that each time you experience a dump, the ALCS system freezes and does nothing else for a number of seconds while this dump is written out to the DIA file. How long is ALCS frozen for? This depends on your dump options and the size of your system. Seven seconds is not an unusual time for an unsuppressed dump. Not a long time, but if you get one of these a minute, that is a little over 10% of your CPU you are throwing away! For six or seven seconds, out of every 60, the ALCS system is frozen. Okay, now I do know that none of your systems are that bad, but it does illustrate the effect that dumps have on your system. Application programmers need to be aware of this impact so that they may code decent messages, rather than issuing a dump. Even NODUMPs consume resource.
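
The arithmetic behind that figure is easy to repeat for your own dump rates. This small sketch assumes that dumps never overlap and that the freeze time is the same for every dump; the function name is just for illustration.

```python
def frozen_fraction(dumps_per_hour, freeze_seconds):
    """Fraction of wall-clock time the system spends frozen writing dumps."""
    return dumps_per_hour * freeze_seconds / 3600


for dumps_per_hour in (10, 60, 100):
    share = frozen_fraction(dumps_per_hour, freeze_seconds=7)
    print(f"{dumps_per_hour:>3} dumps/hour at 7s each -> {share:.1%} of the hour frozen")
```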

Activity Control Variables
These "knobs" are the easiest and quickest to adjust in times of stress. When the response times get slow, lower ACV6. After hours, to get recoup to go a little faster, increase ACV6. Hmm… The more parallel ECBs, the faster the process… Not always! Sometimes the opposite is true, especially if you are already a little tight on CPU cycles.

ALCS has this wonderful device that runs around in the system whenever an ECB requests a resource that is currently held by another ECB: the "Deadlock Detector". This is essential to prevent the "Deadly Embrace" situation, where one ECB wants a resource that is held by another that is waiting for the resource that the first ECB has held already. Now, if you have some ECBs running that all need to hold the same resource, then the more ECBs there are in the system doing this, the larger the queue of ECBs for this one resource. With each ECB requesting the resource, the deadlock detector goes to work, checking all the deadlock chains to ensure that there is no deadly embrace about to happen. The longer the queue, the more work the deadlock detector has to do.

So if I have, say, ACV6 set to allow 100 ECBs to process a single utility that writes out to a single sequential file, and the sequential file is doing, say, 200 I/O per second, then there is potential for the deadlock detector to check the queue of 100 ECBs 200 times a second. This consumes an inordinate amount of CPU cycles (greater than 50%), so slowing down the process due to lack of CPU cycles. By reducing the activity control variable, the process has fewer ECBs, a shorter queue for the single resource, and more CPU cycles available to drive the process.

This also affects on-line transactions, if you have, for example, a common function that all ECBs go through that need to hold a resource. For example, you may write all your input messages out to the RTA sequential file. If some interruption occurs in the system, say a dump, for example, there could be 200 ECBs all queued up waiting for the RTA to do their write. The deadlock detector would be going crazy at this stage, so prolonging the interruption by consuming the much needed CPU cycles that are required to write out the input messages and process the transactions. The answer is to have the ACV values set to the lowest the system can tolerate, without affecting performance and response times. Then when there is a dump, the recovery is far quicker.
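
A rough feel for how the deadlock detector's workload grows with queue length can be had from the sketch below. It assumes that each request for the held resource makes the detector walk the whole queue of waiting ECBs once, which is a simplification for illustration and not a description of the actual ALCS algorithm. The function name is hypothetical.

```python
def queue_entries_checked_per_second(queued_ecbs, requests_per_second):
    """Rough upper bound on deadlock-detector work for one contended resource.

    Assumption: every request walks the full queue of waiting ECBs once.
    """
    return queued_ecbs * requests_per_second


for ecbs in (100, 50, 10):
    work = queue_entries_checked_per_second(ecbs, requests_per_second=200)
    print(f"{ecbs:>3} ECBs queued, 200 requests/s -> {work:,} queue entries checked per second")
```

Under this assumption the detector's work falls in direct proportion to the number of ECBs allowed in, while the sequential file can still only accept one write at a time.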

Reduce ENTER/BACK activity
How many ENTER/BACK macros are executed every second? A whole bunch of them. How much CPU does this consume? A fair whack! By combining programs this can be reduced. I combined two programs that had been separated many years previously due to the 1K size restriction that was enforced at the time. This pair was reported by SRG as executing 30% of all enter/back activity in the system. The net saving of this exercise was 10% CPU. Sometimes it is quite easy to combine programs that are small enough to fit together. Other times, it becomes more complicated, where both programs have similar subroutines that once were the same before they were split. A register required for saving the return address may be difficult to obtain. A save area to save registers so that they can be reused also used to be a problem. The Local Program Work area (LPW) has made it much easier to combine programs. Previously an ALASC block had to be obtained for this purpose. I am not sure that the overhead is not similar to ALASC usage, but it is easier to code and use.

Pool Usage Errors
Like all errors, dumps, etc., these just waste resource. With vast quantities of pool usage errors, especially noticeable when short-term pool usage error reporting is turned on, the waste can make it worthwhile to investigate and fix the "big hitters". As mentioned previously, the impact on VFA hurts the most, but scanning the short-term pool directories looking for an available record also costs CPU cycles.

Hyper Active Terminals - "Budgies"
How many PC programs and macros are initiating transactions and consuming resource on your system? At one site it was found that over 8% of all transactions came from fewer than 0.1% of the configured terminals. The transaction rate was greater than one per second from all of these. Analysis of these transactions showed many of them were mindlessly checking empty queues, not very efficiently either, doing a sine-out and sine-in for each iteration. By working with the PC program developers, throttles were implemented that slowed the "empty queue check" to once a minute. This saved another 8%.

We found one "Budgie" that the network guys had created, just to monitor network response times. This just sat in a loop displaying a AAA sine-in status… Three times a second!
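
Finding the "Budgies" is mostly a matter of counting transactions per terminal. The sketch below assumes you can extract one terminal identifier per transaction from whatever logging or data collection your site already has; the function name, the sample data and the input format are hypothetical.

```python
from collections import Counter


def find_budgies(terminal_ids, configured_terminals, top_share=0.001):
    """Report how much of the load comes from the busiest terminals.

    terminal_ids: one entry per transaction, naming the originating terminal.
    top_share: fraction of configured terminals to inspect (0.1% by default).
    """
    if not terminal_ids:
        return [], 0.0
    counts = Counter(terminal_ids)
    top_n = max(1, round(configured_terminals * top_share))
    busiest = counts.most_common(top_n)
    share = sum(n for _, n in busiest) / len(terminal_ids)
    return busiest, share


# Made-up sample: one terminal generating most of the traffic.
log = ["T0001"] * 90 + ["T0002"] * 5 + ["T0003"] * 5
busiest, share = find_budgies(log, configured_terminals=1000)
print(busiest, f"{share:.0%} of transactions")
```

A list like this gives you the names to take to the PC program developers when asking for a throttle.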

What are the benefits?
Apart from personal satisfaction, the following lists some of the benefits obtained from a tuning exercise.

Avoided increased software charges
Reduced ECB life
"Smoother" running system
Tolerate instantaneous peak loads
Fewer system and application problems
Faster running utilities

Tuning the system is not a "once-off" exercise but does need to be revisited periodically, as transaction loads change and applications evolve. This effort can be justified.