TORNADO : A Blast Of Fresh Air For TPF Stress Testing
by Patrick O'Connor

Preparing either software or hardware for use in an environment as demanding as TPF is problematic. However well your personal testing of that minor coding change went, you know there is always the chance that when it is loaded live, and many thousands of scenarios occur within minutes or hours, some volume-related problem will surface and cause unknown repercussions. It is often the same for new hardware. A device might perform laudably, both under the specific device-based volume testing carried out by any self-respecting manufacturer and when connected to a TPF test system, with ‘real’ TPF entries being made by testers. Unfortunately, that same device, when faced with a TPF system under stress, and itself subjected to peculiarly TPF-style access profiles, might well exhibit alarming and totally unexpected behaviour. This article aims to present the problems facing testers, outline how these problems have been tackled to date and then discuss a new product, Tornado, which might be a useful addition to the TPF developer’s toolbox.

Software Testing In The TPF World
How would we all like to test our new software/hardware? There are some basic goals when testing, for example:

  1. Ensuring software/hardware works in all logical situations
  2. Ensuring software/hardware works in all illogical situations
  3. Ensuring software/hardware works under high loading

The only real way to ensure software/hardware works in all logical conditions is to create all those logical conditions in the test environment. This is most often attempted through the use of automated scripts. The same applies to the illogical conditions, although here some other interventions may be required, such as pulling cables, IPL’ing at inopportune moments and so on. Unfortunately I can offer no short-cuts for these steps. The best hope for the future is that as applications become more modular (object-based), the amount of testing per change should reduce, as code changes become more isolated in their function (so-called Black-Box testing).

Item 3, on the other hand, will not be helped greatly by object technology alone. There will still be the need to see how the new or altered software/hardware performs under as realistic a system load as possible.

Up to now the principal technique used to create such system loads has been some form of Capture/Replay package that captures input messages, often from a production system, and then replays them into a test system. The input message rate to the test system can be adjusted to increase the load. This seems the perfect solution to item 3, because it uses real messages driven into a real TPF system. In practice, however, there are many drawbacks to the use of Capture/Replay.

The main problems are:

  1. A full duplicate database must be created to accept the captured messages
  2. If messages update the database they cannot be re-used without a restore of that database
  3. There are problems associated with not having a network in the test environment, i.e. where do the responses actually go?
  4. There will probably be some message mixes that will produce illogical conditions if the message rate is increased too much i.e. if time is artificially ‘compressed’
  5. There is no guarantee that any set of captured messages is particularly ‘representative’ of system activity
  6. It is hard to repeat tests with the same conditions for comparison

A Different Approach To Volume Testing
Most TPF shops have someone responsible for monitoring system performance, probably using IBM’s Data Collection package (sometimes augmented by ETIM). To a performance specialist the activity of a TPF system consists not of logical strings of processes intended to achieve a data-oriented result, but of a dynamic collection of work items being handled by the TPF Control Program or the I/O subsystem of the processor itself. Once the ‘restriction’ of needing to consider ‘real’, ‘logically connected’ data is removed, some of the problems we have mentioned about volume testing are also removed.

This was part of the thinking behind the design of a new TPF tool called Tornado. Tornado is a TPF Activity Simulator, to my knowledge the only one that has been developed so far. It approaches the volume test problem by creating system activity based on information from the Performance Measurement reports and an understanding of the transaction profiles of key applications.

Central to its operation is the concept of a Transaction Profile. This is really just a sequence of system events which, put together, represent the actual TPF activity produced by a known application on a particular system. This might be an Availability transaction, a Z-functional message, a utility of some sort, a dump or any definable collection of TPF system tasks. The Activity Profile of any TPF system can be expressed as a collection of Transaction Profiles. As overall TPF system-wide activity changes over time, this can be represented by changes to the frequency of individual Transaction Profiles or to the mixture of Transaction Profiles themselves.
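The idea can be sketched as a simple data structure. The following Python model is purely illustrative: the field names and the notion of a per-profile rate are assumptions made for the sake of the example, not Tornado’s actual definitions.

```python
from dataclasses import dataclass, field

@dataclass
class TransactionProfile:
    """One known application transaction, expressed not as business
    logic but as the raw TPF activity it generates (hypothetical
    field names for illustration only)."""
    name: str              # e.g. "availability", "Z-functional msg"
    path_length: int       # instructions executed per transaction
    dasd_reads: int        # find-type I/O operations per transaction
    dasd_writes: int       # file-type I/O operations per transaction
    vfa_candidates: int    # records eligible for VFA buffering
    rate_per_sec: float    # frequency of this profile in the mix

@dataclass
class ActivityProfile:
    """A whole system's activity as a collection of profiles."""
    profiles: list = field(default_factory=list)

    def total_io_per_sec(self) -> float:
        # Aggregate I/O load implied by the current mix.
        return sum(p.rate_per_sec * (p.dasd_reads + p.dasd_writes)
                   for p in self.profiles)
```

On this view, raising the overall load is a matter of scaling the rates, and changing the character of the load a matter of altering the mix of profiles.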

Basic operation of Tornado involves the following stages:

  1. Collection of Application activity information (via Data Collection, ETIM, etc.)
  2. Building of a Tornado database from the Application workload profile
  3. Running Tornado
  4. Analysis of performance measurement output (Data Collection, ETIM etc.) or any errors if this was to test system-level code or microcode
  5. Reset system to re-run test or prepare for a different application workload profile
  6. (repeat step 3 or begin again at step 1 or 2)
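The stages above amount to a simple control loop, sketched below. The five callables stand in for the real tools (Data Collection, the Tornado database build and so on); this illustrates the control flow only and is not an actual Tornado interface.

```python
def run_stress_cycle(collect, build_db, run_tornado, analyse, reset,
                     reruns=1):
    """Orchestrate one Tornado test campaign.

    collect     -- stage 1: gather application activity information
    build_db    -- stage 2: build the Tornado database from it
    run_tornado -- stage 3: drive the simulated workload
    analyse     -- stage 4: inspect performance output (or errors)
    reset       -- stage 5: restore a consistent system state
    Repeating the loop covers stage 6; the caller may instead start
    again at stage 1 or 2 with a different workload profile.
    """
    workload = collect()
    build_db(workload)
    results = []
    for _ in range(reruns):
        run_log = run_tornado()
        results.append(analyse(run_log))
        reset()
    return results
```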

Tornado is a low-level simulation of application workload: it distills application activity into the constituent parts relevant to its usage of TPF resources, i.e. instruction path-length, I/O activity, VFA usage etc. This low-level approach makes it relatively easy to refine an existing simulation to accommodate application modifications.

TPF applications tend to operate in similar, basic ways:

Tornado simulates this workload closely by performing similar tasks. The action of building the simulator database causes records to be built out on the TPF database which contain information that will be acted upon by the simulated transactions themselves. Some of the information held in these records is:

A single transaction type, as defined to Tornado, will make use of defined ordinal ranges within the database in order to generate the simulated activity for that transaction profile. All transaction profiles are controlled from a central table which holds information about each profile such as:

More advanced features include the ability to ‘randomize’ usage of the ordinals within the simulator database, to provoke collisions on I/O requests etc., and a facility to ‘flush’ VFA buffers and DASD cache areas between tests, allowing more accurate comparisons between runs.
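The effect of randomising ordinal usage can be pictured with a small sketch (hypothetical code, since Tornado’s internals are not published in this detail). Sequential access spreads I/O evenly over a profile’s assigned ordinal range, while randomised access permits repeats, so concurrent simulated transactions can land on the same record and collide on the record-hold and I/O path:

```python
import random

def ordinal_stream(lo, hi, randomize=False, seed=None):
    """Yield record ordinals from a profile's assigned range.

    Sequential mode cycles evenly through the range; randomized
    mode deliberately allows repeats, provoking the contention on
    individual records that a stress test wants to exercise.
    """
    rng = random.Random(seed)
    while True:
        if randomize:
            yield rng.randint(lo, hi)
        else:
            for ordinal in range(lo, hi + 1):
                yield ordinal
```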

This descriptor technique also allows the creation of multiple databases, if required, which can also be easily archived (e.g. using DBR or something more sophisticated…) and kept ready for use with minimum lead-time.

Overall control of the simulation itself is via an in-core table and the use of Z-msgs to preset conditions for the simulation or alter characteristics during the simulation run itself.

There are certainly features of the Tornado tool that are not ideal. Since it creates its own database of descriptors to hold the data it needs to process the simulation, it takes up valuable space on the TPF database. The actual amount of space is determined by the user, and to a lesser extent by the variety of transactions or workload simulations they wish to have online at any one time. Theoretically there need only be one descriptor and one fixed-file record (with pools chained from it) to produce a simulation. Of course there would be some serious contention for those records if you tried to run 2,000 such transactions per second! So far it has proved possible to populate suitably sized Tornado databases in ‘spare’ areas of existing test systems.

Another tool supplied to assist in making subsequent runs of the simulation as consistent as possible is a means of flushing DASD cache and processor VFA buffers. In the current release of Tornado (v1.5) this is based on a simple method of displacing cached or VFA-resident records between runs with other data from the database. For some of the larger systems, with outrageous amounts of cache per string, this might seem a little pedestrian, and programmatic means to ‘reset’ cache and VFA are in the pipeline.
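The displacement method can be illustrated as follows. This is a sketch under the assumption of an LRU-style replacement policy; `read_record` is a hypothetical stand-in for a FINWC-style record read, not a Tornado interface:

```python
def flush_by_displacement(read_record, filler_ordinals):
    """Push test records out of cache/VFA by touching other data.

    Reading a sweep of 'filler' records larger than the cache forces
    the records used in the previous run out under an LRU-style
    replacement policy, so the next run starts from a cold cache.
    """
    for ordinal in filler_ordinals:
        read_record(ordinal)  # each read displaces an older resident
```

The practical drawback the article notes follows directly: the sweep has to exceed the cache size, so very large caches make this slow, hence the appeal of a true programmatic reset.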

What the Tornado package allows is the creation of ‘real’ TPF activity that is almost infinitely variable, repeatable and consistent. In fact Tornado has already proved valuable in verifying system configurations and new hardware, e.g. new CPCs, new DASD, new tape drives, new channel configurations, IPC connections, MPLF etc. Hitachi Data Systems (HDS) in Santa Clara, CA used Tornado as one of its principal tools for testing the Skyline CPCs with TPF. Marriott International also recently used Tornado to verify the performance of IBM’s RAMAC-2 DASD prior to their installation for the MARSHA system. In the case of Marriott it was possible to accomplish in a couple of days what might otherwise have taken weeks.

It is not only hardware and configuration testing that would benefit from the Tornado approach to creating TPF activity. Many shops run some form of shared test system, on which new developments and maintenance changes are loaded and left (to ‘simmer’) for a period of time, so that the changes exist on a system that has some activity. Tornado could be used to provide a level of background activity which would add to the effectiveness of such testing, and which could be tuned for some levels of stress testing out of hours, for example.

Because of the level of adjustment that can be made to each ‘Transaction Profile’ within Tornado, there are many specialised uses it could be put to, from unit testing through to package testing. An example might be an installation considering expanding from a uni-processor to a loosely coupled complex (or an existing loosely coupled user) that wishes to determine whether it has a requirement for Communicated Candidates for VFA. It would be relatively straightforward to simulate both retaining potential shared records as VFA candidates with some form of communicating mechanism, and making the records non-VFA to see whether the DASD subsystem could cope using DASD cache alone. Several scenarios could be quickly modelled and a precise determination of any database configuration changes could be made. Essentially there are numerous possible applications for Tornado, most of them probably not envisaged by the original designers!

In the future it might also be possible, through collaboration with IBM, to develop a direct relationship between Data Collection output and Tornado input, which could make the setting up of a system-wide activity profile almost automatic. Even before that level of integration is achieved, Tornado seems to provide a host of useful, and unique, features to complement any installation’s test tools. For anyone not entirely convinced of its usefulness, Tornado can even be employed on a ‘project’ basis, without the need to purchase the product. It is then possible to buy it at a reduced cost if you find you just can’t do without it.

Additional information on Tornado is available at: