To DPR or not to DPR: Is that the question?
by Norman L. Laefer

When it comes to Dynamic Path Reconnect (DPR), TPF users have certainly suffered the slings and arrows of outrageous fortune. If you doubt that, ask yourself, "Why is this feature available in MVS, the UTS system, VM, and even VSE, but not TPF?" The only answer available from the vendor appears to be a brief statement, made in response to a user's question, that the benefits would be small and of limited applicability. Rather than accept a generalized answer of this sort, it seems appropriate to explain why TPF users want DPR, the work Amdahl has completed to develop this functionality, and the benefits being derived by a site using it in a production environment.

What is DPR?
With the introduction of the XA architecture in 1983, I/O operations became the realm of the channel subsystem for the purpose of scheduling and routing device activity. Prior to that, the System Control Program (SCP) was normally responsible for selecting a path, scheduling the I/O based on path and device availability, and managing contention. When this function was moved to the XA channel subsystem, all paths between the processor and a device became known to this subsystem. This change allowed the device to dynamically select any of the predefined paths known to the channel subsystem for the purpose of reconnecting to the processor. While the DPR concept is simple, the effects on availability and performance are complex.

How does DPR improve I/O performance?
When starting an I/O operation (initial selection) to a device, TPF 3.1 has the ability to choose from any of the available paths that are defined for that device. If all paths are busy, the I/O must wait until a path becomes available. In a TPF environment this delay is typically 1 or 2 milliseconds. This capability is often referred to as "multi-pathing" and is a method of increasing availability through the use of alternative paths. Any performance improvement from multi-pathing alone is relatively small. Multi-pathing is also a prerequisite for DPR. After initial selection, and once the device is aware that it is correctly positioned for data transfer to occur, it requests a reconnection to the processor. With DPR, any of the defined paths can be used for this reconnection. Without DPR, only the initial selection path can be used. The window of opportunity for this request to be recognized is 1 sector (65 microseconds or less). If the device fails to find an available path on which it can be recognized as needing reconnection, it must wait for a complete revolution (approximately 11 to 17 milliseconds) before initiating another attempt. The impact of the first failure to reconnect is therefore roughly 10 times greater than that of finding all paths busy during initial selection. When systems are heavily loaded, as is typical of TPF environments, reconnection may require multiple attempts, which results in an even greater impact on performance.

In summary, "multi-pathing" allows TPF's I/O to be scheduled on any channel that is defined as having access to the desired device. Without DPR, only one path, the initial selection path, can be used for the reconnection portion of the I/O operation. With DPR, reconnection can occur on any one of multiple paths, thereby increasing the probability of finding an available path for reconnection and improving both the throughput and the response time for a given configuration. DASD control units that either do not disconnect from the channel or ignore attempts to activate DPR will not obtain performance improvements.
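The reconnection argument above can be put into rough numbers. The following Python sketch is illustrative only. It assumes, without any measurement behind it, that each path is busy independently and with the same probability at the moment the device requests reconnection, and that every missed reconnection costs one full revolution. The function name, the busy probabilities, and the default revolution time of 16.7 milliseconds (a 3,600 rpm device, at the upper end of the 11 to 17 millisecond range quoted earlier) are assumptions chosen for illustration.

```python
def expected_reconnect_delay_ms(path_busy_prob, usable_paths, rotation_ms=16.7):
    """Mean time lost to missed reconnections, in milliseconds.

    Assumed model (not taken from the article): a reconnection attempt
    fails only if every usable path is busy, paths are busy independently,
    and each failure costs one full revolution before the next attempt.
    """
    if not 0.0 <= path_busy_prob < 1.0:
        raise ValueError("path_busy_prob must be in [0, 1)")
    if usable_paths < 1:
        raise ValueError("usable_paths must be at least 1")
    # Probability that a single attempt finds no free path.
    miss = path_busy_prob ** usable_paths
    # Number of failed attempts is geometric, with mean miss / (1 - miss).
    return rotation_ms * miss / (1.0 - miss)


if __name__ == "__main__":
    for busy in (0.2, 0.3, 0.4):
        without_dpr = expected_reconnect_delay_ms(busy, usable_paths=1)
        with_dpr = expected_reconnect_delay_ms(busy, usable_paths=4)
        print(f"path busy {busy:.0%}: "
              f"without DPR {without_dpr:.2f} ms, with DPR {with_dpr:.2f} ms")
```

Without DPR the device has one usable path at reconnection time, so the miss probability is simply the path busy probability. With four paths it falls to the fourth power of that value. Real path busy conditions are correlated and vary with load, so the sketch shows the shape of the effect rather than a prediction for any particular configuration.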

How will DPR affect my DASD capacity planning?
With DPR, capacity planning will be simplified because most modeling tools assume DPR is active. Until now, planners either wrote their own models, modified existing tools, or relied upon the vendor for this function.

Because DPR can be an economical alternative to using cache, planners can avoid cache solutions and would not be required to estimate cache hit ratios for future workloads or rely on sophisticated tuning efforts. Avoiding the use of TPF-unique hardware features also permits greater flexibility in cross utilization of hardware between TPF and non-TPF environments.

Many TPF capacity planners have been forced to expand the number of DASD devices for I/O access reasons, even though there is sufficient space available on the existing volumes. These sites will be able to access a greater percentage of the space on their TPF DASD when DPR support is installed. In addition, with the smaller DASD configurations obtained with DPR, migration will be simpler while providing the same level of I/O service time.

Systems that are currently at their capacity limit can defer any planned expansion along with its associated cost and disruption by using DPR to increase the capacity of the existing configuration. In addition, excess capacity that may have been desired or required to cope with periods of reconfiguration, equipment failure, or the introduction of new (untuned) applications will be minimized or eliminated due to the internal string and channel balancing that occurs with DPR.

Why does TPF run without DPR?
In simplest terms TPF turns off DPR at IPL time by instructing the control unit to treat it as a non-XA SCP. Why TPF may have been coded to perform this function requires some explanation.

1) For a single-image TPF system operating in a uniprocessor (UP) environment the only reason may be that nobody was willing to ensure that all of the code necessary to handle exception and error conditions was present and tested. A less altruistic view might be that a hardware vendor does not have an incentive to spend development money on a project designed to reduce hardware sales.

2) For a single-image TPF system operating in a tightly coupled multi-processor (TCMP) environment it depends on whether the processor vendor requires modifications to the channel microcode or hardware to handle the high I/O rates typical of many TPF systems. At least one such RPQ/feature, "Channel Redrive", removes the support for DPR and is therefore incompatible with this functionality.

3) For loosely coupled TPF systems, it also depends on whether the processor required the modifications discussed above due to the increased number of device and control unit busy conditions that occur as a result of sharing DASD. The implementations of the Limited Lock Facility (LLF) and the Extended Limited Lock Facility (ELLF) specify one path per device as a requirement and so preclude any possible benefits from DPR. The Multi-Path Lock Facility (MPLF) implementation, while allowing multiple paths to a device, specifically causes the 3990 control unit to ignore any attempt to turn on DPR functionality. Until a locking mechanism exists that neither precludes multiple paths to a device nor blocks DPR, loosely coupled systems will continue to be denied this functionality.

What should I expect from DPR?
The results from DPR are highly dependent on the activity level, characteristics of the I/O, and the configuration of the DASD subsystem. There are three different ways to view the benefits.

1) Improved response time at fully loaded capacity:

During peak periods many TPF users will experience a reduction of 10% to 15% in the average I/O service time as measured from the application request to I/O completion. This is the time an application would wait for a physical I/O after issuing a FINWC type macro. In most systems the time associated with these events is the largest contributor to message life and ECB existence time. Because more main storage is required the longer an ECB exists, there is a cost associated with slower I/O response times. A more significant impact, prior to implementing TPF 4.1, is the availability of storage below the 16-megabyte line that frequently constrains the number of I-streams and the choice of processor speed. TPF users at or near the 16-megabyte constraint should view DPR as an appropriate method of extending the life of their system until TPF 4.1 is implemented.

2) Increased capacity at the specified end of life response time:

DPR increases the probability of a DASD device finding an available path for reconnection to the host and therefore reduces device busy time per I/O. This time is available to serve additional I/O requests at the same level of device utilization as without DPR. DPR also produces a favorable change in the shape of the response time curve. This is a direct result of going from 4 queues each with a single server to a single queue with 4 servers at reconnection time. These two effects combine to give most TPF users a 15% to 25% capacity increase with no I/O elongation.

3) Lower costs and increased reliability based on a reduced number of channels, control units, and DASD to meet the specified activity level and response time requirements.

The use of DPR allows all components of the storage subsystem to operate at higher levels of utilization without impacting I/O response time. Therefore, TPF capacity planners are able to design subsystems that obtain higher access rates for DASD devices while increasing string lengths. When desired or required, it is also possible to minimize the number of channels. With DPR active, each of the components can be varied to a much greater degree than without DPR, and an overall reduction of 15% to 20% in hardware costs can be achieved in most configurations.
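The change in the response time curve mentioned under point 2, going from 4 queues each with a single server to a single queue with 4 servers, can be illustrated with textbook queueing formulas. The Python sketch below compares the mean wait in an M/M/1 queue with the mean wait in an M/M/c queue (the Erlang C formula) at the same per-server utilization. Poisson arrivals and exponential service times are assumptions made here for convenience, not properties claimed for DASD reconnection, and a missed reconnection actually costs a full revolution rather than a simple wait in line. The function names and the rates used are hypothetical.

```python
from math import factorial


def mm1_mean_wait(arrival_rate, service_rate):
    """Mean time waiting in queue for an M/M/1 system."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival_rate >= service_rate")
    utilization = arrival_rate / service_rate
    return utilization / (service_rate - arrival_rate)


def erlang_c(servers, offered_load):
    """Probability that an arrival must wait in an M/M/c system."""
    utilization = offered_load / servers
    if utilization >= 1.0:
        raise ValueError("queue is unstable: utilization >= 1")
    waiting_term = offered_load ** servers / factorial(servers) / (1.0 - utilization)
    free_terms = sum(offered_load ** k / factorial(k) for k in range(servers))
    return waiting_term / (free_terms + waiting_term)


def mmc_mean_wait(arrival_rate, service_rate, servers):
    """Mean time waiting in queue for an M/M/c system."""
    offered_load = arrival_rate / service_rate
    wait_prob = erlang_c(servers, offered_load)
    return wait_prob / (servers * service_rate - arrival_rate)


if __name__ == "__main__":
    service_rate = 1.0  # hypothetical: one request per unit time per path
    for utilization in (0.3, 0.5, 0.7):
        # Four separate queues: each path sees a quarter of the traffic.
        separate = mm1_mean_wait(utilization * service_rate, service_rate)
        # One shared queue: all traffic may use any of the four paths.
        shared = mmc_mean_wait(4 * utilization * service_rate, service_rate, 4)
        print(f"utilization {utilization:.0%}: "
              f"separate {separate:.3f}, shared {shared:.3f}")
```

At equal utilization the shared arrangement always shows the shorter mean wait, and the gap widens as utilization rises. That flatter curve is what allows a configuration to run at higher utilization before response time degrades.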

Is the DPR code available?
If slings and arrows have no appeal and you are ready to take arms against a sea of DASD, then it will please you to learn that Amdahl has written, packaged, and installed the code required to activate DPR for TPF 3.1. At this time it is running in two production systems and their associated test environments. It is installed with the default status of "active" and only requires user intervention to deactivate the functionality. Initial analysis confirms that the benefits fall within, or are slightly better than, the ranges discussed earlier. This code has been designed for use with TPF 3.1 compatible processors at recent PTF levels and has been tested with all of Amdahl's current processor models in both single and multiple I-stream environments.

Opinions expressed are those of the author and do not necessarily reflect the views of the Amdahl Corporation. Amdahl and UTS are registered trademarks of the Amdahl Corporation.

About the author: Norm Laefer has worked in the data processing industry for 29 years, of which 14 have been in the airline industry. He is currently the Manager of TPF and Airline Industry Marketing for the Amdahl Corporation in Sunnyvale, CA.