Dear Mr. Fantasy, Play Us a Tune
by Mark Gambino, IBM TPF Development

What happens if you use an LU or PU without the appropriate control measures in place? If you are a physicist using Lutetium (Lu) and weapons-grade Plutonium (Pu), the answer is so shocking that the blast will make you "fall out" of bed. If you are a Systems Network Architecture (SNA) network administrator, the results are equally disastrous (although your neighbors will not mind as much) because an incorrectly tuned physical unit (PU) or even logical unit (LU) can cause failures, hangs, or outages in the network. When the PUs and LUs are connected to a TPF system, correct tuning is essential to deliver the availability and throughput that every TPF customer requires.

Because the majority of TPF SNA customers still have subarea (PU 5) connections, we will first focus on the basic principles of tuning the subarea environment. Each PU 5 link is controlled by a virtual route (VR). Connections between TPF applications and remote terminals (or remote applications) are referred to as LU-LU sessions. There can be many LU-LU sessions across each VR (link). Pacing is the primary mechanism used to control the flow of traffic in the network and is performed at two levels: VR (link-level) pacing, which controls the overall flow across a link, and session-level (LU) pacing, which controls the flow for a given LU-LU session. The pacing window size defines how many path information units (PIUs), or messages, a node can send before it must ask for permission to send the next window's worth of messages.

The TPF system uses fixed pacing for PU 5 links, meaning that the VR window size value determined at link activation time remains constant for as long as the link stays up. When the first PIU in a new window is sent, it includes a VR pacing request. The idea is that under normal conditions, a VR pacing response will be received before the current window's worth of PIUs are sent. If so, message flow is uninterrupted; however, if a node has sent a window's worth of PIUs and is still waiting for a VR pacing response, the VR becomes blocked, meaning that no more PIUs can be sent across that link until the VR pacing response is received. You might think that it would be a good idea to define a large VR window size value to avoid VR blocked conditions, but not so fast -- it is not that simple.
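The fixed-window behavior described above can be sketched in a few lines of Python. This is only an illustration, not TPF code: the class name `FixedWindowSender` and its methods are hypothetical, and the sketch assumes that one VR pacing response grants exactly one further window.

```python
class FixedWindowSender:
    """Toy model of a sender on a VR with a fixed pacing window (hypothetical)."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.left_in_window = window_size   # PIUs still allowed in the current window
        self.request_outstanding = False    # pacing request sent, response not yet seen
        self.next_window_granted = False    # pacing response received for the next window
        self.sent = []                      # (piu, carried_pacing_request) pairs

    @property
    def blocked(self):
        # Blocked: a full window has been sent and no pacing response has arrived.
        return self.left_in_window == 0 and not self.next_window_granted

    def send(self, piu):
        """Try to send one PIU; return False if the VR is blocked."""
        if self.left_in_window == 0:
            if not self.next_window_granted:
                return False
            # The response arrived in time, so a new window starts here.
            self.left_in_window = self.window_size
            self.next_window_granted = False
        first_in_window = self.left_in_window == self.window_size
        if first_in_window:
            # The first PIU of every window carries the VR pacing request.
            self.request_outstanding = True
        self.left_in_window -= 1
        self.sent.append((piu, first_in_window))
        return True

    def pacing_response(self):
        """Record a VR pacing response from the other end of the link."""
        if self.request_outstanding:
            self.request_outstanding = False
            self.next_window_granted = True


vr = FixedWindowSender(window_size=3)
results = [vr.send(p) for p in ("a", "b", "c", "d")]
blocked_before = vr.blocked       # window used up, no response yet
vr.pacing_response()
resumed = vr.send("d")            # "d" opens the next window
```

If the pacing response arrives while PIUs "b" or "c" are still being sent, the fourth send succeeds immediately and the flow is never interrupted, which is the normal case the paragraph describes.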

When resources like control blocks or buffers are running low, a node will withhold sending a VR pacing response to prevent more work (PIUs) from coming in until the amount of available resources rises back to an acceptable level. When a node that is running low on resources decides to withhold a VR pacing response, it can still be overrun, if the VR window size is large, to the point where the lack of resources reaches a critical level, causing shutdowns or outages. The goal then is to find a happy medium where the VR window size is large enough to avoid constantly hitting VR blocked conditions but small enough to prevent overloading an already congested node.
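A rough back-of-the-envelope model shows why a large window weakens this protection. The function below is a hypothetical sketch that assumes the pacing request arrived on the first PIU of the current window, that each remaining PIU in that window consumes one buffer, and that no buffers are released in the meantime; the buffer count and window sizes are made-up illustration values.

```python
def free_buffers_after_withholding(free_buffers, window_size):
    """Estimate the buffers left after a node withholds a VR pacing response.

    Assumption: the sender may still transmit the rest of the window it
    already holds (window_size - 1 PIUs), one buffer per PIU.
    A negative result means the node would be overrun.
    """
    still_inbound = window_size - 1
    return free_buffers - still_inbound


# Same congested node (50 free buffers, an invented figure), three window sizes.
estimates = {w: free_buffers_after_withholding(50, w) for w in (8, 42, 255)}
```

Under these assumptions the node survives the two smaller windows with buffers to spare, while the largest window drives the estimate negative.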

Benchmark testing with TPF systems has shown that a VR window size of 42 is optimal for NCP (37x5) connections. Note that your actual mileage may vary based on the message rate, average message size, ratio of input to output messages, number of links connected to the TPF system, and other factors; therefore, use a window size of 42 as a starting point and adjust the value accordingly based on the characteristics of your specific network environment. Once you have determined the appropriate VR window size, code it on the WINSIZE parameter that defines the link to the TPF system, either in offline ACF/SNA table generation (OSTG) or online with the ZNDYN functional message.

VR pacing controls the flow for an individual link, but a higher level of protection is necessary for a node like TPF that can have multiple PU 5 connections. Even when traffic at the link level is not congested, spikes in the overall message rate can run the TPF system low on resources (core blocks). In TPF 4.1 there are several parameters on the SNAKEY macro in keypoint 2 (CTK2) that designate when the TPF system withholds VR pacing responses based on the percentage of available block types. For these SNAKEY parameters (ILWPC, ILWPE, ILWPF, ILWPI, and ILWPS) to be effective, they must be coded with values greater than the corresponding input list shutdown values. When the TPF system is starting to run low on blocks, the object is to reduce the rate of input messages (by withholding VR pacing responses) in an effort to allow the level of available blocks to return to normal. If you do not code the withhold VR pacing response parameters in SNAKEY, the TPF system will not withhold VR pacing responses, which is likely to cause input list shutdown at the first sign of trouble (shortage of blocks). While in input list shutdown, the TPF system does not poll any SNA links, meaning that no input messages are read in. This can cause a domino effect where a bad situation is made worse.
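The relationship between the withhold values and the input list shutdown values can be checked mechanically. The sketch below is a hypothetical illustration: the function names are invented, the percentages are made-up sample values, the shutdown values are simply keyed by the matching SNAKEY parameter name for convenience, and the exact comparison TPF performs (strictly below versus at or below) is an assumption.

```python
def ineffective_withhold_params(withhold_pct, shutdown_pct):
    """Return the parameters whose withhold value is not above the shutdown value.

    Such a parameter can never take effect, because input list shutdown
    would be reached first.
    """
    return [name for name, pct in withhold_pct.items()
            if pct <= shutdown_pct[name]]


def input_state(available_pct, withhold_pct, shutdown_pct):
    """Classify one block type by the percentage of blocks still available."""
    if available_pct < shutdown_pct:
        return "input list shutdown"
    if available_pct < withhold_pct:
        return "withhold VR pacing responses"
    return "normal"


# Invented sample values; ILWPS is deliberately coded too low.
withhold = {"ILWPC": 30, "ILWPE": 30, "ILWPF": 30, "ILWPI": 30, "ILWPS": 15}
shutdown = {"ILWPC": 20, "ILWPE": 20, "ILWPF": 20, "ILWPI": 20, "ILWPS": 20}

bad = ineffective_withhold_params(withhold, shutdown)
states = [input_state(p, 30, 20) for p in (50, 25, 10)]
```

With a correctly coded pair (30 and 20 here), a block type that drops to 25 percent available triggers withholding first, giving the system a chance to recover before shutdown is reached.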

When the TPF system is in input list shutdown, messages destined for TPF queue up in the network. If TPF stays in shutdown for too long, an unpleasant condition known as "deadly embrace" results. The TPF system cannot send any PIUs because it is waiting for a VR pacing response, but TPF is also not polling the Network Control Program (NCP); therefore, the VR pacing response cannot be read in. The true deadly embrace condition is caused by a large number of TPF messages on the output (SOUTC) queue tying up enough core blocks to drive the TPF system into shutdown, but at the same time the messages cannot be sent (and core blocks returned) until a VR pacing response is received.

In a game of chess, it is sometimes necessary to sacrifice a pawn to save the kingdom. When the TPF system is your playing field, the SNA links are the pawns. When defining a PU 5 link to the TPF system (through OSTG or the ZNDYN ADD functional message), the VRILTO parameter specifies how long the TPF system waits when the VR is blocked and the system is in input list shutdown before TPF writes out a window's worth of PIUs anyway (even though no VR response could be read in). For cases where the VR is truly blocked (TPF is polling the NCP, but no VR pacing response is received), the VRTO parameter specifies how long the TPF system waits before breaking the link connection. The VRILTO value should be small (1 - 2 seconds), but determining a suitable value for VRTO is not as clear cut, particularly for NCP connections. The two main factors to consider are how long the NCP normally remains blocked and the rate at which the TPF system generates output messages for this link. The VRTO value needs to be long enough to survive brief NCP blocked conditions but short enough to prevent a large output message queue from running the TPF system low on, or out of, core blocks.
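The two timers can be summarized as a small decision function. This is a sketch under stated assumptions, not the TPF implementation: the function name and return strings are hypothetical, and it assumes both timers are measured in seconds from the moment the VR became blocked.

```python
def blocked_vr_action(blocked_seconds, in_input_list_shutdown, vrilto, vrto):
    """Decide what to do about a blocked VR (hypothetical sketch).

    vrilto applies while the system is in input list shutdown and therefore
    cannot read in a VR pacing response; vrto applies when the system is
    polling but no response arrives.
    """
    if in_input_list_shutdown:
        if blocked_seconds >= vrilto:
            return "send a window's worth of PIUs anyway"
        return "wait"
    if blocked_seconds >= vrto:
        return "break the link"
    return "wait"


# Invented sample values: VRILTO of 2 seconds, VRTO of 30 seconds.
in_shutdown = blocked_vr_action(3, True, vrilto=2, vrto=30)
brief_block = blocked_vr_action(3, False, vrilto=2, vrto=30)
long_block = blocked_vr_action(45, False, vrilto=2, vrto=30)
```

The sample VRTO of 30 seconds is purely illustrative; as the paragraph notes, a suitable value depends on how long the NCP normally remains blocked and how quickly output queues up for the link.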

Just like one person can make a difference, a single LU-LU session can run the TPF system out of blocks if session-level pacing is not used. For LU-LU sessions where one input message generates one output message and the next input message cannot be sent until the response to the previous message is received, session-level pacing is not really necessary. Most terminal traffic is single-threaded and, therefore, self-paced by its nature. However, application-to-application traffic is a different story, particularly if the session is used as a one-way pipe. For example, suppose a TPF application needs to send a few megabytes of data, the TPF application sends the data (in multiple messages) at a much higher rate than the remote LU can process it, and session-level pacing is not used. Because the rate at which the TPF application generates output messages (issues ROUTC macros) is constant, the NCP will soon become congested because the remote LU is not processing the data fast enough. The NCP will react by withholding the VR pacing response to the TPF system which, in turn, causes the SOUTC queue in TPF to grow. Within a matter of seconds, the TPF system will run out of core blocks, resulting in a catastrophic error (IPL).

Session-level pacing prevents one LU from creating network congestion and prevents the TPF system from running out of core blocks. Using the previous example, this time with session-level pacing, the TPF system will send a pacing request on the LU-LU session and exhaust its send window long before a pacing response is received. The TPF application continues to generate output messages (issue ROUTCs), but because the TPF system is waiting for a pacing response on this LU-LU session, the messages are passed to the output message transmission (OMT) package to be queued on file (not in core).

No matter what, the remote LU dictates the pace at which it receives messages from the adjacent NCP and that pace is the same regardless of whether session-level pacing is used or not. The question then becomes where to queue the messages until the remote LU is ready to receive them. Queuing them in the NCP can lead to this one session monopolizing all the buffers (resources) in the NCP, which means that all LU-LU sessions through this NCP are adversely affected. Queuing the messages in the TPF system on the SOUTC queue in core blocks can cause a TPF outage. The only alternative is to file queue the messages in the TPF system on the OMT queue; therefore, session-level pacing must be used. For LU-LU sessions across a PU 5 link, the session-level pacing window size is a fixed value and is specified in the logon mode table entry definition used by the remote LU (a value of 0 indicates session-level pacing is not used on this session).
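The queuing choice described above can be modeled with a toy session object. Everything in this sketch is hypothetical (the class `PacedSession`, its methods, and the use of a Python deque to stand in for the OMT file queue), and it assumes each pacing response grants one more window of sends.

```python
from collections import deque


class PacedSession:
    """Toy LU-LU session with a fixed session-level pacing window."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.credits = window_size   # messages that may still be sent
        self.on_wire = []            # messages actually transmitted
        self.file_queue = deque()    # stands in for the OMT file queue (not core)

    def route(self, msg):
        """Called for each output message the application generates."""
        if self.credits > 0 and not self.file_queue:
            self._transmit(msg)
        else:
            # Window exhausted: park the message on file, not in core blocks.
            self.file_queue.append(msg)

    def _transmit(self, msg):
        self.credits -= 1
        self.on_wire.append(msg)

    def pacing_response(self):
        """The remote side is ready for another window of messages."""
        self.credits += self.window_size
        while self.credits > 0 and self.file_queue:
            self._transmit(self.file_queue.popleft())


session = PacedSession(window_size=2)
for i in range(5):
    session.route(f"m{i}")
sent_first = list(session.on_wire)
queued_first = list(session.file_queue)
session.pacing_response()
```

However fast the application generates messages, at most one window's worth is in flight for this session; the backlog grows on file, where it does not consume core blocks or NCP buffers.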

Now that PU 5 flow control has been discussed, we will examine the PU 2.1 environment. For PU 2.1 links, there is no VR or equivalent concept, which means there is no deadly embrace condition to worry about. Where network congestion in PU 5 causes VR pacing responses to be withheld, the adjacent link station (ALS) goes into slowdown mode in PU 2.1. When the TPF system issues a write (I/O) operation, the ALS responds with slowdown, indicating that the ALS is congested, did not accept the data, and that TPF should not attempt to send more data until the ALS notifies TPF that it is ready to receive again. An ALS in slowdown mode is similar to the PU 5 VR blocked condition in that a large SOUTC queue can build up causing the TPF system to run low on, or out of, core blocks. The SLOWTIME parameter on the SNAKEY macro in CTK2 defines how long an ALS is allowed to be in slowdown mode before the TPF system will break the link. Note that the SLOWTIME mechanism is also used to break PU 5 links that are in slowdown mode for too long.

LU-LU pacing in the PU 2.1 environment is more robust because adaptive session-level pacing is used. Each pacing response includes the next send window size and the value is not constant (fixed). Instead, the ALS adjusts the value based on key factors like number of buffers available in the ALS and number of messages already queued in the ALS for this particular LU-LU session. The sending node can ask for the window size of an LU-LU session to be increased by setting the request larger window indicator (RLWI) in a PIU.

For example, if the TPF system wants to send a message on an LU-LU session but cannot transmit it immediately because TPF is waiting for a pacing response, output messages are being generated faster than the current window size can handle. The TPF system will request a larger window size in this case. If the ALS has enough resources to handle an increase in traffic, it will provide a larger window size value on the next pacing response. Adaptive session-level pacing offers better throughput and congestion control than fixed pacing.
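The exchange can be sketched as two small functions, one for each side. The policy shown is invented for illustration only: the article does not say how an ALS computes the next window size, so the thresholds, the doubling and halving rules, and all names here are assumptions.

```python
def wants_larger_window(has_message_to_send, awaiting_pacing_response):
    """Sender side: set the RLWI when output is outrunning the current window."""
    return has_message_to_send and awaiting_pacing_response


def next_window_size(current, rlwi, free_buffers, queued_for_session,
                     low_buffers=20, max_queued=10, max_window=64):
    """Receiver side: choose the window size to return on the pacing response.

    Hypothetical policy: shrink when the receiver is short of buffers or
    already holds a backlog for this session, grow when the sender asked
    for more and resources allow, otherwise leave the window unchanged.
    """
    if free_buffers < low_buffers or queued_for_session > max_queued:
        return max(1, current // 2)
    if rlwi:
        return min(max_window, current * 2)
    return current


rlwi = wants_larger_window(True, True)
grown = next_window_size(8, rlwi, free_buffers=100, queued_for_session=0)
shrunk = next_window_size(8, rlwi, free_buffers=5, queued_for_session=0)
```

The point of the sketch is the feedback loop: the window follows the receiver's current capacity instead of staying at a value chosen when the session started.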

One last reminder before ending this SNA tuning discussion: think of link-level pacing, session-level pacing, the SNAKEY parameters, and the OSTG parameters as safety belts in a car. The safeguards cannot help you unless you use them.