Channel Surfing Is Not a Sport

by Mark Gambino, IBM TPF Development

In the 1980s, the arrival of 181-channel cable television (TV), combined with the remote control, made a disease known as channel surfing prevalent in our society. The main symptom of channel surfing is that a person attempts to watch multiple programs simultaneously on one TV set by constantly changing channels with the remote control. The more programs you try to watch, the more difficult it becomes to follow the story lines. In the early 1980s, Networking Neil was channel surfing among the different ice hockey playoff games. Suddenly, an idea came to Neil: If my moderately priced TV set can manage multiple programs, a single channel on my TPF system should be able to do the same thing with multiple "slightly more expensive" 3725 communication controllers. Because it was a given that the New York Islanders were going to win hockey's Stanley Cup (again), Neil turned off the TV set and headed to the raised floor to do some recabling. His tests showed that it was possible for multiple 3725s to share the same channel.

In the early 1990s, this practice of sharing a channel continued with 3745 high-speed communication controllers. Recently, many TPF customers have started adding the new 3746 communications controller (Model 900 or 950) to their networks. Unlike the 37x5s, which use parallel channels, the 3746s use Enterprise Systems Connection (ESCON) channels. Networking Neil wondered how parallel and ESCON channels differ in operation and capability, so he conducted some joint experiments with the IBM TPF lab. The primary goal of these experiments was to find out whether multiple 3746s could be connected on the same channel to a TPF system and, if so, what total message rate could be achieved over the channel.

The results of the tests were very positive. Two 3746 Model 900s were connected on one channel to a TPF system, and we were able to drive over 900 messages per second across the channel. Here, a message is an input message plus its corresponding output message, so over 1800 path information units (PIUs) per second flowed over the channel. At this message rate, channel utilization was around 90%. Diagnostic information gathered at various times throughout the tests showed that channel utilization scaled linearly with the message rate: at 268 messages per second, channel utilization was 27%, and at 797 messages per second, it was 79%. To maximize channel efficiency, the number of Network Control Program (NCP) read buffers was set to the maximum value (MAXBFRU=32), and UNITSZ was set large enough to hold an entire input message so that no input message had to be split across multiple read buffers.
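The linear scaling can be checked with simple arithmetic. The Python sketch below uses only the three utilization figures quoted above and converts each into the channel time consumed per message; all three work out to roughly 1 millisecond per message (about 0.5 ms per PIU). The last printed line is a naive straight-line extrapolation added for illustration only; the tests did not measure it.

```python
# Figures quoted in the article: messages per second -> channel utilization (%).
# A "message" is one input message plus its output message (two PIUs).
# The 900 msg/s point is approximate ("over 900", "around 90%").
SAMPLES = {268: 27.0, 797: 79.0, 900: 90.0}


def busy_ms_per_message(rate: float, utilization_pct: float) -> float:
    """Channel busy time per message implied by one measurement, in milliseconds."""
    busy_seconds_per_second = utilization_pct / 100.0
    return busy_seconds_per_second / rate * 1000.0


costs = {rate: busy_ms_per_message(rate, util) for rate, util in SAMPLES.items()}
avg_cost = sum(costs.values()) / len(costs)

for rate, cost in costs.items():
    print(f"{rate:4d} msg/s -> {cost:.3f} ms of channel time per message")

# Naive extrapolation (an assumption, not a measured limit): the rate at which
# a purely linear model would reach 100% channel utilization.
print(f"average {avg_cost:.3f} ms/message -> 100% near {1000.0 / avg_cost:.0f} msg/s")
```

Because the per-message cost is nearly identical at all three rates, a single constant explains the data, which is what "scaled linearly" means here.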

Mastering the Basic Skills of Reading and Writing
The other goal of the tests was to exercise the new Systems Network Architecture (SNA) polling code developed by IBM. To understand what this code does, we must first examine some lessons learned from past experience. The key questions are when and how often the TPF system issues a WRITE channel program (to send messages to the NCP) versus a READ channel program (to receive messages from the NCP). The original implementation, used many years ago, was the "wax on, wax off" approach, which simply alternated reads and writes (do a READ, do a WRITE, do a READ, do a WRITE, and so on). However, this approach does not work well in all environments. For example, if input and output messages are not one to one (an input message generates many output messages), a queue of output messages builds, which eventually causes your TPF system to run low on core blocks and go into shutdown mode.
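To see why strict alternation breaks down, here is a toy Python model. Only the READ/WRITE alternation comes from the text; the per-channel-program capacity (`max_pius_per_write`), the batch sizes, and the simplification that every output message is ready as soon as its input is read are hypothetical.

```python
def alternate_read_write(cycles: int, inputs_per_read: int,
                         outputs_per_input: int, max_pius_per_write: int) -> list[int]:
    """Toy model of "wax on, wax off": return the output-queue depth after each READ/WRITE pair."""
    queue = 0
    history = []
    for _ in range(cycles):
        # READ: accept a batch of input messages; assume each one immediately
        # produces its output messages (a hypothetical simplification).
        queue += inputs_per_read * outputs_per_input
        # WRITE: one channel program carries only a limited number of PIUs.
        queue -= min(queue, max_pius_per_write)
        history.append(queue)
    return history


balanced = alternate_read_write(5, inputs_per_read=8, outputs_per_input=1, max_pius_per_write=8)
skewed = alternate_read_write(5, inputs_per_read=8, outputs_per_input=3, max_pius_per_write=8)
print("1 output per input: ", balanced)
print("3 outputs per input:", skewed)
```

With one output per input, the queue stays empty; with three outputs per input, it grows by 16 messages every cycle, which in a real system would keep consuming core blocks.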

In 1991, the SNA polling code logic was changed to use the "save yourself" or "women and children first" technique. In this approach, the TPF system continues to issue WRITEs as long as output messages exist, and a READ is issued only when the SNA polling interval expires, which in 1991 was every 50 milliseconds. The rationale was to make sure that old work has finished (output messages have been sent) before new work (more input messages) is accepted. In most environments this approach works well, but like most static designs, it has cases where it does not. At peak network activity (very high message rates), a queue of messages destined for the TPF system can build in the NCP because a READ is issued only when the SNA polling interval expires. Also, because a WRITE is issued whenever there are output messages to send, many WRITEs can be issued that each send only a small number of messages, which is not the best use of the channel.
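A similar toy model shows the weakness of the 1991 scheme at high message rates. In this hypothetical sketch, READs happen only every 50 milliseconds (the 1991 polling interval quoted above), and each READ can drain at most `max_pius_per_read` messages from the NCP. That capacity and the arrival rates are illustrative assumptions, and WRITE activity is omitted because in this scheme it never triggers an earlier READ.

```python
def write_first_ncp_backlog(arrival_every_ms: int, max_pius_per_read: int,
                            poll_interval_ms: int = 50, duration_ms: int = 200) -> list[int]:
    """Toy model of timer-driven READs: NCP inbound queue depth right after each READ."""
    ncp_queue = 0
    backlog = []
    for now in range(1, duration_ms + 1):
        # Input destined for TPF arrives at the NCP at a steady (hypothetical) rate.
        if now % arrival_every_ms == 0:
            ncp_queue += 1
        # A READ is issued only when the SNA polling interval expires.
        if now % poll_interval_ms == 0:
            ncp_queue -= min(ncp_queue, max_pius_per_read)
            backlog.append(ncp_queue)
    return backlog


low = write_first_ncp_backlog(arrival_every_ms=5, max_pius_per_read=30)   # 200 msg/s
high = write_first_ncp_backlog(arrival_every_ms=1, max_pius_per_read=30)  # 1000 msg/s
print("200 msg/s: ", low)
print("1000 msg/s:", high)
```

At 200 messages per second, each READ clears the NCP queue; at 1000 messages per second, the backlog left in the NCP grows by 20 messages every polling interval.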

Sophisticated NCP Adaptive (SNA) Polling
To make the SNA polling code more efficient and dynamic, APAR PJ24833 adds new logic. When a READ or WRITE operation completes successfully, the code now checks the number of output messages on the SOUTC queue. If there are enough output messages to fill a channel program, a WRITE is issued to prevent a large queue from building in the TPF system. If the SOUTC queue is small or empty, the code determines how many channel command words (CCWs) were executed in the most recent READ operation. If a full or nearly full READ channel program was executed, another READ is issued immediately because a queue of messages is likely building in the NCP. If the TPF system does not need to issue an immediate READ, or is unable to (because both sets of read buffers are in use), a WRITE is issued if any output messages exist. The combined benefits of this new logic are that it:

Adapts based on current network conditions to handle all message rates
Prevents queues from building in the TPF system or in the NCP
Maximizes channel efficiency by blocking PIUs into a channel program when possible (fewer start I/O commands mean better channel utilization)
Works for both parallel channels (37x5s) and ESCON channels (3746s)
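The decision order described for APAR PJ24833 can be paraphrased as a small function. This is a hedged Python sketch, not the TPF code, which this article does not show. The names `ChannelState` and `next_operation`, the `write_program_capacity` field, the 90% "nearly full" threshold, and the fallback of waiting for the polling interval when there is nothing to do are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Op(Enum):
    WRITE = "WRITE"
    READ = "READ"
    WAIT_FOR_POLL = "WAIT_FOR_POLL"  # assumption: idle until the polling interval expires


@dataclass
class ChannelState:
    soutc_depth: int             # output messages queued on SOUTC
    write_program_capacity: int  # hypothetical: messages that fill one WRITE channel program
    last_read_ccws: int          # CCWs executed by the most recent READ
    read_program_ccws: int       # CCWs in a full READ channel program
    free_read_buffer_sets: int   # the article describes two sets of read buffers


NEARLY_FULL = 0.9  # hypothetical threshold for a "nearly full" READ


def next_operation(s: ChannelState) -> Op:
    """Choose the next channel program after a READ or WRITE completes successfully."""
    # 1. Enough output to fill a WRITE channel program: send it before SOUTC grows.
    if s.soutc_depth >= s.write_program_capacity:
        return Op.WRITE
    # 2. SOUTC is small or empty: a (nearly) full last READ suggests input is queuing in the NCP.
    read_was_full = s.last_read_ccws >= NEARLY_FULL * s.read_program_ccws
    if read_was_full and s.free_read_buffer_sets > 0:
        return Op.READ
    # 3. No immediate READ needed, or both read buffer sets are busy: write any output.
    if s.soutc_depth > 0:
        return Op.WRITE
    # 4. Nothing to send and no sign of an NCP backlog.
    return Op.WAIT_FOR_POLL


# Four scenarios: full SOUTC, NCP backlog, backlog with no free buffers, idle.
for state in (ChannelState(40, 32, 5, 32, 2), ChannelState(3, 32, 31, 32, 2),
              ChannelState(3, 32, 31, 32, 0), ChannelState(0, 32, 4, 32, 2)):
    print(state, "->", next_operation(state).name)
```

Unlike the two earlier schemes, the choice depends on both queues: SOUTC depth guards the TPF side, and the size of the last READ serves as an indirect gauge of the NCP side.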

With the performance of 3746s no longer an issue, Networking Neil was able to return home and resume that other channel surfing activity.