Networking Goes Wholesale
Mark Gambino, IBM TPF Development

George User buys certain products from the local retail store. While looking at the date of manufacture, George wondered why it takes so long for the products to reach the retail store. Being curious, George visited the main plant of the company and discovered some interesting facts. The products are first produced at the main manufacturing plant, sent to regional distribution centers and then on to local distribution centers, and then delivered to a retail store where they finally reach the consumer. The other enlightening fact learned was that, in many cases, the products are repackaged along the way. For example, if an item is too large to fit in the delivery vehicles that are used by a given distributor, the item is disassembled into smaller pieces, shipped to the next location, and then reassembled. This process can be repeated several times before the product reaches its final destination. Why? Because each of the distribution centers is independent, deals only with the next center along the route, and is not aware of or concerned about the end-to-end route from the main plant to the consumer.

The mythical product distribution network just described illustrates many of the "challenges" (problems) that also exist in message/packet switching computer networks. Sequential order of data delivery is an important additional challenge that computer networks must face. Connection-oriented networks, such as traditional Systems Network Architecture (SNA) support, calculate the route to be used when the session is established and all packets for the session flow along the same route. The route is calculated based on the class of service (COS) requirements of the session (interactive traffic, batch traffic, and so on). Point-to-point routing is used, meaning that each node is responsible for delivering packets, in the correct order, to the next node along the route. Because all packets for the same session flow on the same path, correct order of delivery is guaranteed. If a packet is too large to flow across the physical network connecting two adjacent nodes, the packet is segmented into smaller pieces, sent, and then reassembled by the receiver.

Connectionless-oriented networks such as Internet Protocol (IP) do not establish fixed routes for sessions. Instead, each packet contains the address of the destination node. When a node receives a packet and that node is not the final destination, it examines its routing tables to determine where to send the packet next. Packets for a given session can flow across different routes; therefore, the destination node must be prepared to handle data received out of order. Intermediate nodes practice the "send and forget" philosophy (also known as "send and pray"). The origin node that first built the packet, not the intermediate nodes, is responsible for detecting and resending lost packets. The packet segmentation and reassembly issues that exist in connection-oriented networks also exist in connectionless-oriented networks.
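As a sketch of the hop-by-hop forwarding just described, the following Python models a tiny connectionless network. The node names and table contents are purely illustrative; the point is that each node consults only its own routing table to pick a next hop and has no knowledge of the end-to-end route:

```python
# Each node knows only "for this destination, send to this neighbor."
# No node holds the full path from origin to destination.
ROUTING_TABLES = {
    # node: {destination: next_hop}
    "A": {"D": "B"},
    "B": {"D": "C"},
    "C": {"D": "D"},
}

def forward(packet, node):
    """Return the next hop for a packet at this node, or None if delivered."""
    dest = packet["dest"]
    if node == dest:
        return None                    # packet has reached its destination
    return ROUTING_TABLES[node][dest]  # hop-by-hop table lookup, then forget

def route(packet, origin):
    """Trace the hop-by-hop path a packet happens to take."""
    path, node = [origin], origin
    while (nxt := forward(packet, node)) is not None:
        path.append(nxt)
        node = nxt
    return path
```

Because each lookup is independent, a change to any one node's table mid-stream can send later packets of the same session down a different path, which is exactly why the destination must tolerate out-of-order arrival.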

Both networking models have some good points and bad points. Some people say to always look on the bright side of life. However, for the purpose of this discussion, we need to focus on the negative points, especially when talking about high-volume transaction processing. The "Achilles' heel" (main flaw) of connection-oriented networks is that each node along the session path is a single point of failure; if any of the nodes fail, the session also fails. Connectionless-oriented networks have two inherent weaknesses:

  1. No class of service capabilities. This means that all packets are treated equally. For example, if interactive (high-priority) traffic and batch (low-priority) traffic share the same network, increasing the amount of batch traffic will adversely affect the response time of the interactive traffic as well.
  2. Limited congestion control. User Datagram Protocol (UDP) has no flow control. Transmission Control Protocol (TCP) regulates data flow based mainly on conditions of the endpoints, not the transport network. When an intermediate node becomes overloaded, the problem is often made worse by TCP retransmitting the same packets over and over, resulting in network storms.

Rather than having a choice of only steak or lobster, the designers of the High-Performance Routing (HPR) architecture chose the "surf and turf" approach. HPR incorporates the best features of connection-oriented and connectionless-oriented networks and, more importantly, avoids the flaws of each of these network models. Logical pipelines called rapid transport protocol (RTP) connections are established between a pair of nodes that are not necessarily adjacent. One or more LU-LU sessions flow over an RTP connection. All sessions using a given RTP connection have the same COS, which allows HPR to continue to provide the COS routing advantages that exist in today's SNA networks. HPR is an extension to the Advanced Peer-to-Peer Networking (APPN) architecture and uses the normal APPN search algorithms to calculate the route for a session. The route is examined to see if it supports HPR and, if so, the session is assigned to an RTP connection. All packets for an RTP connection are sent along one route until a failure is detected, at which time the path switch process calculates a new route and all subsequent packets flow along the new route. HPR packet routing is connection-oriented because the header of each packet indicates the specific path in the network to follow. However, packet routing is also connectionless-oriented in nature because the route that packets for an RTP connection take can change (if a failure occurs). The question then becomes: "Is HPR a connection-oriented or a connectionless-oriented protocol?" For decades, the scientific community has been debating whether light is a particle or a wave. The answer to both questions is "yes."
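The grouping rule above, that every session on a given RTP connection shares the same partner and COS, can be sketched as a simple lookup keyed by that pair. The class name and keying scheme here are hypothetical, not taken from the HPR architecture documents:

```python
class RtpConnectionManager:
    """Illustrative only: group sessions onto logical pipes keyed by
    (partner node, class of service), as HPR groups LU-LU sessions
    onto RTP connections."""

    def __init__(self):
        self.connections = {}  # (partner, cos) -> list of session ids

    def assign(self, session_id, partner, cos):
        """Assign a session to the RTP connection for (partner, cos),
        establishing the connection first if none exists yet."""
        key = (partner, cos)
        self.connections.setdefault(key, []).append(session_id)
        return key
```

Two interactive sessions to the same partner ride one pipe; a batch session to that partner gets its own, so batch traffic cannot ride ahead of interactive traffic on a shared connection.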

Except for when a path switch takes place, all packets for an RTP connection flow on the same route and, therefore, will arrive in the correct order at the destination. The destination node will correctly resequence any data that arrives out of order. When there is a network failure, data sent by one RTP endpoint will not be acknowledged by the remote RTP endpoint. This triggers the path switch process to begin. To recover data that was lost, the destination (receiving) RTP node tells the source (sending) RTP node what messages were lost and need to be retransmitted. The sending node never retransmits data unless asked to do so by its partner. This prevents the network storm conditions that occur when the sending node is responsible for determining when to retransmit messages and ends up sending the same messages over and over. The path switch process has the following benefits:

  1. The process eliminates single points of failure in the network (assuming alternate routes exist).
  2. It is nondisruptive in nature because no data or sessions are lost.
  3. It does not require operator intervention.
  4. It is transparent to the end user (the user does not perceive an outage).
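The receiver-driven recovery described above can be sketched as follows. This is an illustrative model, not the actual RTP implementation: the receiver resequences out-of-order packets and reports gaps for the partner to retransmit, and the sender never retransmits on its own initiative:

```python
class RtpReceiver:
    """Illustrative sketch: resequence arriving packets and name the
    missing ones, so the sender retransmits only what is asked for."""

    def __init__(self):
        self.next_seq = 0   # next in-order sequence number expected
        self.buffer = {}    # out-of-order packets held for resequencing
        self.delivered = [] # data passed up in correct order

    def receive(self, seq, data):
        self.buffer[seq] = data
        # Deliver everything that is now contiguous.
        while self.next_seq in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1

    def gaps(self, highest_seen):
        """Sequence numbers the partner must retransmit (the request
        sent back to the sending node after a failure)."""
        return [s for s in range(self.next_seq, highest_seen + 1)
                if s not in self.buffer]
```

Because retransmission happens only on explicit request, a congested network is never flooded with duplicate copies of data that was merely delayed.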

The primary flow control mechanism used by HPR is the adaptive rate-based (ARB) pacing algorithm. ARB pacing is proactive, meaning that it is designed to detect and prevent congestion conditions rather than wait for problems to occur and then take action (reactive mode). For those of you whose blood pressure jumps 50 points when you hear the term "virtual route (VR) blocked," you will be happy to know that ARB pacing is time-based, not window-based like VR pacing, and thus avoids deadlock conditions. ARB pacing takes into account not only the conditions at the receiving endpoint, but also those at the intermediate nodes.
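To make the time-based (as opposed to window-based) distinction concrete, here is a minimal rate pacer. The class name, constants, and adjustment factors are illustrative assumptions, not the actual ARB algorithm: the sender meters bytes onto the network at an allowed rate and adjusts that rate from periodic feedback, so it never stalls waiting for a window to open:

```python
class RatePacer:
    """Illustrative time-based pacing: sends are spaced out in time at
    an allowed rate, and the rate adapts to congestion feedback."""

    def __init__(self, rate_bps=1_000_000):
        self.rate_bps = rate_bps  # current allowed send rate, bits/second

    def send_delay(self, nbytes):
        """Seconds to wait before the next send at the current rate."""
        return nbytes * 8 / self.rate_bps

    def feedback(self, congested):
        """Adjust the allowed rate from network feedback: back off
        multiplicatively under congestion, probe gently upward otherwise.
        (The factors 1/2 and 1.1 are arbitrary for illustration.)"""
        if congested:
            self.rate_bps = max(self.rate_bps // 2, 1)
        else:
            self.rate_bps = int(self.rate_bps * 1.1)
```

Because the pacer always computes a finite delay rather than blocking on a window credit, the deadlock that a blocked window can cause simply has no place to occur.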

Regardless of what network you look at, segmentation and reassembly by intermediate nodes is costly. Part of the process to establish an RTP connection is to determine the smallest link size of all the hops along the route. The endpoints of the RTP connection never send packets larger than that size; therefore, intermediate nodes can always forward packets and do not need the segmentation/reassembly functionality.
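The "smallest link along the route" rule can be sketched in a few lines. The function names are illustrative: at connection setup the endpoints take the minimum link size over every hop, and the origin slices its messages to that size, so no intermediate node ever needs to segment or reassemble:

```python
def max_packet_size(route_link_sizes):
    """Largest packet the RTP endpoints may send: the smallest hop."""
    return min(route_link_sizes)

def segment(message, route_link_sizes):
    """Split a message at the origin into packets every hop can carry."""
    size = max_packet_size(route_link_sizes)
    return [message[i:i + size] for i in range(0, len(message), size)]
```

With a route whose hops carry 4096, 1500, and 9000 bytes, the endpoints simply never send more than 1500 bytes per packet, and the segmentation cost is paid once at the origin instead of at every undersized hop.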

From a functional point of view, HPR is a superior network technology. It was designed with high-volume networks in mind. By moving most of the functionality to the two endpoints rather than having every node along the route duplicate function, end-to-end throughput is greatly increased. Intermediate nodes now require less storage and fewer cycles because their sole purpose is to forward a packet to the next hop specified in the header of the packet. Intermediate nodes do not need routing tables. Error recovery is the responsibility of the RTP endpoints, not the intermediate nodes. All of these features allow high-speed packet switching backbone networks to be used to transport HPR traffic.

APAR PJ25760 on program update tape (PUT) 9 provides HPR support (RTP node support) for the TPF system. You capacity planners out there might be wondering: "Because HPR moves most of the work from the intermediate nodes to the endpoints and the TPF system is an endpoint, does that mean I am going to use more machine cycles on TPF now?" While we would be more than happy to sell you bigger processors, the answer to the question is: "No, HPR does not have a longer path length in the TPF host when compared to traditional SNA support." How can that be? The RTP endpoint must be prepared to handle data out of order, detect network failures, and request a new route (path switch), but those are not the mainline code paths. The performance-critical paths are network restart and data flow. Those of you who sign the checks out there might be wondering if there is a hidden price tag somewhere. HPR is new software only; no new hardware is required. In fact, you might even be able to decrease the amount of hardware in your network because fewer intermediate nodes running HPR can provide the same capacity as more intermediate nodes running traditional SNA support. So, if cost is not an issue, what about migration? Once you get to APPN, HPR is basically free. The software determines when and if HPR can be used on a session-by-session basis. Once you install HPR support on nodes, that information is dynamically discovered by the rest of the network (there is no need to manually update definitions in many different places). In summary, connectionless-oriented networks can be too hot, connection-oriented networks can be too cold, but HPR is just right.