On-Line Transaction Processing: Achieving Efficiency in TPF Through Advanced Database Management
by Thom Kolton
IBM's Transaction Processing Facility (TPF) is the industry leader in on-line transaction processing (OLTP). Yet some companies, frustrated by its shortcomings, are seeking alternatives to support their business needs. Mid-Atlantic Systems Design Corp. (MASD) explores the facts behind this operating system and offers insight into TPF's strengths and weaknesses. With this perspective, MASD introduces the Greyhound Relational Database Management System as a key to new system strategies, helping companies to focus on industry standardization throughout the organization.
I. Introduction to TPF
TPF was originally developed over thirty years ago to support computerized airline reservations systems. Today TPF is found in a variety of industries: banking, commercial credit, consumer finance, government, health care, hospitality, on-line services, pharmaceutical, and railroad. The endurance of this system over three decades is indicative of the unique quality TPF possesses as an operating system: very fast and cost-effective transaction processing. In its primary role as a point of sale system, TPF is indispensable to the vast majority of companies who use it.
TPF has undergone continual improvement since its introduction. In its last major release, IBM implemented numerous improvements, delivering a mature and robust operating system with excellent performance and data security features. These improvements include support for a large subset of the "C" language and for the TCP/IP communications protocol. Although a proven powerhouse in the world of OLTP with unparalleled performance, TPF is not without significant and costly limitations. Accordingly, some companies are exploring ways to move away from their TPF operating system and onto newer platforms. The reasons fall into three major categories: interoperability, application development environment, and user interface. The significant points on each topic are discussed below.
II. Interoperability
In the early days of computing, little thought was given to data sharing between TPF and other operating systems. Data needed by both TPF and MVS, its sole companion, was shared via tape. To improve data sharing capabilities, IBM developed a product called TPF Application Requester (TPFAR).
TPFAR is not a database management system, but a product which permits TPF to share information with a DB2 database residing on MVS. TPFAR addresses cases where applications require up-to-date information to flow between TPF and MVS's DB2 database. Still, this communication is limited to those two operating systems, and applications which must communicate electronically with other systems must resort to other means, such as EDIFACT, proprietary message formats, or "screen scraper" applications. In all cases, this requires work at the applications level, rather than at the database or system level.
III. Application Development Environment
Based on experience, Management believes that TPF is difficult to program and that the resulting systems are often inflexible to change. This perception is well founded. In spite of the incremental improvements to the development environment introduced over the years, key areas of application support have not been adequately addressed. Consequently, TPF programmers face many of the same problems today as they did twenty years ago.
In order to speed application development while improving maintainability and lowering costs, three fundamental areas of application development must be addressed: application framework, programming languages, and database access methods.
Application Framework
When computer systems were first developed, guidelines were established to help the programmer write applications to work within the existing environment. As application systems grew and became more complex, it was clear that a formal framework was needed to help control and facilitate development. The concept of a standardized application programmer interface (API) was born. From mainframes to personal computers, APIs are now standard components of most operating systems to aid in application programming. Yet for TPF, such a framework was never established.
Today, each individual installation has its own set of guidelines for program development, guidelines which are easily ignored and occasionally violated outright. Because of this, programmers must spend an inordinate amount of time researching resource utilization, such as common work areas and I/O data levels, before writing any actual code. Even careful analysis occasionally leads to missed or undocumented usage, resulting in program errors or database corruption. This lack of a framework for program control means additional programmer work, adding to the cost and time required to develop and maintain application code without direct benefit to the company.
Programming Languages
While new programming languages are continually introduced on other platforms, TPF has few from which to choose. Aside from some minor products, the two most popular programming languages in TPF are Basic Assembly Language (BAL) and the "C" language. BAL is dominant in TPF because of the programmer's theoretical ability to write fast-executing code, even though it is expensive to write and maintain. Given the speed of today's computers, and the actual code efficiency found in application programs, the original reasons for writing in BAL have lost their validity. The only good argument for BAL in new development today is the pool of available TPF BAL programmers.
A "C" language compiler, released several years ago by IBM, is growing in popularity. Although the current compiler supports only a non-standard subset of the complete language, IBM has announced plans to release a full ANSI "C" compiler in the coming months. It is anticipated that this one product will play a significant role in the application development life cycle, helping to speed application development and reduce maintenance costs. So, while the "C" language grows in popularity and promises dramatic changes in the way businesses develop applications, one should remember that its success hinges on the release of an industry standard compiler, and that there are still millions of lines of code written in other languages to maintain.
Database Access Methods
When TPF was first developed, programmers manipulated the database using a direct access method. Knowing certain physical features of the database in question, the programmer would first calculate the file address of the record where the data was stored and, using special mechanisms in the operating system, read and update the record, mindful of the indexing scheme and any overflow conditions which might exist. Conventions, such as filing the overflow records before filing the prime, were standard and, if not followed, could create serious database inconsistencies.
The problems with such a scenario are obvious. The programmer had to understand the physical structure of the database and write program code to support this structure. If the file structure changed, so, too, would the programs which accessed the data. Also, because the data was intertwined with the applications, it was usually the application programmer's responsibility to design the database as well as the applications.
Remarkable as it may seem, this method remains in use today at virtually all TPF installations to varying degrees. This is a primary reason for lengthy application development life cycles and TPF's inability to react quickly to the changing marketplace. In addition to changing current business rules or introducing new ones, the programmer must also rewrite the file access algorithms that support them. A study conducted several years ago indicated that over forty percent (40%) of all application code written was solely in support of database manipulation.
In the 1970's, Swissair created a file management system to substantially decrease the amount of code necessary to develop and maintain TPF programs. Supported today by IBM, the TPF Data Facility (TPFDF) has grown to meet almost all criteria to be considered a true database management system. Though it remains a hierarchical database file manager, TPFDF has helped reduce the amount of code necessary to manipulate a TPF database, a fact which directly results in improved and less costly systems.
In spite of these benefits, TPFDF still requires the applications programmer to have some knowledge of the database they are accessing, and the application program interface (API) remains proprietary. These are major drawbacks and perhaps the reasons why many companies have chosen not to implement it.
IV. User Interface
To the end user of any system, the most important aspect of an application is the user interface, that is, the visual and mechanical interaction method between human and machine. End users today are sophisticated and have come to expect high quality graphical interfaces and mouse support. In this area, TPF has fared dismally. From a visual and functional level, the method of user interaction with the TPF operating system is essentially unchanged from its inception over thirty years ago. Most interaction is still accomplished through data entry commands from 3270 terminals and, in some cases, using 3270 formatted screens.
The fact is that the user interface technology does not only look old in comparison to today's standards, it is old! And the lack of a user friendly computer interface helps to perpetuate the misconception that TPF itself is old technology.
V. The Viability of TPF in the Future
With so many issues surrounding TPF development, Management must surely be scratching their heads wondering, "So, why are we still using TPF?" Some companies have not only posed this question, but have moved, or are preparing to move, to new platforms. What are the issues involved in making this decision? First, it is necessary to consider why the particular business chose a TPF platform in the first place. One must assume that the business transaction rate was expected to warrant the use of TPF. If the company failed to grow at the expected rate or to achieve the number of estimated transactions, then the reasons to retain their TPF system are greatly reduced. Recognizing this, some companies have moved to the smaller and more manageable TPF/MVS platform, while fewer have actually switched to different platforms altogether. The primary reason companies hesitate to move to new platforms is a financial one: legacy systems.
Legacy systems refer to the set of applications which the company purchased or developed years ago, have maintained and supported, and to which they have provided enhanced functionality to support the business. To the management in charge of legacy applications, and to companies who depend upon them, legacy systems are also referred to as "systems that work." Despite the pejorative connotation the term holds, legacy systems not only support the business, but represent millions of dollars in investment by the company. For this reason, it is difficult to cost-justify the expense of replacing an entire business system. One major hotel chain currently planning to replace their TPF system has estimated the cost of doing so at $35 million!
Either due to performance requirements or to the cost of change, most companies simply find it impractical to replace their TPF system. Based on these realities, one must assume that companies will either keep and improve upon their TPF application base, or slowly migrate their application base to another platform over the course of several years. In either case, TPF should be around for a very long time.
VI. Missed Opportunities?
In the last decade, the Information Technology (IT) industry has exploded with new development paradigms. Computer Aided Systems Engineering (CASE) tools made their debut in the mid-1980's and were subsequently tossed aside by the mid-1990's for more expedient methods of application development, such as Rapid Application Development (RAD) tools. Today we see the acceptance of fourth generation languages (4GLs) and object-oriented programming. IT is much less averse to the concept of "throw away" programming, where systems are developed quickly and with little investment. Gone are the days of multi-year, multi-million dollar projects. Rapid application development is key to meeting the changing needs of the business community.
The client-server model has come into vogue and is at the heart of many companies' current computer strategies. But the excitement in the industry today rests in the area of inter- and intranet development, the ultimate client-server paradigm, with its platform-independent languages such as Java. This holds tremendous promise in reshaping the computer industry as we know it. Yet one must wonder how IT can exploit any of these new developments given TPF's non-industry-standard programming requirements, a database closed to virtually all but the programming staff, and no standardized framework in which to build applications. The answer lies in investing in the future: exploiting new technologies and tools to improve the environment and to provide added value to the TPF operating system.
VII. A Starting Point
Mid-Atlantic Systems Design Corp. has studied the industry problems regarding TPF and is developing several strategies to help TPF installations overcome the problems existing in today's environment. Because of the number of issues it helps to resolve, MASD is concentrating its efforts on data management and computer-to-computer connectivity through database management. In the next sections, we discuss database concepts, contrasting hierarchical and relational schemas and examining the benefits in today's distributed data environment.
VIII. Database Management System Defined
As its name implies, a database management system (DBMS) is a set of software which manages shared data resources. It permits users to group data together in a structured way to form a base of data, or a database, to be shared with others. DBMSs provide a centralized control point to ensure the data is representative of the business, rather than of a particular application. Unlike the applications they support, a DBMS is application-neutral and, thus, supports multiple applications.
IX. The Benefits of a Database Management System
Unlike file systems in which programmers must understand the physical structure of the data, DBMSs insulate the user from the data by manipulating the data on behalf of the user. DBMSs offer several advantages over file systems, chief among these being the ability to: reduce the amount of code required to write and maintain applications, minimize the risk of errors in the database, and decrease or eliminate redundant data.
Users interact with the database through an application programmer interface, or API, separating the application program from the database. Data independence, as it is called, relieves the programmer from having to develop an in-depth knowledge of the existence, structure, and content of each file. This results in reduced costs and shorter application development time. Furthermore, it reduces the potential impact on application programs when changing business needs require modifications to the database.
Because database files are not manipulated directly by the programmer, the potential for introducing database inconsistencies through human error is minimized. This results in improved database integrity and highly reliable data. Many DBMSs also provide error-checking mechanisms, further improving the reliability of the data.
X. The Advantage of Relational Database Management Systems
There are many advantages to relational database systems over hierarchical ones. As with all DBMSs, relational DBMSs (RDBMSs) seek to support the management of data as a shared resource of a community of users. They provide this support in a way that increases user productivity, supports responsiveness to change, preserves the integrity and security of the data, and performs adequately for a variety of applications. The relational approach seeks to realize these goals by providing a simple data model, a high degree of data independence, and a systematic foundation for the development and use of new DBMS capabilities.
From the user's perspective, a database is viewed as a collection of simple tables made up of columns and containing rows of data. Data access and manipulation focus on the data itself. This differs substantially from pre-relational DBMSs, where programmers devise access strategies in the application program, making the applications dependent upon the file access methods. In an RDBMS, there is no significance to the relative position of a particular table or column, nor of rows within a table. This high degree of data independence ensures program code is not dependent on available access paths. Finally, the relational approach is systematic in that it applies basic concepts of set theory and modern logic: the operations and statements of the original relational database languages were precisely defined within the framework of the branch of mathematics known as first-order logic.
XI. Structured Query Language
Structured Query Language is the language used to interact with a relational database. Usually abbreviated as SQL and pronounced "sequel," it is the de facto language of RDBMSs. Its standardization is due, in part, to the ratification of the SQL language by the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO). Contrary to what its name implies, SQL is not simply a query language. It also permits users to update, insert, and delete data, as well as to define the database and its security rules. Although SQL is a formal standard, it should be noted that vendors have each introduced subtle variations to the language in their implementations, which at times makes it more difficult to port programs from one vendor environment to another.
Yet SQL is a highly flexible and user-friendly language whose fundamentals can be acquired in a very short period of time. There are also many tools on the market today that integrate SQL into their products, generating SQL commands automatically from a visual interface.
XII. Greyhound: a RDBMS for TPF
Designed by Mid-Atlantic Systems Design, Greyhound is a full-featured native RDBMS product and the first to run on the TPF and TPF/MVS operating systems. It is designed to take advantage of TPF's fast transaction processing capabilities while also providing an industry-standard interface for data access.
Greyhound is designed to achieve four key goals:
Separation of Application from Database
Through its database interface, Greyhound removes any physical database considerations from the application logic. This permits the applications programmer to focus exclusively on the business application itself. Because database commands are accomplished by using SQL, program code is substantially reduced. Applications are developed more quickly and with improved results and, because there is less code, programs are easier to support.
Centralized Database Control
Centralized control of the database allows improved management of data as a company resource. This means that data is defined by the database administrator (DBA) to support the business and that system tuning is performed to support the database, rather than the application. This centralized control helps ensure that data reflects the needs of the business which it supports so that new applications can also use this data in support of further business opportunities.
Open System Connectivity
Using SQL, the industry standard database access language, Greyhound permits users to share information between TPF and other operating systems through common interface standards such as Open Database Connectivity (ODBC) and Remote Database Access (RDA). This feature is fundamental to those companies planning to include TPF in their client/server architecture.
Direct Database Access
Greyhound permits authorized users to access the TPF database through keyboard commands. This feature helps users to perform ad hoc queries and other data manipulation functions without programmer intervention.
XIII. The Greyhound System Components
The Greyhound RDBMS consists of seven major components:
The System Catalogue is a central data repository which stores metadata, or data about data. It is here that all database definitions are stored.
The Database Definition Language (DDL) is a set of structured commands which permit the database administrator to define and alter aspects about the database.
The Data Manipulation Language (DML) is a set of structured commands that programmers and other users use to manipulate data in the database. Its commands fall into three categories: data manipulation, cursor control, and transaction control statements.
The Database Control Language (DCL) is a set of structured commands used by the database administrator to provide security to the database by controlling access and resource utilization.
The Database Plan Compiler (DPC) is a pre-processing function which analyzes each DML request and, based on information stored in the System Catalogue, calculates the most efficient path to complete the user request.
The Database Engine is the core of the system, performing the steps necessary to fulfill a database request. All database commands are performed by the engine.
The Database Audit facility gathers statistics and information about database usage as an aid for database tuning.
XIV. Benefits of Greyhound RDBMS
Greyhound offers a set of functions and features which represent a new paradigm for the integration of TPF within the organization. Implementing industry standard concepts, Greyhound is the foundation upon which to build strategic business systems. The Greyhound RDBMS will provide benefit to many areas within the organization:
Open Data Exchange
Supporting common data exchange interface standards, Greyhound permits the sharing of data between TPF and other ODBC- and RDA-compliant systems. This facilitates the easy exchange of data within the organization and creates the foundation on which to build better and more manageable systems. As a component in a client/server architecture, Greyhound can serve the organization's needs for distributed data, enhancing the existing architecture and expanding the use of TPF data throughout the organization.
Increased Data Security and Integrity
Greyhound is built around an architecture which helps ensure the security and integrity of data. Since the data is controlled from a centralized source and protected from application programs and end users by the system itself, the quality of data is more assured and highly reliable.
Improved Application Development
As a DBMS, Greyhound provides the backbone to facilitate the development and maintenance of applications with a reduced life cycle and cost, while increasing programmer productivity. Using Greyhound, MASD expects companies to reduce application development and maintenance costs by 30%. Because Greyhound complies with industry standard technology, the organization is positioned to explore standardized application environments. Coupled with the anticipated release of IBM's next TPF "C" compiler, companies may soon be able to begin using "off the shelf" technology for their application development. Finally, TPF will fit comfortably within the organization.
XV. Conclusion
The TPF operating system has been available for many years and is a consistently good performer in the world of OLTP. Yet several significant features of TPF have not kept up with technological advancements. Given the expectations of sophisticated users, coupled with Management's desire to improve the operation and development environment, it is clear that TPF needs to change the paradigm under which it exists and evolve into an operating system for the 21st century.
The Greyhound RDBMS from MASD Corp. can help your company improve the integration of TPF within your system architecture, speed application development, and deliver more reliable applications with improved data security. For more information on Greyhound, telephone Thom Kolton at Mid-Atlantic Systems Design Corp. at (410) 342-0444, or e-mail TDKolton@worldnet.att.net. Or write: 516 S. Wolfe Street, Baltimore, MD 21231.