Collection Connection - FAQs about Collections
by Michele Dalbo, IBM TPF ID Core Team, and Daniel Jacobs, IBM TPF Development

Hello everyone! It's Ree Porter, and I am back by popular demand to continue with more investigative reporting at the request of the TPF collection support (TPFCS) team. The team asked me to travel with them to customer sites to provide assistance at meetings between the team and various customer representatives. The team's budget allowed three business days for me to travel from site to site (not staying in great hotels or eating gourmet meals, may I add) to gather information for this report. Because I am not one of those technical types (reporters seldom are), I did not have much to contribute at the meetings, but I was able to report on the results. Once again, all customer and employee names may have been changed to protect the innocent, so let's get started.

Date: Monday, April 5

Place: FreeFall Airlines

Schedule: 10 a.m. meeting with John Parachute (FreeFall database administrator), Albert (TPFCS programmer), Dan (TPFCS programmer), Glenn (TPFCS expert), and Michele (TPFCS technical writer).

C (comment). Ree - I am glad all of you could make it here! Now, John, I think you said that you had some installation questions for the group?

Q (question). John - Yes, Ree. First, does TPFCS have to be installed on the basic subsystem (BSS)?

A (answer). Dan - Yes, TPFCS must be installed on the BSS even if all user-defined data stores are created on other subsystems. Furthermore, long-term pools have to be defined on the BSS. Several dependencies in the design of TPFCS require this. The primary reason is that the TPFDB data store, which TPFCS uses to maintain its own internal collections, is created automatically on the BSS.

A. Michele - The fact that TPFCS has this dependency and must be initialized on the BSS was not explained clearly in our publications. We apologize for this, and I will include this information in future publications.

Q. John - We built CJ0040, and its size is hexadecimal 160CD0 (decimal 1,445,072) bytes, which is bigger than our control program. Is this normal?

A. Albert - Yes. When we compared two load decks generated in our environment, the one containing only CJ0040 was more than 1.4 times the size of the one containing only the TPF control program. This is not a surprise, considering the amount of code that is contained in TPFCS.

Q. John - Is the PUT 9 renaming of collections (such as dictionary to key sorted set) really only a rename? Is there a way I can use the new names on a PUT 8 system?

A. Dan - Yes, it was really just a rename, and with the following modifications you can use the new names on a PUT 8 system:

  1. In the c$to2.h header file, copy the prototypes of the old TO2_createDictionary... application programming interfaces (APIs) and give the copies the new TO2_createKeySortedSet... names.
  2. In the c$to2m.h header file, add the #pragma map directives for the new APIs.

Note that with only these changes, the TO2_getCollectionType API will still return the old collection type strings.

Q. Ree - Was there anything else about installation?

A. John - No, I think that was about it. I do have a database design question though.

C. Ree - Go ahead!

Q. John - Can collections contain other collections?

A. Dan - No, collections cannot contain other collections. They can only reference other collections by storing pointers to them, that is, the persistent identifiers (PIDs) of those collections.

C. John - I also have a couple of performance questions.

C. Albert/Dan - Oh, no . . .

Q. John - What is the performance of TPF collection support? What is IBM's position on it?

A. Glenn - TPFCS has some overhead because of the functions it provides. It is not practical to compare TPFCS with pure FIND/FILE macros, because any application that uses FIND/FILE to perform functions similar to those of TPFCS would have its own performance overhead as well. The position of the TPF lab is that we will work individually with customers who are thinking of using TPFCS to help them determine whether the functions they require can be provided by TPFCS with reasonable performance and, if not, to help them determine their best options.

A. Michele - We will also investigate whether there are any general performance considerations that we can document.

C. Ree - Hmm, I might have a reason to visit with all of you again!

Q. John - What is the relationship between performance and the size of a collection? Is it better to have fewer collections with more elements, or more collections with fewer elements? How many I/Os are required?

A. Dan - Good questions! Generally, the TPFCS transaction rate is inversely related to the size of a collection, but it is hard to be much more specific because the factors that determine performance vary between collection types. For example, for a key sorted set (dictionary), performance is determined by the key size, the data element size, and the number of elements. Depending on these sizes, there may be no difference in performance between a collection with 20,000 elements and one with 1.2 million elements! Another reason to have smaller collections is that only one entry control block (ECB) can update a collection at a time. Having many smaller collections allows multiple ECBs to update different collections concurrently.

A. Michele - Again, we will investigate to see if we can add this kind of information to the TPF Database Reference publication. I have made a note about it.

Q. John - Which collection is more efficient to use, an array or a dictionary (oops, I mean key sorted set)?

A. Albert - The array might perform better than the dictionary. All collections have an underlying tree structure that keeps track of the actual file addresses. The dictionary also has an index tree whose pointers are relative record numbers (RRNs), which have to be converted to file addresses by looking them up in the underlying tree structure. Let me draw this on the whiteboard to better illustrate this for you.

[Figure: Albert's whiteboard drawing of the array and dictionary tree structures]

It comes down to the fact that an array has one tree to traverse and a dictionary has two.

Q. John - Would there be data contention when multiple concurrent reads take place against the same binary large object (BLOB)? Does any locking occur when no update is required?

A. Dan - No, a read operation will not cause any locking or data contention to occur.

Q. John - Is there any way to lock a collection at the element level?

A. Albert - No. TPFCS uses TPF record locking, so element-level locking is not available. A TPF lock is issued against an entire collection; thus, only one ECB at a time can update a given collection.

C. John - Thank you for your time!

The team goes to the local steakhouse for dinner. Michele passes up dinner to go shopping at the nearby mall.

Date: Tuesday, April 6

Place: InSomniacs Inn

Schedule: 9 a.m. meeting with Nancy Upallnight (application programmer).

C. Ree - Good morning everyone; let's get started with some questions from Nancy.

Q. Nancy - Can TPF provide sample code using TPFCS?

A. Albert - Yes, we have an assortment of sample code that is available on request.

Q. Nancy - Are there any plans to support element locking so the application does not have to maintain the element sequence counter?

A. Albert - Someone just asked that question yesterday! No, TPFCS uses TPF record locking, so element-level locking is not available. The TPF locking capability does not provide the granularity needed to support it.

C. Michele - I think we need to clarify this point in our publications because this is the second time the question has been asked.

Q. Nancy - Can I create multiple collections with the same data store and application IDs?

A. Dan - Absolutely! You can create as many collections as you need with the same data store and application IDs.

Q. Nancy - Is there a more direct way to define a collection name for use with the TPFCS (browser) functional messages other than by displaying the PID of the collection when it is created?

A. Dan - Yes. When an application creates a collection, it can also use the TO2_defineBrowseNameForPID function to associate a name with that collection. That name can then be used with the TPFCS browser functional messages.

Q. Nancy - Is there any way for an application to find a PID without storing it in a higher-level data structure, such as an application dictionary?

A. Dan - Well, not really. Let me explain.

TPFCS collections are composed of pool records and, therefore, need some kind of anchor to access them. The data store application dictionary serves as such an anchor and can be used with the TO2_...DSdict... functions. For collections that are used systemwide, the TPF application dictionary can be used with the TO2_...TPFdict... functions. Other structures besides these dictionaries could also anchor the collections, such as references embedded in TPF fixed-file records, but these would be considered higher-level data structures too.

Q. Nancy - What can you do with a user class ID?

A. Dan - The user class ID, available through the TO2_setClass and TO2_class APIs, is not used at all by TPFCS itself. The field is made available so that users can associate (store) up to 32 bytes of information with the collection. For example, perhaps a user creates groups of related collections, such as a group of collections for employees and another group of collections for managers. Each of the employee collections could be considered the same "user type" of collection, and the user could assign its own class to those collections for easy reference. A different class would be assigned to the manager collections. TPFCS places no restrictions on what is stored in these 32 bytes; the information could include class IDs, flags, names, and even pointers and file addresses.

Q. Nancy - Can an ECB issue multiple TO2_createEnv functions for different data stores that it wants to manipulate concurrently?

A. Albert - Yes, an ECB can have multiple environments outstanding at the same time.

Q. Nancy - How do you set the number of elements in an array (or are all arrays 2 GB)?

A. Albert - The number of elements in an array is equal to the index position of the last element. When an array is created, it has zero elements. As elements are placed in the array, the size of the array expands. Elements cannot be deleted from an array (they can only be zeroed), so an array never shrinks in size.

Q. Nancy - Can you do a TO2_atPut to any index position immediately after issuing a TO2_createArray or do you have to have populated the array (by appending elements) first?

A. Albert - Yes, you can place an element in any position in an array. Logically, all uninitialized elements preceding the added element contain zero, although those elements do not take up any physical space on DASD. For example, if you have a new (empty) array and you place one element at position 1 and another element at position 5, elements 2 to 4 logically contain zeros, and the size of the array is now 5 even though only two elements were actually added. Let me draw this on the whiteboard to better illustrate this for you.

[Figure: Albert's whiteboard drawing of the array example]

Day 2 ends a bit early, and Michele drives away, heading toward the mall (again). The remaining members of the team go for pizza. Hmm . . . I am beginning to notice a pattern here.

Date: Wednesday, April 7

Place: Bank Rupt Savings and Loan

Schedule: 7:30 a.m. meeting with Lau C. Teller (operations).

C. Ree - Well, here is the last stop on our trip. We heard that you had a lot of questions, so we scheduled an early meeting so that we could cover everything. What is your first question?

Q. Lau - Is the ZBROW capture and restore of collections supported for blocked tapes? If not, why?

A. Albert - No, TPFCS capture and restore of collections is not supported for blocked tapes. Currently, if you attempt to restore from a blocked tape, unpredictable results will occur and your system could go catastrophic or enter a disabled wait state because of a known problem in base tape code that is being addressed by APAR PJ26139. However, even after this APAR is available, capture and restore of collections using blocked tapes cannot be supported in the current release because of the following restrictions:

  1. Base TPF currently does not support reading from or writing to storage obtained by malloc when performing I/O to blocked tapes.
  2. Because of the way TPFCS is currently designed, it needs to know whether the I/O completed successfully when it writes collections to tape.

Even if item 1 were not a restriction, with blocked tapes the tape could switch long after the ZBROW collection capture ECB exited, so portions of a single collection could be written to a different tape. As currently designed, collection restore would not be able to restore such a collection.

C. Lau - Well, I just read the new PUT 9 updates to TPF Database Reference, and they answered all of my recoup questions, so I guess that is it!

C. Dan - We got up this early just for that?

C. Michele - It did not bother me! I am always up early anyway!

C. Albert - Me too! Dan, you really should try to wake up at 5 a.m. You might like it!

C. Ree - Anyway, this concludes our investigative report into customer questions about TPFCS. If I hear that there are more questions coming from our customers, I promise to be back!

C. Dan - This was a productive trip. It was great to finally meet with our customers.

C. Michele - Yes, I bought four new outfits, but I did not have to make many updates to our publications. I would say that this was a very productive trip!