The Collection Connection
Michele Dalbo, IBM TPF ID Core Team, and Daniel Jacobs, IBM TPF Development

In previous articles, we discussed 9 of the 13 collection types provided by TPF collection support (TPFCS). Each of the types already discussed has one major characteristic in common: it consists of ordered elements. There are also several collection types that contain unordered elements. In this article, we discuss the remaining four collection types, all of which are unordered: Bag, Set, Key bag, and Key set.

Bag
Characteristics:
A bag is an unordered collection of elements that have a data area, but no key. An element can be up to 248 bytes long, the data values can be nonunique, and element equality (the ability to test whether all data members of two elements are equal) is supported. Because there is no order to the collection elements, a cursor is needed to iteratively access and retrieve each element in a collection.

Advantages:
A bag is primarily designed to be used on data that does not need to be retrieved in any particular order. Because no order has to be maintained, adding and retrieving specified elements can be more efficient.

Examples:
An example of using a bag is a program for tracking supplies in a medical supply room. For each item in the medical supply room, you enter the name of the supply type into the collection. If there are multiple occurrences of the same supply type, the item is entered multiple times because a bag supports nonunique elements. You can easily determine if a particular supply type is present (because the collection supports element equality), count the number of occurrences of a particular supply type, and retrieve a single supply of that type. However, you cannot retrieve a specific item of that supply type. For example, you can retrieve a box of bandages, but you cannot retrieve the box of bandages that was added last.
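The supply room example can be modeled with a small in-memory sketch. This is not the TPFCS API: the class Bag and its methods (add, contains, occurrences, remove_one) are hypothetical names chosen for illustration, and only the 248-byte limit and the bag semantics come from the description above.

```python
from collections import Counter

MAX_ELEMENT_BYTES = 248  # bag element limit quoted in the article


class Bag:
    """Toy model of a bag: unordered, nonunique elements, equality supported."""

    def __init__(self):
        # Element value -> number of occurrences. No insertion order is kept
        # per occurrence, which mirrors "you cannot retrieve the box added last".
        self._counts = Counter()

    def add(self, element: bytes) -> None:
        if len(element) > MAX_ELEMENT_BYTES:
            raise ValueError("element exceeds 248 bytes")
        self._counts[element] += 1

    def contains(self, element: bytes) -> bool:
        # Element equality: compare the whole data area.
        return self._counts[element] > 0

    def occurrences(self, element: bytes) -> int:
        return self._counts[element]

    def remove_one(self, element: bytes) -> bool:
        """Remove some occurrence of element; the caller cannot choose which."""
        if self._counts[element] == 0:
            return False
        self._counts[element] -= 1
        if self._counts[element] == 0:
            del self._counts[element]
        return True

    def __iter__(self):
        # Stand-in for a cursor: visits every occurrence in no promised order.
        return iter(self._counts.elements())


supply_room = Bag()
for item in (b"bandages", b"gauze", b"bandages"):
    supply_room.add(item)

print(supply_room.contains(b"bandages"), supply_room.occurrences(b"bandages"))
```

Storing a count per value rather than a list of occurrences is a design choice of this sketch: it makes the "any box, but not a particular box" behavior explicit.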

Another example of using a bag would be a program to track tickets sold for a bus ride. There are a limited number of tickets sold for each bus ride, there is no assigned seating, and a customer can purchase more than one ticket (for friends and family). Each bus ride is represented by a bag collection. Each element represents a seat on the bus, and the element data contains information about the customer that purchases the seat. Because a bag allows nonunique elements, a customer can purchase multiple seats.

Set
Characteristics:
A set collection is very similar to a bag collection except that the data values must be unique and the maximum element length is 256 bytes.

Advantages:
A set is appropriate for a data model whose elements do not have to be retrieved in a particular order and for which automatic enforcement of unique element values is desired.

Examples:
An example of a set is a program that creates a packing list for a box of free samples to be sent to a warehouse customer. The program searches a database of in-stock merchandise and selects 10 items at random whose price is below a threshold level. Each item is then added to the set. The set does not allow an item to be added if it is already present in the collection, ensuring that a customer does not get two samples of the same product. The set is not sorted and elements of the set cannot be located by key.
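A minimal sketch of the packing list example follows. It is an in-memory model, not the TPFCS API: UniqueSet, build_packing_list, and the shape of the in_stock mapping are assumptions made for illustration. The point it shows is that the collection itself, not the caller, rejects duplicates.

```python
import random

MAX_ELEMENT_BYTES = 256  # set element limit quoted in the article


class UniqueSet:
    """Toy model of a set: unordered elements with unique data values."""

    def __init__(self):
        self._elements = set()

    def add(self, element: bytes) -> bool:
        """Return True if added, False if the value was already present."""
        if len(element) > MAX_ELEMENT_BYTES:
            raise ValueError("element exceeds 256 bytes")
        if element in self._elements:
            return False
        self._elements.add(element)
        return True

    def __len__(self):
        return len(self._elements)

    def __iter__(self):
        # Stand-in for a cursor; no order is promised.
        return iter(self._elements)


def build_packing_list(in_stock, threshold, count=10, rng=None):
    """Pick up to `count` distinct items priced below `threshold`.

    in_stock is a hypothetical mapping of item name -> price.
    """
    rng = rng or random.Random()
    eligible = [name for name, price in in_stock.items() if price < threshold]
    packing_list = UniqueSet()
    target = min(count, len(eligible))
    # Draw at random with replacement; repeats are refused by the set.
    while len(packing_list) < target:
        packing_list.add(rng.choice(eligible).encode())
    return packing_list


stock = {f"item{i}": float(i) for i in range(30)}
samples = build_packing_list(stock, threshold=20.0)
print(len(samples))
```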

Another example of a set would be a variation of the bus example discussed previously. Suppose the bus company needs to track information about each passenger, such as their age (for different pricing structures). A set could be used instead of a bag, and the elements would contain information about each passenger instead of the purchaser of each ticket.

Key Bag
Characteristics:
A key bag is an unordered collection of elements that have a data area and a key that is separate from the data area. Because there is no order to the collection elements, a cursor is needed to iterate through all elements in a collection. Although bags and key bags have some similar characteristics, a key bag has a few "key" differences (pun intended). In addition to having a data value, elements in a key bag also have a key value, which can be up to 248 bytes long and can be nonunique. The maximum length of the data value is 1000 bytes, and element equality is not supported.

Advantages:
Like a bag, a key bag offers some performance improvements that ordered collections do not provide. Because element equality is not supported, there is no application programming interface (API) to specifically test the collection for the presence of a data value. However, because the elements have a key, there is an API to access a particular element directly without using a cursor, and the maximum data value size is approximately four times greater than that of a bag.

Examples:
An example of using a key bag is a program that tracks assignments of dormitory suites that have six beds each. The key bag element key is the suite number. Each element also has data values for the student name, student ID number, dormitory location, and so on. When you arrive at college, you are assigned a suite, and your student ID number and the suite number are entered into the collection along with some other information. Because multiple students will be assigned to the same suite, the program allows the same key to be added to the collection multiple times. When you move out of the dormitory, the program finds each element whose key matches the suite number and deletes the one element that has your student ID number associated with it.
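The dormitory example can be sketched as follows. Again, this is an illustrative in-memory model rather than the TPFCS API: KeyBag, locate, and remove are hypothetical names, and the "id|name" record layout is invented for the example. Only the size limits and the nonunique-key behavior come from the description above.

```python
from collections import defaultdict

MAX_KEY_BYTES = 248    # key bag key limit quoted in the article
MAX_DATA_BYTES = 1000  # key bag data limit quoted in the article


class KeyBag:
    """Toy model of a key bag: nonunique keys, each with its own data area."""

    def __init__(self):
        self._by_key = defaultdict(list)

    def add(self, key: bytes, data: bytes) -> None:
        if len(key) > MAX_KEY_BYTES:
            raise ValueError("key exceeds 248 bytes")
        if len(data) > MAX_DATA_BYTES:
            raise ValueError("data exceeds 1000 bytes")
        self._by_key[key].append(data)  # the same key may be added repeatedly

    def locate(self, key: bytes) -> list:
        """Direct access by key, without walking the whole collection."""
        return list(self._by_key.get(key, []))

    def remove(self, key: bytes, matches) -> bool:
        """Delete the first element under key whose data satisfies matches.

        There is no equality test on data values in a key bag, so the
        caller inspects the data of each element that shares the key.
        """
        entries = self._by_key.get(key, [])
        for index, data in enumerate(entries):
            if matches(data):
                del entries[index]
                if not entries:
                    del self._by_key[key]
                return True
        return False


dorm = KeyBag()
dorm.add(b"suite-12", b"1001|Ana")
dorm.add(b"suite-12", b"1002|Ben")
dorm.add(b"suite-14", b"1003|Cy")

# Student 1001 moves out: find the suite by key, then match on student ID.
dorm.remove(b"suite-12", lambda data: data.startswith(b"1001|"))
print(dorm.locate(b"suite-12"))
```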

Once again, the previously discussed bus example provides us with another example of the use of a key bag. If the bus company wants to be able to quickly access all of the seats purchased by a particular individual, a key bag could be used with the key of the collection being the name of the customer purchasing the tickets. Because nonunique keys are supported by a key bag, the customer could still purchase multiple tickets. Because keys are used, the application would be able to find the records for each of the tickets without looping through all tickets sold for a particular bus ride.

Key Set
Characteristics:
A key set has the same characteristics as a key bag except that the maximum key size is 256 bytes and the key values must be unique.

Advantages:
A key set offers the same advantages as a key bag, with the additional benefits of automatic enforcement of unique keys and slightly larger keys.

Examples:
An example of using a key set is a program that allocates rooms to patrons checking into a hotel. The room number serves as the element key and the patron's name is a data member of the element. When you check in at the front desk, the clerk pulls a room key from the board and enters the room number and your name into the collection. When you return the key at checkout time, the element for that key is removed from the collection. You cannot add an element whose key is already present in the collection because there is only one key for each room.
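The hotel example reduces to a mapping with unique keys, sketched below. This is a hedged in-memory model, not the TPFCS API: KeySet, check_in, and check_out are hypothetical names. The 256-byte key limit comes from the article, and the 1000-byte data limit is assumed to carry over from the key bag.

```python
MAX_KEY_BYTES = 256    # key set key limit quoted in the article
MAX_DATA_BYTES = 1000  # assumed to match the key bag data limit


class KeySet:
    """Toy model of a key set: like a key bag, but keys must be unique."""

    def __init__(self):
        self._by_key = {}

    def add(self, key: bytes, data: bytes) -> bool:
        """Return True if added, False if the key is already present."""
        if len(key) > MAX_KEY_BYTES:
            raise ValueError("key exceeds 256 bytes")
        if len(data) > MAX_DATA_BYTES:
            raise ValueError("data exceeds 1000 bytes")
        if key in self._by_key:
            return False
        self._by_key[key] = data
        return True

    def locate(self, key: bytes):
        """Return the data stored under key, or None if the key is absent."""
        return self._by_key.get(key)

    def remove(self, key: bytes) -> bool:
        return self._by_key.pop(key, None) is not None


def check_in(hotel: KeySet, room: int, patron: str) -> bool:
    # The room number is the element key; the patron's name is the data.
    return hotel.add(str(room).encode(), patron.encode())


def check_out(hotel: KeySet, room: int) -> bool:
    return hotel.remove(str(room).encode())


hotel = KeySet()
check_in(hotel, 204, "M. Dalbo")
print(check_in(hotel, 204, "D. Jacobs"))  # second check-in to the same room is refused
check_out(hotel, 204)
```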

As you may have guessed by now, the bus example can be used one more time. Suppose the bus company wants to assign seats when the tickets are purchased. Each seat on the bus would be numbered and the seat number would be the key in the collection. Because the same seat cannot be assigned twice, a key set would be used instead of a key bag.

That’s All of Them! So What’s Next?
It has been almost a full year since we first started discussing collections. Over the course of the year, we hope you have gained better insight into the kinds of data models TPFCS provides and when each one is appropriate. But don’t think this is the end of "The Collection Connection"! We’re not ready to go away, especially after we finally settled on a name for this column that we like! There are many more topics to be discussed in upcoming issues: cursors, properties, and utilities, just to name a few. Be sure to look for additional articles about periodic enhancements as they become available. If you have any suggestions for other topics, please let us know!