With C in TPF - Strings
by Dan Evans

Strings have a strange place in the C language. The compiler recognizes string literals, but there is no string data type. An ANSI C compiler also recognizes string literal concatenation, but there is no string operator in the language. The preprocessor will even "stringize" macro parameters, turning them into string literals.

String literal notation is really a shorthand for declaring an array of characters with a special characteristic: the last character in the array always has the value 0. Although a string literal can be any length, the size of an array is determined at compile time, so when a string is declared as an array of characters, its maximum length is fixed. The terminating 0 character lets the length of the string vary from 0 up to that maximum, which is always one less than the size of the array, because the last position is needed for the 0 byte. Strings can also be allocated at execution time, by allocating storage for a character array; this is really the only way to handle strings whose length is not known until execution. The example below presents a few simple routines and conventions which make dealing with dynamic strings easy. Each dynamically allocated string is referenced through a character pointer variable, and each string should be given an initial value before it is used.

The best approach is to set a global pointer to the address of a static 0 byte. This will be the initial value of any new string. You may see some code which instead uses a NULL pointer as the initial value of a string. This differentiates between a NULL string and a zero length string, a difference which causes much NULL pointer checking in subsequent code. Using a global static pointer to a 0 byte avoids NULL pointer checking, avoids allocating a single byte every time a string is set to zero length, and provides a convenient valid string value. The program below declares the variable NewString for this purpose. To initialize a string, set its character pointer to NewString. Always use NewString whenever a zero length string is needed.

String assignment is done with the copystring() function. Assigning one string pointer to another sets both pointers to the same storage address, so they are not really two strings, but two different names for the same string. To truly assign a string, a copy must be made, so the normal string assignment statement looks like:

str = copystring(abc);

Concatenation is one of the most common string operations. It builds a longer string out of a sequence of shorter ones. The concat_strings() function implements this operation. It takes a variable length argument list, so that any number of strings may be concatenated. The list is ended by a NULL pointer as the final argument; because NULL may be defined as a plain integer 0, it is safest to write it as (char *)NULL so that a pointer, not an int, is passed. Since all our string pointers have non-NULL values, a NULL as the argument list terminator is unambiguous. The design of concat_strings() assumes that storage allocation is an expensive operation, and that calculating the length of a string is relatively inexpensive. A good C library will implement strlen() as a TRT instruction. The function makes two passes through the variable argument list. The first pass calculates the length of the resulting string. Then the required storage, including one byte for the terminating 0, is allocated. The second pass copies the arguments into the result. A running index marks where the next argument is to be copied, so that the C library function memcpy() can be used instead of strcat(), which would have to rescan the result to find its end on every call. A good C compiler will implement memcpy() in-line as an MVCL instruction.

The concat_strings() function also serves as an example of variable argument list processing under ANSI C. The type va_list is defined in <stdarg.h> as an implementation dependent variable argument list pointer. A variable of type va_list is initialized by the macro va_start(), which takes the variable and the name of the last parameter before the variable portion of the list. For this reason ANSI C requires at least one named parameter before the variable portion. The value of each argument is subsequently accessed by the macro va_arg(), which takes the va_list variable and the type of the argument. When the list is finished, the macro va_end() must be called to perform any implementation dependent clean-up; each va_start() must be matched by a va_end() before the function returns. concat_strings() makes two passes, so there are two invocations of va_start(), each with its own va_end(). It also uses a function called getstore() to perform storage allocation. This allows the error checking for malloc() to be performed in one place, instead of at each point where storage is needed. The second argument to getstore() is a tag naming the caller, used only for debugging. In ANSI C, provided getstore() itself is redefined to take only the size, the tag can be removed from every call for production by a definition such as

#define getstore(a, b) getstore(a)

The example program uses all of these functions. It reads two or three arguments from the command line, and assumes that these are a first and last name with an optional middle name. The program creates a single string from these arguments in the order last name, a comma, first name, then middle name. The call to copystring() is, strictly speaking, not necessary, since the value of the command line argument argv[2] will not change, but it is used to illustrate string assignment. The same is true of the calls to freestring(), since all storage is freed when a program ends. These calls illustrate normal usage in more complicated programs. One final note: a string should never be used both as the target of an assignment or concatenation and as an argument to either of these functions in the same statement. For example, the statement

sp = concat_strings(sp, other, (char *)NULL);

appends the string other to the string sp, but the old value of sp is lost in the process. This is called a core leak (today more often called a memory leak): the only pointer to dynamically allocated storage is overwritten, so that storage can never be freed. The correct way to do this is

tmp = concat_strings(sp, other, (char *)NULL);
freestring(sp);
sp = tmp;

If this is a common operation, it might be a good idea to write an append() function that improves performance by extending the existing string, for example with realloc(), instead of allocating a new one and copying both parts into it. The functions presented here should provide a solid basis for your personal string function library. A good set of string functions simplifies both the handling of strings and the programs that use them.

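#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdarg.h>

/* Shared zero-length string: the initial value of every string. */
char *NewString = "";

/* Report an internal error and terminate. */
void logic_error(char *msg)
{
    /* error processing */
    fprintf(stderr, "logic error: %s\n", msg);
    exit(1);
}

/* Allocate sz bytes; trc identifies the caller if storage runs out. */
void *getstore(size_t sz, char *trc)
{
    void *rp;

    if ((rp = malloc(sz)) == NULL)
        logic_error(trc);
    return rp;
}

/* Concatenate a NULL-terminated list of strings into new storage. */
char *concat_strings(char *s, ...)
{
    va_list ap;
    char *sp, *rp;
    size_t ln, ix;

    /* Pass 1: total length of all the arguments. */
    va_start(ap, s);
    sp = s;
    ln = 0;
    while (sp != NULL)
    {
        ln += strlen(sp);
        sp = va_arg(ap, char *);
    }
    va_end(ap);

    rp = (char *)getstore(ln + 1, "Concats");   /* +1 for the 0 byte */

    /* Pass 2: copy each argument to the running index ix. */
    va_start(ap, s);
    sp = s;
    ix = 0;
    while (sp != NULL)
    {
        ln = strlen(sp);
        memcpy(&rp[ix], sp, ln);
        ix += ln;
        sp = va_arg(ap, char *);
    }
    rp[ix] = '\0';
    va_end(ap);
    return rp;
}

/* Release a string; NewString is shared and is never freed. */
void freestring(char *sp)
{
    if (sp == NULL)
        logic_error("FreeStr");
    else if (sp != NewString)
        free(sp);
}

/* Return a new copy of sp, or NewString for a zero-length string. */
char *copystring(char *sp)
{
    size_t ln;
    char *rp;

    ln = strlen(sp);
    if (ln > 0)
    {
        rp = (char *)getstore(ln + 1, "CopyStr");
        strcpy(rp, sp);
    }
    else
        rp = NewString;
    return rp;
}

int main(int argc, char **argv)
{
    /* Give every string a valid initial value, so the freestring()
       calls below are safe even when too few arguments are given. */
    char *name = NewString, *middle = NewString;
    int lst;

    if (argc >= 3)
    {
        if (argc == 4)          /* first middle last */
        {
            middle = copystring(argv[2]);
            lst = 3;
        }
        else                    /* first last */
        {
            lst = 2;
        }
        name = concat_strings(argv[lst], ", ",
                              argv[1], " ", middle, (char *)NULL);
        printf("%s\n", name);
    }
    freestring(middle);
    freestring(name);
    return 0;
}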