Is C Dangerous?
by Dan Evans

When I recently asked an acquaintance why his shop didn't use C on its large mainframe computers, he answered that he had heard C was a dangerous language. He caught me by surprise. Dangerous is not among the many adjectives I have heard applied to C. As the co-author of the Prisym/C/TPF compiler, I was curious to know where the danger lay. As we explored the issue, I found that obscurity, not danger, was his real concern.

It's true: you can write obscure programs in C. As proof, I offer my own version of a famous C example of obscurity, the following two-line program. If you know C, try to figure out what will be displayed by the printf() function call before looking at the answer at the end of this article. Better yet, compile it and run it for yourself, and then try to explain the result.

#define C 1
main() { printf(&C["\001%c%.2s\0003ACP"],C+C["is"],"pfun"); }

The program will run with either an ASCII or an EBCDIC C compiler, and should produce the same result.

It is probably fair to say that an obscure program can be written in any language, with any Assembler language being a leading contender. It is also true that a good programmer can express an algorithm clearly in any language. Clarity in programming is as important as clarity in writing, and second only to correctness. Any language as large as C has some features which may cause problems if used carelessly. In the interest of C clarity, let's explore some of these features to see how they can be used, or abused. Consider the following expression, which could be in C or PL/1, or even in Pascal with a slight modification of the assignment operator:

rs = fun1(al) + cd * fun2(al) ;

We know that multiplication has higher precedence than addition, so the product of the variable cd and the return value from the function fun2() must be computed first, then added to the return value from fun1(). But operator precedence does not specify order of evaluation. Which function call is actually executed first? Compilers are generally free to evaluate expressions in any order which is semantically equivalent to their algebraic statement. We assume that the compiler writer has extensive knowledge of the target machine and tries to generate the best machine code possible. Changing the order of evaluation may help the compiler generate better code. You probably feel that if we know the value of cd and the returns from fun1() and fun2(), we can calculate rs. But what if cd changes during evaluation because it is modified by fun1()? Then the order of the function calls becomes important. Programmers have been known to write functions which modify global variables, and this kind of bug is even more common in Assembler programming.
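As a minimal sketch of the problem, assume hypothetical bodies for fun1() and fun2() in which fun1() modifies the global cd. The two helper functions below, whose names and values are also assumptions made for illustration, pin down each possible order with separate statements so that the two outcomes can be compared.

```c
int cd;  /* global that fun1() modifies: the hidden side effect */

/* Hypothetical bodies, chosen only to make the effect visible. */
int fun1(int al) { cd = 10; return al; }
int fun2(int al) { return al + 1; }

/* Order A: cd is read before fun1() runs. */
int cd_read_first(int al)
{
    int saved_cd, r1, r2;

    cd = 2;
    saved_cd = cd;      /* value of cd before the side effect */
    r1 = fun1(al);
    r2 = fun2(al);
    return r1 + saved_cd * r2;
}

/* Order B: fun1() runs before cd is read. */
int fun1_called_first(int al)
{
    int r1, r2;

    cd = 2;
    r1 = fun1(al);      /* cd becomes 10 here */
    r2 = fun2(al);
    return r1 + cd * r2;
}
```

With al equal to 3, cd_read_first() returns 11 and fun1_called_first() returns 43. The single expression fun1(al) + cd * fun2(al) could yield either value, depending on the compiler.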

The culprit is the side effect produced by either an operator or a function, and I chose the above example to illustrate the problem because it is not a problem limited to C. However, C has some operators that produce side effects and are not available in other languages. In many languages, function calls are the only way to produce side effects. In C, in addition to function calls, the assignment operators and the increment and decrement operators produce side effects. Consider the following example using the increment operator:

count = 0;
rs = fun3( ++count, ++count);

The ++ operator preceding a variable increments the stored value of the variable and uses the result as the value of the operation. We want to pass the values 1 and 2 to the function fun3(), but did we? It depends on the order of parameter evaluation. The ANSI C Standard leaves this order up to the compiler. For a stack machine, the natural order is right to left, leaving the left-most parameter on the top of the stack. When arguments are passed in a parameter list, the natural evaluation order is left to right. Since the result depends on the evaluation order chosen by the compiler, this is called order dependent programming, and it should be avoided in any language. In fact, the ANSI C Standard leaves the behavior undefined when a variable is modified more than once between sequence points, so a compiler need not produce either of the expected results. The simple way to avoid order dependency is: never use the target of an operation having a side effect twice in the same expression. Consider

rs = count + ++count

which sums the current value and the next value of count. If count is initially 5, rs is 11 if the left operand of the addition is evaluated first, or 12 otherwise. This is an expression where the target of the side effect operator ++, the variable count, is used twice. If we go back to our original example, the variable cd was used twice in the same expression, but it was harder to see. We postulated that it was modified by the execution of fun1(), so it was used a second time within fun1(), where the side effect was produced.

Order dependency can always be removed by using temporary variables. If we modify the previous example to:

t = count;
rs = t + ++count;

we remove the order dependency by separating the double use of the variable count into two expressions. A good compiler leaves the value of t in a register, so there is no difference in the generated code, but the clarity of the program is improved.
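The same technique fixes the earlier fun3() call. The sketch below assumes a hypothetical body for fun3() that encodes its arguments so their order is visible, and a hypothetical wrapper, call_fun3_in_order(), that performs each increment in its own statement.

```c
/* Hypothetical fun3(): encodes its arguments so the order is visible. */
int fun3(int first, int second)
{
    return first * 10 + second;
}

int call_fun3_in_order(void)
{
    int count = 0;
    int first, second;

    first = ++count;    /* count is now 1 */
    second = ++count;   /* count is now 2 */
    return fun3(first, second);   /* always fun3(1, 2) */
}
```

Because each increment is a separate statement, the values passed no longer depend on how the compiler orders argument evaluation.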

Just as a lack of order specification can improve compiled code, so can an explicit specification of the order. In C, a few operators have an explicit order of evaluation, among them logical AND (&&), logical OR (||), and the sequence, or comma, operator (,). The conditional operator (?:) is another, but it is not discussed here. The effect of evaluation order for the logical AND operator is shown in the statement:

while (sq != NULL && sq->key != value)
    sq = sq->next;

which searches a singly linked list for an element with a specific key and stops when it is found or the end of the list is reached. Logical AND always evaluates its left operand first, and will only evaluate its right operand if the left operand is true, or non-zero. In the while expression, the pointer sq will not be used in the comparison to value if it is set to NULL. Many languages will evaluate the entire expression and then test the result. Algebraically, this is equivalent, but C knows that if the left operand of a Boolean AND is false, the result is false, and the value of the right operand is irrelevant. Since the logical AND implements this Boolean operation, the explicit order of evaluation saves execution time and avoids a possible abend from using a NULL value as a pointer. Of course, this is exactly the way a good Assembler programmer would write the code.
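For readers who want to try it, the loop can be wrapped in a small search function. This is only a sketch: the struct node layout, the int key type, and the name find() are assumptions, since the article does not show the list declaration.

```c
#include <stddef.h>

/* Assumed list element; only the key and next fields come from the loop. */
struct node {
    int key;
    struct node *next;
};

/* Returns the first element whose key equals value, or NULL if none. */
struct node *find(struct node *sq, int value)
{
    /* sq->key is evaluated only when sq != NULL is true. */
    while (sq != NULL && sq->key != value)
        sq = sq->next;
    return sq;
}
```

Calling find() with a NULL list is safe, because the left operand of && is false and the right operand is never evaluated.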

Logical OR is similar to logical AND. It always evaluates its left operand first, and it will only evaluate its right operand if its left operand is false, or zero. The sequence operator always evaluates its left operand first, then discards the value and evaluates the right operand. If the left operand does not contain an operator with a side effect, such as an assignment, a good compiler will discard the left operand expression.
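Both rules can be observed with a function that records when it is called. In this sketch, check(), the calls counter, and the two wrapper functions are hypothetical names introduced only for illustration.

```c
int calls;   /* counts how often check() has run */

/* Hypothetical helper with a visible side effect. */
int check(int v) { calls++; return v; }

/* Returns how many times check() ran while evaluating left || check(1). */
int or_right_operand_calls(int left)
{
    calls = 0;
    (void)(left || check(1));   /* check() runs only if left is zero */
    return calls;
}

/* Sequence operator: left operand first, its value discarded. */
int comma_result(void)
{
    int a, b;

    b = (a = 5, a + 1);   /* a is assigned before a + 1 is evaluated */
    return b;
}
```

Here or_right_operand_calls(1) returns 0 and or_right_operand_calls(0) returns 1, while comma_result() returns 6 because the assignment to a is complete before the right operand is evaluated.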

In summary, the order of evaluation of operands in a C expression is up to the compiler, within the constraints of operator precedence and parenthesization, except for operators such as logical AND, logical OR, and the sequence operator, where the order is explicit. This allows the compiler to produce code optimized for size and execution speed, and avoids burdening the language with excessive rules. Is this dangerous? Hardly. It takes no more time than it takes to read this article to understand the C order of evaluation rules. Only the inexperienced C programmer will overload a single expression with side effect operators.

However, obscure programs can be written in C, as they can in any language. The following program is equivalent to the program presented at the beginning of this article:

main() { printf("tpf"); }

I'll give an explanation in the next issue.