Productive C Practices

Pointer ownership discipline is an example of a productive C practice, something that we do to keep the complexities of the language under control. We've looked at a number of these practices already in the course:

  • Use good development tools that report errors (e.g., type errors) as early as possible.
  • Minimize mutation whenever possible.
  • Favor local variables (stack allocation) over malloc (heap allocation).
  • Unabashedly draw diagrams to understand the layout of memory.
  • Use debuggers to aid in code understanding.
  • Use pointer ownership discipline to manage heap allocations.
  • Document assumptions about your code not already captured in the text.

Practices like these are what separate productive from unproductive C programmers. With good practices and discipline, we can efficiently write memory-safe C code.

Below, we describe two additional practices that you should consider employing throughout your code.

const-correctness

We now know that much of C's complexity lies in the fact that we mutate the values of memory locations. To reduce the complexity of our code, we can simply minimize the amount of mutation that occurs. Unlike many of the practices above where there is little language support helping us along the way, there is a language feature that helps us maintain good mutation discipline: the const keyword.

const is a qualifier on a type, denoting that the memory location of that type cannot be mutated. In its most basic usage, const is a type-safe alternative to #define constants:

const int NUM_VALUES = 10;

// Produces a type error:
//   error: cannot assign to variable 'NUM_VALUES' with const-qualified type 'const int'
NUM_VALUES = 0;

This declaration is type-safe in the sense that the compiler knows NUM_VALUES is a const int and can produce better diagnostics. Compare this with the equivalent #define:

#define NUM_VALUES 10

// Produces a compiler error:
//   error: expression is not assignable
NUM_VALUES = 0;

Recall that #define just defines a textual substitution: the compiler replaces NUM_VALUES with 10 throughout our code. So the assignment becomes the nonsensical 10 = 0, which produces an error message that requires a bit of interpretation to understand.

In addition to local variable declarations, we can also place const on the types of function parameters. These const qualifiers are much more interesting because they act as documentation for our functions. For example, consider a function that pretty-prints a name from a collection of strings:

void print_name (const char *first_name, const char *middle_initial, const char *last_name) {
    printf("%s %s. %s", first_name, middle_initial, last_name);
}

The type of each parameter is const char *. Recall that we read types in C backward, i.e., right-to-left, so we can read this type as:

A pointer to one or more chars that are const, i.e., cannot be modified.

This means that the C compiler will reject any attempt by print_name to modify the chars that its parameters point to (short of an explicit cast that removes the const). Since this qualifier appears in the type of the function, callers of the function will know this behavior of print_name without having to read the documentation!
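The right-to-left reading rule also tells us what changes when const moves to the other side of the *. The following is a minimal sketch; the function names first_char and overwrite_first are hypothetical and only serve to illustrate the two readings.

```c
/* const char *: a pointer to chars that are const.
   Hypothetical helper that reads the first char of s. */
char first_char(const char *s) {
    /* s[0] = 'X';   rejected by the compiler: the chars are const */
    return s[0];
}

/* char * const: a const pointer to chars that are mutable.
   Hypothetical helper that overwrites the first char of s. */
void overwrite_first(char * const s, char c) {
    s[0] = c;        /* allowed: the chars themselves are mutable  */
    /* s = 0;        rejected by the compiler: the pointer is const */
}
```

Only the first form tells callers anything useful: a const pointer parameter restricts what the function does with its own local copy of the pointer, not what it does to the caller's data.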

In general, the practice of writing const-correct code involves using const as much as possible, especially in function parameter types, to document precisely where in your code you expect values to be mutable.
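As an illustration of const acting as documentation, here is a small sketch of two functions over strings. The names count_upper and make_lower are hypothetical and not part of any library; the point is that the parameter types alone tell a caller which function may modify its argument.

```c
#include <ctype.h>
#include <stddef.h>

/* Read-only: the const in the parameter type promises callers that
   count_upper never writes through s. */
size_t count_upper(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (isupper((unsigned char) *s)) {
            n++;
        }
    }
    return n;
}

/* Mutating: the absence of const tells callers that the chars
   pointed to by s are modified in place. */
void make_lower(char *s) {
    for (; *s != '\0'; s++) {
        *s = (char) tolower((unsigned char) *s);
    }
}
```

If a caller holds a string as a const char * and passes it to make_lower, the compiler reports a diagnostic about discarding the const qualifier, which is exactly the kind of early error we want our tools to give us.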

restrict and Pointer Aliasing

Consider the following simple struct that holds a pointer to a heap-allocated int:

typedef struct {
  int *value;
} cell;

Now consider the following function which moves the heap-allocated value from one cell to another, taking care to deallocate the heap-allocated value from the destination before losing our handle on it.

void move (cell *dst, cell *src) {
  free(dst->value);
  dst->value = src->value;
}

This code has a very subtle memory bug! Can you find it?

Consider the situation where we pass the same cell as both the source and destination for the move:

cell c;
c.value = (int*) malloc(sizeof(int));
*(c.value) = 22;
move(&c, &c);   // move takes pointers to cells, so we pass the address of c twice

Let's trace the execution of move line-by-line. When we first enter move, we have the following memory layout:

main -----
c [o]------------>[22]
move -----         ∧
dst [o]------------|
src [o]------------|

c is a cell, which is just a thin wrapper around a pointer to a heap-allocated int. Since dst and src both refer to c, dst->value and src->value are the same pointer, so the diagram simplifies the picture by drawing dst and src as arrows to the same heap-allocated value 22.

On the next line, we free the chunk holding 22 through dst:

main -----
c [o]------------>
move -----         ∧
dst [o]------------|
src [o]------------|

We then copy src's pointer into dst. But they were already the same pointer, so nothing changes!

When we return from the function, we have:

main -----
c [o]------------>

c.value is now a dangling pointer to a freed chunk of memory. Any access through c.value will be a use-after-free error unless we first point c.value to another heap-allocated chunk!

What went wrong here? It turns out that our code does not work when our two pointers src and dst are aliases to the same chunk of memory! Pointer aliasing of this sort is a huge problem in C code. Code frequently breaks when two pointers that it operates over turn out to point to the same value. Furthermore, some compiler optimizations are not possible if the compiler cannot prove that two pointers point to distinct memory locations.

The fix to our code, it turns out, is simple: only perform the move if the pointers point to distinct locations. We can check this via a pointer equality comparison:

void move (cell *dst, cell *src) {
    if (dst != src) {
        free(dst->value);
        dst->value = src->value;
    }
}
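To see the alias check in action, here is a minimal sketch built around the checked version of move. It makes two assumptions that are not in the original function: it clears src->value after the move so that exactly one cell owns the int afterward, and it uses a hypothetical helper, cell_init, to allocate a cell's value.

```c
#include <stdlib.h>

typedef struct {
    int *value;
} cell;

/* The alias-checked move, plus one assumed extra step: src->value is
   cleared so that only dst owns the heap-allocated int afterward. */
void move(cell *dst, cell *src) {
    if (dst != src) {
        free(dst->value);
        dst->value = src->value;
        src->value = NULL;   /* assumption: not in the original function */
    }
}

/* Hypothetical helper: gives c a freshly allocated value v.
   Returns 0 on success and -1 if the allocation fails. */
int cell_init(cell *c, int v) {
    c->value = malloc(sizeof *c->value);
    if (c->value == NULL) {
        return -1;
    }
    *c->value = v;
    return 0;
}
```

With this version, move(&c, &c) leaves c untouched, while moving between two distinct cells frees the destination's old value and leaves the source holding NULL, so the moved value is freed exactly once by whoever owns the destination.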

Alternatively, if we want to document that dst and src are assumed to be pointing to distinct locations, we use the restrict qualifier:

void move (cell * restrict dst, cell * restrict src) {
    // ...
}

By marking the pointer types as restrict, we are declaring that dst and src are restricted to point to distinct locations in memory. Unlike const, the compiler does not enforce this property on the pointers passed to move. Instead, this qualifier acts as documentation for users of the function. It also allows the compiler to perform optimizations on the code assuming that dst and src aren't aliasing the same memory location.

(Strictly speaking, the actual semantics of restrict state that, within move, any object that is accessed through dst and also modified can only be accessed through dst or pointers derived from it, and likewise for src. This means, effectively, that src and dst point to distinct memory locations.)
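restrict most commonly shows up on functions that read from one buffer and write to another. The sketch below is one such case; scale_into is a hypothetical function, not a library routine, and it combines restrict with const to document both assumptions in its type.

```c
#include <stddef.h>

/* Hypothetical example: writes src[i] * factor into dst[i] for each i in [0, n).
   restrict documents (but does not check) that the two arrays do not overlap;
   const additionally documents that src is only read. */
void scale_into(size_t n, int * restrict dst, const int * restrict src, int factor) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = src[i] * factor;
    }
}
```

Calling scale_into with two distinct arrays is fine. Calling it with the same array for both dst and src still compiles, but it breaks the promise made by restrict and results in undefined behavior, which is why the qualifier has to be treated as a contract that the caller is responsible for upholding.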