Productive C Practices
Pointer ownership discipline is an example of a productive C practice, something that we do to keep the complexities of the language under control. We've looked at a number of these practices already in the course:
- Use good development tools that report errors (e.g., type errors) as early as possible
- Minimize mutation whenever possible.
- Favor local variables (stack allocation) over malloc (heap allocation).
- Unabashedly draw diagrams to understand the layout of memory.
- Use debuggers to aid in code understanding.
- Use pointer ownership discipline to manage heap allocations.
- Document assumptions about your code not already captured in the text.
Practices like these are what separate productive from unproductive C programmers. With good practices and discipline, we can efficiently write memory-safe C code.
Below, we describe two additional practices that you should consider employing throughout your code.
const-correctness
We now know that much of C's complexity lies in the fact that we mutate the values of memory locations.
To reduce the complexity of our code, we can simply minimize the amount of mutation that occurs.
Unlike many of the practices above, where there is little language support helping us along the way, there is a language feature that helps us maintain good mutation discipline: the const keyword.
const is a qualifier on a type, denoting that a memory location of that type cannot be mutated through that name.
In its most basic usage, const is a type-safe alternative to #define constants:
const int NUM_VALUES = 10;
// Produces a type error:
// error: cannot assign to variable 'NUM_VALUES' with const-qualified type 'const int'
NUM_VALUES = 0;
This declaration is type-safe in the sense that the compiler knows NUM_VALUES is a const int and can produce better diagnostics.
Compare this with the equivalent #define:
#define NUM_VALUES 10
// Produces a compiler error:
// error: expression is not assignable
NUM_VALUES = 0;
Recall that #define just defines a textual substitution: the preprocessor replaces NUM_VALUES with 10 throughout our code.
So the assignment becomes the nonsensical 10 = 0, which produces an error message that requires a bit of interpretation to understand.
In addition to local variable declarations, we can also place const on the types of function parameters.
These const qualifiers are much more interesting because they act as documentation for our functions.
For example, consider a function that pretty-prints a name from a collection of strings:
void print_name (const char *first_name, const char *middle_initial, const char *last_name) {
    printf("%s %s. %s", first_name, middle_initial, last_name);
}
The type of each parameter is const char *.
Recall that we read types in C backward, i.e., right-to-left, so we can read this type as:
A pointer to one or more chars that are const, i.e., cannot be modified.
This means that the C compiler will reject any code in print_name that writes to the characters its arguments point to!
Since this qualifier appears in the type of the function, callers of the function will know this behavior of print_name without having to read the documentation!
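To make the right-to-left reading concrete, here is a minimal sketch contrasting a pointer to const characters with a const pointer to mutable characters. The functions string_length and fill are hypothetical names invented for this illustration, and the assert calls are an added way of documenting the assumption that the pointers are non-NULL.

```c
#include <assert.h>
#include <stddef.h>

/* const char *s: a pointer to chars that are const.
   The characters are read-only, but the pointer itself may move. */
size_t string_length(const char *s) {
    assert(s != NULL);
    size_t n = 0;
    while (*s != '\0') {  /* reading through s is allowed */
        s++;              /* reassigning s itself is allowed */
        n++;
    }
    /* *s = 'x';  would be rejected: the pointed-to char is const */
    return n;
}

/* char * const buf: a const pointer to chars.
   The pointer is fixed, but the characters may be changed. */
void fill(char * const buf, size_t n, char c) {
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++) {
        buf[i] = c;       /* writing through buf is allowed */
    }
    /* buf = NULL;  would be rejected: the pointer itself is const */
}
```

Uncommenting either of the marked lines should make the compiler report an assignment to a const-qualified location.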
In general, the practice of writing const-correct code involves using const as much as possible, especially in function parameter types, to document precisely where in your code you expect values to be mutable.
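As a small sketch of what const-correct function signatures can look like, consider the pair of functions below. The point type and the names point_norm_squared and point_translate are assumptions made up for this example; they do not come from the course code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type used only for this illustration. */
typedef struct {
    double x;
    double y;
} point;

/* Read-only: const tells callers that *p is left untouched. */
double point_norm_squared(const point *p) {
    assert(p != NULL);
    return p->x * p->x + p->y * p->y;
}

/* Mutating: the absence of const signals that *p will change. */
void point_translate(point *p, double dx, double dy) {
    assert(p != NULL);
    p->x += dx;
    p->y += dy;
}
```

A caller holding a const point can pass its address to point_norm_squared, but passing it to point_translate should draw a compiler diagnostic about discarding the const qualifier. The signatures alone tell the reader which call may mutate its argument.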
restrict and Pointer Aliasing
Consider the following simple struct that holds a pointer to a heap-allocated int:
typedef struct {
    int *value;
} cell;
Now consider the following function which moves the heap-allocated value from one cell to another, taking care to deallocate the heap-allocated value from the destination before losing our handle on it.
void move (cell *dst, cell *src) {
    free(dst->value);
    dst->value = src->value;
}
This code has a very subtle memory bug! Can you find it?
Consider the situation where we pass the same cell as both the source and destination for the move:
cell c;
c.value = (int*) malloc(sizeof(int));
*(c.value) = 22;
move(&c, &c);
Let's trace the execution of move line-by-line.
When we first enter move, we have the following memory layout:
main -----
c [o]------------>[22]
move ----- ∧
dst [o]------------|
src [o]------------|
c is a cell, which is just a fancy pointer to a heap-allocated int.
Both dst and src refer to this same cell, so all three lead to the same heap-allocated value 22.
On the next line, we free 22 through dst:
main -----
c [o]------------>
move ----- ∧
dst [o]------------|
src [o]------------|
We then copy src's pointer into dst.
But they were already the same pointer, so nothing changes!
When we return from the function, we have:
main -----
c [o]------------>
c.value is now a dangling pointer to a freed chunk of memory.
Any dereference of c.value will be a use-after-free error unless we first point c.value to another heap-allocated chunk!
What went wrong here?
It turns out that our code does not work when our two pointers src and dst are aliases to the same chunk of memory!
Pointer aliasing of this sort is a huge problem in C code.
Frequently, our code breaks down if it happens to be the case that two pointers we're operating over can point to the same value.
Furthermore, some compiler optimizations are not possible if the compiler cannot prove that two pointers point to distinct memory locations.
The fix to our code, it turns out, is simple: only perform the move if the pointers point to distinct locations.
We can check this via a pointer equality comparison:
void move (cell *dst, cell *src) {
    if (dst != src) {
        free(dst->value);
        dst->value = src->value;
    }
}
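The guarded version can be exercised with a short sketch like the one below. The helper cell_new is a hypothetical constructor added only for this illustration, and the assert inside move is an added check on the assumption that both pointers are non-NULL.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int *value;
} cell;

/* Hypothetical helper: builds a cell whose value lives on the heap.
   The value pointer is NULL if malloc fails. */
cell cell_new(int v) {
    cell c;
    c.value = malloc(sizeof(int));
    if (c.value != NULL) {
        *c.value = v;
    }
    return c;
}

/* The guarded move from above; the assert is an added assumption check. */
void move(cell *dst, cell *src) {
    assert(dst != NULL && src != NULL);
    if (dst != src) {
        free(dst->value);
        dst->value = src->value;
    }
}
```

With this version, calling move(&c, &c) leaves c.value pointing at its original, still-live int. After a move between two distinct cells, both cells hold the same pointer, so under pointer ownership discipline only one of them should go on to free it.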
Alternatively, if we want to document that dst and src are assumed to point to distinct locations, we can use the restrict qualifier:
void move (cell * restrict dst, cell * restrict src) {
    // ...
}
By marking the pointer types as restrict, we are declaring that dst and src are restricted to point to distinct locations in memory.
Unlike const, the compiler does not enforce this property on the pointers passed to move.
Instead, this qualifier acts as documentation for users of the function.
It also allows the compiler to perform optimizations on the code assuming that dst and src aren't aliasing the same memory location.
(Strictly speaking, the actual semantics of restrict state that any object accessed through dst, if it is also modified, can only be accessed through dst in move, and likewise for src.
This means, effectively, that src and dst point to distinct memory locations.)
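As a further sketch of restrict in a setting where the optimization benefit is easier to see, consider an element-wise array update. The function add_into is a hypothetical name invented for this example, and restrict requires a C99 or later compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Adds src[i] into dst[i] for every i in [0, n).
   restrict is the caller's promise that the two arrays do not overlap;
   the compiler does not check this promise. */
void add_into(int * restrict dst, const int * restrict src, size_t n) {
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < n; i++) {
        dst[i] += src[i];
    }
}
```

Calling add_into with two separate arrays is fine. Calling it with overlapping arrays, such as add_into(a, a, n), breaks the promise and is undefined behavior, even though the code compiles. The standard library draws the same line: memcpy declares its pointer parameters restrict, while memmove, which supports overlapping regions, does not.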