Ownership

When sharing memory between structures, memory management bites us yet again. Consider the following situation where we have two linked lists, one of which is a shallow copy of the other:

Stack       Heap
-----       ----
l1 [o]----->[o]-->[0][o]-->[1][o]-->[2][/]
                   ∧ 
l2 [o]----->[o]----|

If we try to free both l1 and l2, we encounter a different kind of memory error, a double free error. This arises because free performs internal bookkeeping to track that the memory passed to it is now available for reuse. freeing memory again mucks up this internal bookkeeping, leading to undefined behavior!

If we were to allow this kind of memory sharing, we have to be very careful about how we deallocate this structure. We need to:

  1. Free the nodes of the list through exactly one of l1 or l2, say l1 for example's sake.
  2. Free the memory allocated to the list struct pointed at by l1.
  3. Free the memory allocated to the list struct pointed at by l2 without double-freeing the nodes that l2 now has a dangling pointer to.

This means we have to treat l2 specially! Such code is easy to write if l1 and l2 are both local variables in the same function, i.e., they are textually next to each other in our program. However, if l1 and l2 get passed to different parts of our code, performing such ad hoc behavior gets tricky. In particular, we have to guarantee that l1 is always deallocated before l2. Otherwise, we'll end up leaking memory!

Now, imagine even worse situations where the two linked lists share nodes in more exotic ways:

Stack       Heap
-----       ----
l1 [o]----->[o]-->[0][o]-->[1][o]-->[2][/]
                            ∧        
l2 [o]----->[o]-->[3][o]----|

Who is responsible for freeing which nodes between the two lists? How can we figure this out programmatically if the lists have arbitrary numbers of shared and unshared nodes?

Maintaining Discipline

In short, it is virtually impossible to handle intricate cases of memory sharing in a general way while also maintaining memory safety! Instead, it is more pragmatic to maintain a particular discipline or collection of design patterns that allows us to maintain our sanity while navigating these problems.

There are several such disciplines of manual memory management that we might adopt in our programs. One of the most common of these disciplines is that of pointer ownership. The key idea behind pointer ownership is that we maintain the following invariant about all of the heap-allocated chunks in our program:

At any point during execution of our program, each heap-allocated chunk has one pointer to it through which the memory is eventually freed We call this pointer the owner pointer.

For example, in the following code snippet:

char *buf = (char*) malloc(sizeof(char) * len);

We create a heap-allocated chunk, presumably to hold a string. The pointer buf becomes the owner pointer for this chunk as it is the only handle we have on the memory. If nothing else happens in our program, we will ultimately need to free the heap-allocated chunk through buf before buf goes out of scope.

Note that there is nothing in our program or C that enforces that buf is an owner pointer or that we must free buf before it goes out of scope. We must, instead, maintain this discipline through good programming practices, including documentation and naming, and diligence.

Transferring Ownership

A key feature of heap-allocated memory is that we can return it from a function. How does this work with owner pointers?

We can transfer ownership of a heap-allocated chunk from one pointer to another. This arises naturally with the constructor functions that we write, e.g., make_list:

list* make_list() {
    // ret is initially the owner for the heap-allocated chunk
    list *ret = (list*) malloc(sizeof(list));
    ret->first = NULL;
    // Now, ownership transfers from ret as it is deallocated...
    return ret;
}

int main() {
    // ...to lst in main!
    list *lst = make_list();

    // lst is ultimately responsible for freeing the chunk
    free_list(lst);
}

Observe in the sample code how ret is the initial owner of the list. But by virtue of returning a copy of this pointer, it transfers ownership to the caller of make_list. main is therefore responsible for freeing the heap-allocated chunk through lst.

The fact that make_list transfers ownership of a heap-allocated to its caller is important enough to document:

// Returns an owner pointer to a new, empty list on the heap.
list* make_list() {
  // ...
}

With this documentation, callers of make_list know that they are responsible for freeing the list they receive from the function.

While ownership transfer occurs most commonly with constructor functions, it also arises in situations where we allocate memory in multiple steps to build a larger structure. For example, we might use the getline function to heap-allocate a chunk of memory, populate it with input from the user, and then store a pointer to that chunk:

typedef struct {
    char *name;   // owner pointer
    int age;
} person;

// Returns a new, heap-allocated person. Ownership of name is transferred
// to this person upon construction.
person* make_person(char *name) {
    person *ret = (person*) malloc(sizeof(person));
    ret->name = name;
    // All people start at age zero!
    ret->age = 0;
    return ret;
}

void free_person(person *p) {
    // N.B., make sure to free name since we own it!
    free(p->name);
    free(p);
}

int main() {
  char *name;
  int sz;
  // Mutates name to point to heap-allocated line of input from stdin.
  getline(&name, &sz, stdin);
  person *p = make_person(name);
  // NULL out `name` since it has been transferred to p
  name = NULL;
  // ...
  free_person(p);
}

In this code, the name pointer in main owns the memory allocated by getline. But then, we transfer ownership of that chunk to the person struct via make_person. Therefore, we know that we do not need to free name in main. Instead, we can trust that free_person will eventually free that chunk for us.

Again, note how none of this is enforced by C. We need to rely on coding conventions to enforce this discipline. One good trick to make this more explicit in code is to reassign any transferred pointers to NULL. This ensures that if such a pointer is used after it has been transferred will result in a quick null pointer dereference instead of undefined behavior.

Borrowing a Pointer

Frequently, other parts of code will want to use a heap-allocated chunk without taking ownership of it. Thus, we also allow parts of code to borrow a heap-allocated chunk, in effect, creating additional pointers to a heap-allocated chunk beyond the owner pointer.

// age_person increments the age of (borrowed) person p by 1.
void age_person(person *p) {
    p->age++;
}

int main() {
    person *jo;
    // initialize jo somehow...
    age_person(jo);

    // Because jo is still the owner pointer for the memory, we must deallocate them now!
    free_person(jo);
}

In this example, age_person borrows jo by taking a copy of the jo pointer and performing a mutation through it.

We allow for multiple borrowed pointers to exist in our program. However, we know that if borrowed pointers exist, then it isn't safe for the owner pointer to free that memory. This is because we might try to erroneously access the chunk through our borrowed pointers if we prematurely free it.

Thus, we introduce the following rule for managing borrowed pointers:

An owner pointer cannot free its chunk of memory until all borrowed pointers to that chunk are deallocated.

Borrowing a pointer by passing it to a function is a simple case since that borrowed pointer is deallocated specifically at the last point that pointer could be used, i.e., at the end of its enclosing function! However, we can get into more intricate borrowing scenarios when those borrowed pointers are stored in other data structures whose lifetime might extend beyond the functions in which they are created.

int main(void) {
  char *stooge1, *stooge2, *stooge3;
  int sz;
  getline(&stooge1, &sz, stdin);
  getline(&stooge2, &sz, stdin);
  getline(&stooge3, &sz, stdin);

  // N.B., stooges should borrow its elements; it is not
  // responsible for freeing them
  char **stooges = (char*) malloc(sizeof(char*) * 3);
  stooges[0] = stooge1;
  stooges[1] = stooge2;
  stooges[2] = stooge3;

  // Do arbitrary stuff to stooges, passing them to other functions, etc.
 
  // Nevertheless, once stooges is deallocated...
  free(stooges);

  // We can free the stooges through our original pointers!
  free(stooge1);
  free(stooge2);
  free(stooge3);
}

In this example, we allow a data structure to borrow three heap-allocated chunks of memory. That data structure can do arbitrary things (besides deallocating the chunks, of course) and live for an arbitrary long period of time. However, by acknowledging that the data structure only borrowed the chunks, we know that we must eventually free the chunks through their original pointers.

Summary of Ownership Rules

Pointer ownership discipline keeps our heap manipulation code from being incomprehensible. This discipline consists of three key rules:

  1. At any point during execution of our program, each heap-allocated chunk has one pointer to it through which the memory is eventually freed We call this pointer the owner pointer.
  2. A copy of a pointer to a heap-allocated chunk is a borrowed pointer to that chunk. Other code can access the chunk through borrowed pointers.
  3. An owner pointer cannot free its chunk of memory until all borrowed pointers to that chunk are deallocated.