When applying data structures to real-world problems, we will frequently need to adapt them to particular domains. In order to have the flexibility to write a variety of operations over a data structure, we need to build a vocabulary of basic methods for manipulating that structure, something I call motions. For example, one motion that you are well-acquainted with is the array traversal:

int *arr, len;
for (int i = 0; i < len; i++) {
    \\ ...
    arr[i]
    \\ ...
}

This motion is something that should come almost automatically to you at this point. If you identify that you need to traverse an array, you can produce this skeleton of code without much thought. Furthermore, you can adapt this skeleton to various situations, e.g., traversing every other element starting with the second:

int *arr, len;
for (int i = 1; i < len; i += 2) {
    \\ ...
    arr[i]
    \\ ...
}

In this section, we investigate the three basic motions you need to know in order to use a linked list: insertion, traversal, and deletion.

Insertion

Suppose that we wish to insert a new element into the front of the list. Concretely, consider the list:

Stack      Heap
-----      ----
l [o]----->[o]-->[1][o]-->[2][\]

How can we insert the value 0 into the front of the list? First, we can create a new node that contains 0 and points to the front of the list:

Stack      Heap
-----      ----
l [o]----->[o]-->[1][o]-->[2][o]-->
                  ^
                  |
              [0][o]

Now, to fix up the diagram, we need to make the first pointer of our list point to this new node.

Stack      Heap
-----      ----
l [o]----->[o]   [1][o]-->[2][o]-->
            |     ^
            |     |
            ->[0][o]

Summarizing the required operations, we:

  1. Create a new node that contains the value to insert and points to the current first node of the list.
  2. Make the list point to this new node as the head of the list.

We can directly translate them into the following insert_front function:

void insert_front(list *list, int v) {
    node *n = make_node(v, list->first);
    list->first = n;
}

We can also collapse these two statements into one:

void insert_front(list *list, int v) {
    list->first = make_node(v, list->first);
}

It is worthwhile to understand how this final function works precisely. Because we evaluate the right-hand side of the assignment first, make_node is passed what list->first initial points to. The assignment then changes list->first to point to the result of make_node.

This compact line can be summarized as the insertion motion for linked lists:

int v;
node *node;
node = make_node(v, node);

We can think of the line as saying "update node to point to a new node in front of what node used to point to".

As a final point, it is worthwhile to note that this function works in the case where the list is empty:

Stack      Heap
-----      ----
l [o]----->[\]

list->first is NULL which means our new node points to NULL:

Stack      Heap
-----      ----
l [o]----->[o]-->[0][\]

But this is exactly the right thing to do for insert_front in this case! This fact is the reason why we included the list struct in our implementation: we needed one level of indirection, i.e., to the head of the list, in order to handle this special case in a uniform manner.

Traversal

Next, let's consider writing a function that computes the total number of elements in the list, its size. To do this for our example list:

Stack      Heap
-----      ----
l [o]----->[o]-->[0][o]-->[1][o]-->[2][\]

We must traverse each element, counting the number of elements we visit along the way. If we were to perform a similar operation for an array, e.g., summing up its elements, we would use a for-loop:

int sum = 0;
for (int i = 0; i < len; i++) {
    sum = sum + arr[i];
}

Where the variable i represents the current index we are examining in the array.

We want to apply a similar kind of logic here, but we cannot directly access a linked list's elements by-index. Instead, we will directly hold a pointer to the current node that we are examining:

Stack      Heap
-----      ----
l [o]----->[o]-->[0][o]-->[1][o]-->[2][\]
                  ^
                  |
                 cur

Initially this pointer, traditionally called cur , points to the first node in the list. We can declare this auxiliary variable as follows:

node *cur = l->first;

We then advance the pointer to examine the next element. How do we advance cur? Advancing cur means that we need to assign it a new value. The new value we would like to assign cur is a pointer to the node that cur is pointing to, i.e., cur->next. Thus, the line of code to perform the update is:

cur = cur->next;

This results in the following updated diagram.

Stack      Heap
-----      ----
l [o]----->[o]-->[0][o]-->[1][o]-->[2][\]
                            ^
                            |
                           cur

We can then continue to advance the pointer in a loop until we reach the end of the list:

Stack      Heap
-----      ----
l [o]----->[o]-->[0][o]-->[1][o]-->[2][\]
                                            
                                          [\]
                                          cur

From the diagram, we see that this situation arises when cur is NULL, i.e., when cur is assigned a copy of the node's next field that holds the value 2. Putting this all together, we can implement the function as follows;

int size(list *list) {
    int sz = 0;
    node *cur = list->first;
    while (cur != NULL) {
        sz = sz + 1;
        cur = cur->next;
    }
    return sz;
}

In summary, the traversal motion looks as follows:

list *list;
node *cur = list->first;
while (cur != NULL) {
    ... cur-> value ...
    cur = cur->next;
}

Most linked list operations require some form of traversal, each with their own little spin on the traversal pattern. It is, therefore, worthwhile to investigate this motion on some additional examples. Let's next consider how to insert at the end of the list. If we want to insert 3 onto the end of the following list:

Stack      Heap
-----      ----
l [o]----->[o]-->[0][o]-->[1][o]-->[2][\]

We need to traverse to the end of the list to modify its last pointer. Our standard traversal motion stops when cur is NULL:

Stack      Heap
-----      ----
l [o]----->[o]-->[0][o]-->[1][o]-->[2][\]
                                          [\]
                                          cur

But this is too late! We need to stop when cur is on the last node, so we can modify its next field. We can accomplish this behavior by modifying the guard of the while-loop so that we traverse through the list until cur->next is NULL, i.e., we reach the last node in the list:

while (cur->next != NULL) { /* ... */ }

However, this introduces one additional complication. In the case where the list is empty, cur is initially NULL. Therefore, the first check of this guard will result in a null pointer access where we try to access the next field of a NULL pointer, resulting in undefined behavior.

Because of this, our insert_end function has two cases: whether the list is empty or not.

void insert_end(list *list, int v) {
    node *n = make_node(v, NULL);
    if (list->first == NULL) {
        list->first = n;
    } else {
        node *cur = list->first;
        while (cur->next != NULL) {
            cur = cur->next;
        }
        cur->next = n;
    }

In the case when the list is empty, insert_end behaves like insert_front. Otherwise, we traverse the list until cur is on the last node. At this point, we fix up the last node's pointer to point to our new element. Many list operations that we write have this basic skeleton:

if (list->first == NULL) {
    // Do the operation on an empty list
} else {
    // Do the operation on a non-empty list
}

It is frequently worthwhile to:

  • Design our functions with this mindset up-front---what does our operation do on the empty list and non-empty list?---and
  • Test our list functions on both empty and non-empty lists.

Finally, consider the dual operations to make_list and make_node: free_list and free_node. We have some freedom in deciding the behavior of each function. In particular, should make_node free the rest of the nodes that follow it in the list or just itself? Suppose that we choose the former:

void free_node(node *node) {
    free(node);   // N.B. only frees this node, not its next field.
}

How do we implement free_list using free_node? We can perform a standard traversal of the list, freeing every node as we move along. However, if we try the following code snippet inside of our traversal loop:

free_node(cur);
cur = cur->next;  // Bad!

We end up committing a memory error, namely a use-after-free error because we have already freed cur! The simple workaround is to save a pointer to the next node, free cur, and then finally use that saved pointer to advance.

void free_list(list *list) {
    node *cur = list->first;
    while (cur != NULL) {
        node *next = cur->next;
        free_node(cur);
        cur = next;
    }
    free(list);   // Note: don't forget this part!
}

Recursive Traversal

Suppose instead that we wanted our node to be responsible for freeing the nodes that follow it in the list. We could implement a similar loop. However, instead, we can achieve an elegant solution by employing recursive program design ala Racket. Recall that when designing recursive functions over lists, we work with the following design skeleton:

  • A base case: when the list is empty and
  • A recursive case: when the list is non-empty.

Here, these cases correspond to when the node we are at is NULL (empty) or non-NULL (non-empty). We can design our recursive function as follows:

  • To free an empty list, we do nothing.
  • To free a non-empty list, we free the rest of the list, and then free its head.

And then translate it directly into code:

void free_node(node *node) {
    if (node != NULL) {
        free_node(node->next);
        free(node);
    }
}

Then free_list has the straightforward implementation:

void free_list(list *list) {
    free_node(list->first);   // recursively frees all nodes
    free(list);
}

Again, either implementation of the functions is acceptable---as the designer of the linked list, you simply need to choose one and stick with it throughout your design.

As another example of recursive design, we can rewrite size to be recursive rather than iterative. Note that the signature of the size function as is, int size(list *list), is not appropriate for recursion argument, a list is not recursively defined---the nodes are!

Because of this, we must write a helper function that actually performs the recursion. We can recursively define the function as follows:

  • The size of an empty list is zero.
  • The size of a non-empty list is one plus the size of the rest of the list.

And then translate that definition into code:

int node_size(node *node) {
    if (node == NULL) {
        return 0;
    } else {
        return 1 + node_size(node->next);
    }
}

We can then define the user-facing size function in terms of this helper:

int size(list *list) {
    return node_size(list->first);
}

In general, the shape of defining a recursive function over our linked lists has the form:

void helper(node *node) {
    if (node == NULL) {
        // base case---empty list
    } else {
        // recursive case---non-empty list
    }
}

void func(list *list) {
    return helper(list->first);
}

Where func is the user-facing function and helper is the function that actually performs the recursive work over the nodes of the list.

As a rule of thumb, we will favor the iterative version of these operations over the recursive versions. This is because there is a space cost to maintaining all these function calls that the recursive version performs that is not present with the iterative version. Imagine what the stack looks like when we call size on the list, and we finally reach the NULL case:

Stack          Heap
-----          ----
...          [0][o]-->[1][o]-->[2][o]-->
size ---        ^        ^        ^
node [o]--------|        |        |
size ---                 |        |
node [o]-----------------|        |
size ---                          |
node [o]---------------------------
size ---
node [\]

Each recursive function call awaits for the recursive call it makes to return. This means that if the list has n elements, then the stack will grow by n function calls before it begins to return. In contrast, the imperative version of size only requires a single function call, so it does not require any additional stack space to execute.

Functional Aside: Tail-call Optimization Revisited

The space cost induced by the recursive version of size seems to be unavoidable. However, with a little bit of ingenuity and an optimization known as tail-call optimization, we can make the recursive version of size require the same space (roughly) as the imperative version.

The key insight with this optimization is that if a function's final operation is to make a recursive function call, we don't need to create a new stack frame. Instead, we can simply reuse the current stack frame as the new recursive function call's stack frame. By doing so, all the recursive function calls that a function makes only take up a single stack frame rather than n stack frames.

This optimization requires that the function to-be-optimized end with a recursive function call. size is currently not in this form! Consider the single line that makes up the recursive case of size:

return 1 + node_size(node->next);

While the final line of the function contains a recursive function call, note that we must first make the recursive function call and then add one to the result. The fact node_size still has work to do after the recursive call returns (namely add one to the returned value) means that this version of the function cannot be tail-call optimized.

We cannot end the function by performing an addition. To alleviate this issue, we need to re-organize how the function works. Rather than adding one after the recursive call returns, we need to add one before we make the recursive call. But what are we adding one to then, if we can't add it to the result of the recursive call? We instead add one to a new parameter of node_size that accumulates the size of the list:

int node_size(node *node, int acc) {
    if (node == NULL) {
        return acc;
    } else {
        return node_size(node->next, acc + 1);
    }
}

int size(list *list) {
    return node_size(list->first, 0);
}

Now, node_size ends with a recursive function call because the addition happens before the call is made---namely when we evaluate the parameters to the call. In general, we are inverting how node_size performs its computation. Rather than bubbling up the accumulated size from the end of the list back to the front, we push down the accumulated size from the beginning to the end of the list. We can generalize this approach to an advanced form of programming called continuation-passing style where functions take as arguments a continuation denoting what computation to perform after the function completes. With node_size, the "continuation" is simply the value that should be returned after all the node_size calls complete.

Deletion

Finally, how do we remove a particular element, call it v, from a linked list? Let's consider some cases. First, if there is nothing in the list:

Stack      Heap
-----      ----
l [o]----->[o]-->[\]

Then there is nothing to do. In a non-empty list, if v is the first element:

Stack      Heap
-----      ----
l [o]----->[o]-->[v][o]-->[0][o]-->[1][o]-->

We must fix up l->first to point to the node that l->first points to. This pointer is l->first->next, so we have the assignment:

l->first = l->first->next;

This code performs the deletion.

However, if v is not the first element:

Heap
----
... -->[0][o]-->[v][o]-->[1][o]--> ...
        ^
        |
        cur

Then we will need to modify our cur's next pointer to point to what cur->next points to. Like the l->first case, this node is cur->next->next which gives us the assignment:

cur->next = cur->next->next;

These three cases give us the following implementation for delete:

// returns true iff the first occurrence of v is deleted from the list
bool delete(list *list, int v) {
    if (list->first == NULL) {
        return false;
    } else if (list->first->value == v) {
        node *to_del = list->first;
        list->first = list->first->next;
        free_node(to_del);    // assumes that free_node only frees to_del
        return true;
    } else {
        node *cur = list->first;
        while (cur->next != NULL) {
            if (cur->next->value == v) {
                node_t *to_del = cur->next;
                cur->next = cur->next->next;
                free_node(to_del);
                return true;
            }
        }
        return false;
    }
}

Deletion from a linked list provides to be tricker than insertion with several cases, but, nevertheless, the deletion motion is consistent in each case:

node_t *node;
node_t *to_del = node->next;
node->next = node->next->next;
free_node(to_del);

Exercise: One of Many

Implement a function int index_of(list *l, int v) that returns the index of v in l or -1 if v is not found within l. To do this, identify which of the three kinds of linked list motions appropriately captures the behavior of index_of and then adapt the motion skeleton code towards implementing the function.