When applying data structures to real-world problems, we will frequently need to adapt them to particular domains. In order to have the flexibility to write a variety of operations over a data structure, we need to build a vocabulary of basic methods for manipulating that structure, something I call motions. For example, one motion that you are well-acquainted with is the array traversal:
int *arr, len;
for (int i = 0; i < len; i++) {
\\ ...
arr[i]
\\ ...
}
This motion is something that should come almost automatically to you at this point. If you identify that you need to traverse an array, you can produce this skeleton of code without much thought. Furthermore, you can adapt this skeleton to various situations, e.g., traversing every other element starting with the second:
int *arr, len;
for (int i = 1; i < len; i += 2) {
\\ ...
arr[i]
\\ ...
}
In this section, we investigate the three basic motions you need to know in order to use a linked list: insertion, traversal, and deletion.
Insertion
Suppose that we wish to insert a new element into the front of the list. Concretely, consider the list:
Stack Heap
----- ----
l [o]----->[o]-->[1][o]-->[2][\]
How can we insert the value 0
into the front of the list?
First, we can create a new node that contains 0
and points to the front of the list:
Stack Heap
----- ----
l [o]----->[o]-->[1][o]-->[2][o]-->
^
|
[0][o]
Now, to fix up the diagram, we need to make the first
pointer of our list point to this new node.
Stack Heap
----- ----
l [o]----->[o] [1][o]-->[2][o]-->
| ^
| |
->[0][o]
Summarizing the required operations, we:
- Create a new node that contains the value to insert and points to the current first node of the list.
- Make the list point to this new node as the head of the list.
We can directly translate them into the following insert_front
function:
void insert_front(list *list, int v) {
node *n = make_node(v, list->first);
list->first = n;
}
We can also collapse these two statements into one:
void insert_front(list *list, int v) {
list->first = make_node(v, list->first);
}
It is worthwhile to understand how this final function works precisely.
Because we evaluate the right-hand side of the assignment first, make_node
is passed what list->first
initial points to.
The assignment then changes list->first
to point to the result of make_node
.
This compact line can be summarized as the insertion motion for linked lists:
int v;
node *node;
node = make_node(v, node);
We can think of the line as saying "update node
to point to a new node in front of what node
used to point to".
As a final point, it is worthwhile to note that this function works in the case where the list is empty:
Stack Heap
----- ----
l [o]----->[\]
list->first
is NULL
which means our new node points to NULL
:
Stack Heap
----- ----
l [o]----->[o]-->[0][\]
But this is exactly the right thing to do for insert_front
in this case!
This fact is the reason why we included the list
struct in our implementation: we needed one level of indirection, i.e., to the head of the list, in order to handle this special case in a uniform manner.
Traversal
Next, let's consider writing a function that computes the total number of elements in the list, its size. To do this for our example list:
Stack Heap
----- ----
l [o]----->[o]-->[0][o]-->[1][o]-->[2][\]
We must traverse each element, counting the number of elements we visit along the way. If we were to perform a similar operation for an array, e.g., summing up its elements, we would use a for-loop:
int sum = 0;
for (int i = 0; i < len; i++) {
sum = sum + arr[i];
}
Where the variable i
represents the current index we are examining in the array.
We want to apply a similar kind of logic here, but we cannot directly access a linked list's elements by-index. Instead, we will directly hold a pointer to the current node that we are examining:
Stack Heap
----- ----
l [o]----->[o]-->[0][o]-->[1][o]-->[2][\]
^
|
cur
Initially this pointer, traditionally called cur
, points to the first node in the list.
We can declare this auxiliary variable as follows:
node *cur = l->first;
We then advance the pointer to examine the next element.
How do we advance cur
?
Advancing cur
means that we need to assign it a new value.
The new value we would like to assign cur
is a pointer to the node that cur
is pointing to, i.e., cur->next
.
Thus, the line of code to perform the update is:
cur = cur->next;
This results in the following updated diagram.
Stack Heap
----- ----
l [o]----->[o]-->[0][o]-->[1][o]-->[2][\]
^
|
cur
We can then continue to advance the pointer in a loop until we reach the end of the list:
Stack Heap
----- ----
l [o]----->[o]-->[0][o]-->[1][o]-->[2][\]
[\]
cur
From the diagram, we see that this situation arises when cur
is NULL
, i.e., when cur
is assigned a copy of the node's next
field that holds the value 2
.
Putting this all together, we can implement the function as follows;
int size(list *list) {
int sz = 0;
node *cur = list->first;
while (cur != NULL) {
sz = sz + 1;
cur = cur->next;
}
return sz;
}
In summary, the traversal motion looks as follows:
list *list;
node *cur = list->first;
while (cur != NULL) {
... cur-> value ...
cur = cur->next;
}
Most linked list operations require some form of traversal, each with their own little spin on the traversal pattern. It is, therefore, worthwhile to investigate this motion on some additional examples. Let's next consider how to insert at the end of the list. If we want to insert 3 onto the end of the following list:
Stack Heap
----- ----
l [o]----->[o]-->[0][o]-->[1][o]-->[2][\]
We need to traverse to the end of the list to modify its last pointer.
Our standard traversal motion stops when cur
is NULL
:
Stack Heap
----- ----
l [o]----->[o]-->[0][o]-->[1][o]-->[2][\]
[\]
cur
But this is too late!
We need to stop when cur
is on the last node, so we can modify its next
field.
We can accomplish this behavior by modifying the guard of the while-loop so that we traverse through the list until cur->next
is NULL
, i.e., we reach the last node in the list:
while (cur->next != NULL) { /* ... */ }
However, this introduces one additional complication.
In the case where the list is empty, cur
is initially NULL
.
Therefore, the first check of this guard will result in a null pointer access where we try to access the next
field of a NULL
pointer, resulting in undefined behavior.
Because of this, our insert_end
function has two cases: whether the list is empty or not.
void insert_end(list *list, int v) {
node *n = make_node(v, NULL);
if (list->first == NULL) {
list->first = n;
} else {
node *cur = list->first;
while (cur->next != NULL) {
cur = cur->next;
}
cur->next = n;
}
In the case when the list is empty, insert_end
behaves like insert_front
.
Otherwise, we traverse the list until cur
is on the last node.
At this point, we fix up the last node's pointer to point to our new element.
Many list operations that we write have this basic skeleton:
if (list->first == NULL) {
// Do the operation on an empty list
} else {
// Do the operation on a non-empty list
}
It is frequently worthwhile to:
- Design our functions with this mindset up-front---what does our operation do on the empty list and non-empty list?---and
- Test our list functions on both empty and non-empty lists.
Finally, consider the dual operations to make_list
and make_node
: free_list
and free_node
.
We have some freedom in deciding the behavior of each function.
In particular, should make_node
free the rest of the nodes that follow it in the list or just itself?
Suppose that we choose the former:
void free_node(node *node) {
free(node); // N.B. only frees this node, not its next field.
}
How do we implement free_list
using free_node
?
We can perform a standard traversal of the list, freeing every node as we move along.
However, if we try the following code snippet inside of our traversal loop:
free_node(cur);
cur = cur->next; // Bad!
We end up committing a memory error, namely a use-after-free error because we have already freed cur
!
The simple workaround is to save a pointer to the next
node, free cur
, and then finally use that saved pointer to advance.
void free_list(list *list) {
node *cur = list->first;
while (cur != NULL) {
node *next = cur->next;
free_node(cur);
cur = next;
}
free(list); // Note: don't forget this part!
}
Recursive Traversal
Suppose instead that we wanted our node to be responsible for freeing the nodes that follow it in the list. We could implement a similar loop. However, instead, we can achieve an elegant solution by employing recursive program design ala Racket. Recall that when designing recursive functions over lists, we work with the following design skeleton:
- A base case: when the list is empty and
- A recursive case: when the list is non-empty.
Here, these cases correspond to when the node we are at is NULL
(empty) or non-NULL
(non-empty).
We can design our recursive function as follows:
- To free an empty list, we do nothing.
- To free a non-empty list, we free the rest of the list, and then free its head.
And then translate it directly into code:
void free_node(node *node) {
if (node != NULL) {
free_node(node->next);
free(node);
}
}
Then free_list
has the straightforward implementation:
void free_list(list *list) {
free_node(list->first); // recursively frees all nodes
free(list);
}
Again, either implementation of the functions is acceptable---as the designer of the linked list, you simply need to choose one and stick with it throughout your design.
As another example of recursive design, we can rewrite size
to be recursive rather than iterative.
Note that the signature of the size
function as is, int size(list *list)
, is not appropriate for recursion argument, a list
is not recursively defined---the nodes are!
Because of this, we must write a helper function that actually performs the recursion. We can recursively define the function as follows:
- The size of an empty list is zero.
- The size of a non-empty list is one plus the size of the rest of the list.
And then translate that definition into code:
int node_size(node *node) {
if (node == NULL) {
return 0;
} else {
return 1 + node_size(node->next);
}
}
We can then define the user-facing size
function in terms of this helper:
int size(list *list) {
return node_size(list->first);
}
In general, the shape of defining a recursive function over our linked lists has the form:
void helper(node *node) {
if (node == NULL) {
// base case---empty list
} else {
// recursive case---non-empty list
}
}
void func(list *list) {
return helper(list->first);
}
Where func
is the user-facing function and helper
is the function that actually performs the recursive work over the nodes of the list.
As a rule of thumb, we will favor the iterative version of these operations over the recursive versions.
This is because there is a space cost to maintaining all these function calls that the recursive version performs that is not present with the iterative version.
Imagine what the stack looks like when we call size
on the list, and we finally reach the NULL
case:
Stack Heap
----- ----
... [0][o]-->[1][o]-->[2][o]-->
size --- ^ ^ ^
node [o]--------| | |
size --- | |
node [o]-----------------| |
size --- |
node [o]---------------------------
size ---
node [\]
Each recursive function call awaits for the recursive call it makes to return.
This means that if the list has n
elements, then the stack will grow by n
function calls before it begins to return.
In contrast, the imperative version of size
only requires a single function call, so it does not require any additional stack space to execute.
Functional Aside: Tail-call Optimization Revisited
The space cost induced by the recursive version of size
seems to be unavoidable.
However, with a little bit of ingenuity and an optimization known as tail-call optimization, we can make the recursive version of size
require the same space (roughly) as the imperative version.
The key insight with this optimization is that if a function's final operation is to make a recursive function call, we don't need to create a new stack frame.
Instead, we can simply reuse the current stack frame as the new recursive function call's stack frame.
By doing so, all the recursive function calls that a function makes only take up a single stack frame rather than n
stack frames.
This optimization requires that the function to-be-optimized end with a recursive function call.
size
is currently not in this form!
Consider the single line that makes up the recursive case of size
:
return 1 + node_size(node->next);
While the final line of the function contains a recursive function call, note that we must first make the recursive function call and then add one to the result.
The fact node_size
still has work to do after the recursive call returns (namely add one to the returned value) means that this version of the function cannot be tail-call optimized.
We cannot end the function by performing an addition.
To alleviate this issue, we need to re-organize how the function works.
Rather than adding one after the recursive call returns, we need to add one before we make the recursive call.
But what are we adding one to then, if we can't add it to the result of the recursive call?
We instead add one to a new parameter of node_size
that accumulates the size of the list:
int node_size(node *node, int acc) {
if (node == NULL) {
return acc;
} else {
return node_size(node->next, acc + 1);
}
}
int size(list *list) {
return node_size(list->first, 0);
}
Now, node_size
ends with a recursive function call because the addition happens before the call is made---namely when we evaluate the parameters to the call.
In general, we are inverting how node_size
performs its computation.
Rather than bubbling up the accumulated size from the end of the list back to the front, we push down the accumulated size from the beginning to the end of the list.
We can generalize this approach to an advanced form of programming called continuation-passing style where functions take as arguments a continuation denoting what computation to perform after the function completes.
With node_size
, the "continuation" is simply the value that should be returned after all the node_size
calls complete.
Deletion
Finally, how do we remove a particular element, call it v
, from a linked list?
Let's consider some cases.
First, if there is nothing in the list:
Stack Heap
----- ----
l [o]----->[o]-->[\]
Then there is nothing to do.
In a non-empty list, if v
is the first element:
Stack Heap
----- ----
l [o]----->[o]-->[v][o]-->[0][o]-->[1][o]-->
We must fix up l->first
to point to the node that l->first
points to.
This pointer is l->first->next
, so we have the assignment:
l->first = l->first->next;
This code performs the deletion.
However, if v
is not the first element:
Heap
----
... -->[0][o]-->[v][o]-->[1][o]--> ...
^
|
cur
Then we will need to modify our cur
's next
pointer to point to what cur->next
points to.
Like the l->first
case, this node is cur->next->next
which gives us the assignment:
cur->next = cur->next->next;
These three cases give us the following implementation for delete
:
// returns true iff the first occurrence of v is deleted from the list
bool delete(list *list, int v) {
if (list->first == NULL) {
return false;
} else if (list->first->value == v) {
node *to_del = list->first;
list->first = list->first->next;
free_node(to_del); // assumes that free_node only frees to_del
return true;
} else {
node *cur = list->first;
while (cur->next != NULL) {
if (cur->next->value == v) {
node_t *to_del = cur->next;
cur->next = cur->next->next;
free_node(to_del);
return true;
}
}
return false;
}
}
Deletion from a linked list provides to be tricker than insertion with several cases, but, nevertheless, the deletion motion is consistent in each case:
node_t *node;
node_t *to_del = node->next;
node->next = node->next->next;
free_node(to_del);
Exercise: One of Many
Implement a function int index_of(list *l, int v)
that returns the index of v
in l
or -1
if v
is not found within l
.
To do this, identify which of the three kinds of linked list motions appropriately captures the behavior of index_of
and then adapt the motion skeleton code towards implementing the function.