So far, we have used arrays to hold collections of data. However, arrays suffer from a terrible restriction: once declared, an array's size is fixed. In general, we need data structures that can hold a variable amount of data because we don't know how much data we will receive in advance. The canonical example of this situation is receiving data from the user in the form of console input or files.

In our discussion of abstract data types, we learned about the list ADT, an interface for a variable-sized, homogenous, sequential structure. Our growable buffer/arraylist is one canonical implementation of this interface but suffers from notable limitations. In particular, whenever we "outgrow" the underlying buffer, we have to copy all the elements of the original buffer to the new one. In some circumstances, this leads to undesirable performance!

The linked list is the other primary implementation of the list ADT. Rather than an array, a linked list utilizes a collection of nodes sequentially linked via pointers. It ties together all of the advanced systems programming concepts that we have discussed in the course so far: structs, pointers, and manual memory management. Furthermore, many structures can be thought of as a collection of linked structures, so the design patterns we learn here apply to many other situations we find ourselves in C. Thus, it is a fitting "right of passage" as we end the course---mastery of linked lists is a sign that we have mastered the fundamentals of C and systems programming!

Representation

Recall that in Scheme, we represent lists of data using box-and-pointer diagrams, e.g.,

[0][o]-->[1][o]-->[2][o]-->[3][o]-->[/]

This diagram represents the list [0, 1, 2, 3]. It is built with the expression:

(cons 0 (cons 1 (cons 2 (cons 3 null))))

The cons function produces one of these boxes. Such a box consists of:

  • A value and
  • A reference to the next box in the list.

We signify the end of the list using the value, null, written [/] in the diagram above.

In C, we mirror this structure with a struct we will call a node:

struct node {
    int value;
    struct node *next;
};

typedef struct node node;

A node possesses two fields corresponding to the value and the next node in the list. Recall that we can combine the struct and typedef declarations into a single declaration as follows:

typedef struct node {
    int value;
    struct node *next;
} node;

Because pointers are explicit in C, we can designate the NULL pointer as the "absence" of a node, i.e., the end of the list. In other words, the last node in our list will have a next field that contains NULL.

All of our nodes will be allocated on the heap because we don't know in advance how much data our list will store. It is, therefore, convenient to have a function that heap-allocates a chunk of memory to act as a node and populates that chunk with appropriate initial values:

node* make_node(int value, node *next) {
    node *ret = (node*) malloc(sizeof(node));
    ret->value = value;
    ret->next = next;
    return ret;
}

The critical difference between the Racket and C implementations of lists is that in Racket, lists are immutable---once created, a list is never modified. Consequently list operations, specified in a recursive manner produce a new list distinct from list given as input. In contrast, standard C implementations of linked lists are mutable---list operations modify the input list rather than produce new lists. Operations over C linked lists are, therefore, specified in an imperative style, i.e., with loops.

Because of these differences, we add a wrapper to our linked list that holds a pointer to a collection of nodes that constitute the actual list:

typedef struct list {
    node *first;
} list;

We will also heap-allocate our list and thus introduce a function for heap-allocating lists:

list* make_list(void) {
    list *ret = (list*) malloc(sizeof(list));
    ret->first = NULL;
    return ret;
}

Because we have designated a NULL pointer to be the end of a linked list, we represent an empty list by a list whose first field is NULL. Producing a list using this function, e.g., list *l = make_list();, produces the following memory diagram:

Stack      Heap
-----      ----
l [o]----->[\]

The single box in the heap is our list that has a single field, first, that is initially NULL. If we want to add elements onto our list, we can explicitly modify the pointers of the list and its underlying nodes. For example, the assignment l->first = make_node(0, NULL) changes the diagram as follows:

Stack      Heap
-----      ----
l [o]----->[o]-->[0][\]

To add onto the end of the list, we chain together field accesses to modify the last pointer in the list. For example, l->first->next = make_node(1, make_node(2, NULL)).

Stack      Heap
-----      ----
l [o]----->[o]-->[0][o]-->[1][o]-->[2][\]

Exercise: Write It Down

Assume that we previously defined the node and list types given in the reading. Write down the state of memory at each of the indicated points in the following program:

int main(void) {
    list *l = make_list();
    node *n1 = make_node(42, NULL);
    // POINT A
    l->first = n1;
    n1->next = make_node(11, NULL);
    // POINT B
    n1->next->next = make_node(35, n1);
    // POINT C
    return 0;
}