Previously we studied the basic linked-based implementation of the list abstract data type. We can further augment this basic implementation in order to optimize common operations or specialize the list's behavior to domain-specific scenarios. In doing so, we begin making trade-offs between features, performance, and implementation complexity.

Memoizing the Size of a List

Recall that the size operation over a linked list traverses the list, counting every node. For small lists, this is not a problem in practice, but if the list is large, then traversing the list—perhaps multiple times in quick succession—is undesirable. Rather than recomputing the size of a list on the fly, we can, instead, remember the size as an additional field of the list structure:

typedef struct list {
    node *first;
    int size;
} list;

size now has a straightforward implementation.

int size(list *l) {
    return l->size;
}

Note that we want to keep the size function even though it is only accessing a field of list. This function still has the effect of hiding the implementation of the list structure even though the function, itself, is very simple.

While size is now simple, we must ensure that the list's size field is kept up-to-date by the other list functions. For example, make_list should initialize the size to be 0:

list* make_list(void) {
    list *ret = (list*) make_list();
    ret->first = NULL;
    ret->size = 0;
    return ret;
}

Functions that insert into the list, e.g., insert_front need to increment the size of the list:

void insert_front(int v, list *l) {
    l->first = make_node(v, l->first);
    l->size++;
}

And functions that remove from the list, e.g., remove_last need to decrement the size:

bool remove_last(list *l) {
    if (l->first == NULL) {
        return false;
    } else {
        node *cur = l->first;
        while (cur->next != NULL) {
            cur = cur->next;
        }
        free(cur->next);  // cur->next is NULL, so this doesn't leak memory!
        cur->next = NULL;
        l->size--;
    }
}

We can capture this responsibility of the list functions to keep the size field updated as an invariant on the list structure:

After the execution of every list function, the size field accurately records the size of the list.

This invariant induces extra burden on our implementation. In particular, you can imagine that it is likely we'll introduce at least one bug into our code by forgetting to update the size field. However, if we are willing to take on this burden, we gain the benefit of a quick way to lookup the size of a list.

End Links

Similar to size, we can also observe a similar sort of performance issue with insert_last. To insert onto the end of the list, we must traverse the entire list. In contrast, we only need to access list->front in order to insert into the front of the list.

To alleviate this issue, we can see that if we had a pointer to the last node of the list, we can use it to quickly append onto the end. This leads to the following variation on our list type:

typedef struct list {
    node *first;
    node *last;
} list;

Like with size, our functions must now ensure that last is always pointing to the last node of the list. Unlike size, keeping last up-to-date requires some careful thought! Let's consider updating insert_back to utilize the last pointer:

void insert_back(int v, list *l) {
    // this code has a bug! do you see it?
    // Hint: think about the possible cases of old_last...
    node *old_last = l->last;
    l->last = make_node(v, null);
    old_last->next = l->last;
}

What's wrong with this code? Consider the field access to old_last->next, this requires that old_last->next, the (old) last node of the list, is non-NULL. When might this not be the case? When the list is empty, of course! Thus, like the insert_front function, we must perform a case analysis on l->last:

void insert_back(int v, list *l) {
    // this code still has a bug! do you see it?
    // Hint: did we forget to update something?
    if (l->last == NULL) {
        l->last = make_node(v, null);
    } else {
        node *old_last = l->last;
        l->last = make_node(v, null);
        old_last->next = l->last;
    }
}

As the code suggests, we forgot to do something. What was it? We needed to update l->first as well because the list was empty! This final code snippet is correct:

void insert_back(int v, list *l) {
    // this code still has a bug! do you see it?
    // Hint: did we forget to update something?
    if (l->last == NULL) {
        l->last = make_node(v, null);
        l->first = l->last;
    } else {
        node *old_last = l->last;
        l->last = make_node(v, null);
        old_last->next = l->last;
    }
}

In the one-element case, both l->first and l->last point to the same node. Will this be a problem when we insert_back an additional element? Let's think through this case with diagrams! In the one element scenario, our list l looks as follows.

l [o]--->[0][/]
  [o]-----^

Let's assume the first cell of l is the first pointer and the second cell is the last pointer. Now, let's modify l so we insert 1 via insert_back. If we follow our code precisely, we:

  1. We hold a temporary pointer to the old last node of the list, 0.
  2. We make the last node of the list a new node containing 1.
  3. We update the old last node to point to this new node.

These three steps modify the diagram as follows:

  • Step 1:

    l        [o]--->[0][ ]
             [o]-----^
    old_last [o]-----|
    
  • Step 2:

    l        [o]--->[0][o]    [1][/]
             [o]-----^---------^
                     |
    old_last [o]-----|
    
  • Step 3:

    l        [o]--->[0][o]--->[1][/]
             [o]-----^---------^
                     |
    old_last [o]-----|
    

So our code works out in the end! But it wasn't clear at all it would do that. We really needed to resort to a diagram to check that our approach was correct. This is the complexity introduced by adding a pointer to the end of the list: we must be very careful about maintaining the first and last pointers since they might be updated at times we don't expect, particularly, when the list is empty.

Other Variants of Linked Structures

So far we have considered modifications to the list structure, but we might also consider modifying how our node structures are represented. For example, two common changes are:

  • Making nodes doubly-linked, that is, equipping each node with a previous pointer. Previous pointers allow us to perform right-to-left traversals of the list and can simplify trickier operations such as deletion.
  • Connecting the first and last nodes of the list, creating a circular structure. Circular structures arise naturally in a variety of contexts and it is sometimes useful to represent them directly in our code.

We will explore some of these structural variants in class.

Riffs on the List ADT

In addition to modifying the linked list implementation, we might also consider modifications of the list ADT itself. By modifications, we don't mean simply adding operations. Instead, we might consider other "list-like" ADTs that can also be implemented using linked lists.

Two of these ADTs are stacks and queues, list-like ADTs that put restrictions on how we can access elements in the list. In the normal list ADT, there usually exists a get-style operation that returns an element of a list by its index. However, many situations involve "list-like" structures but explicitly forbid accessing elements in the middle of the list. For example, with:

  • A line of people for an activity, we should only add people to the back and remove people from the front.
  • A stack of books, we should only add and remove books from the top of the stack.

These two scenarios give rise to two different variations of our list ADT:

  • A queue, a sequential, homogenous, variable-sized data structure that allows for efficient addition to the front and removal from the end of the structure.
  • A stack, a sequential, homogenous, variable-sized data structure that allows for efficient addition and removal from the front of the structure.

We can codify these interfaces with the following functions, enqueue/dequeue and push/pop, respectively:

// Adds element v to the front of queue q.
void enqueue(int v, queue *q);

// Removes the element at the back of the (assumed to be non-empty) queue.
int dequeue(queue *q);

// Adds element v to the front of stack s.
void push(int v, stack *s);

// Removes the element at the front of the (assumed to be non-empty) stack.
int pop(stack *s)

Note that these structures are precisely lists but with restrictions on how we access these elements. Consequently, there is less to say about the implementations of these functions with our linked implementation. We can implement each function using the standard list operations we've seen previously:

  • enqueue is insert_front.
  • dequeue is remove_back.
  • push is insert_front.
  • pop is remove_front.

Another way of describing stacks and queues are in terms of how elements flow through the structure:

  • An element in the queue starts at the front of the queue and exits the back of the queue. This is called first in, last out (FILO) behavior because the latest element added to the queue is the last to leave.
  • An element in the stack starts at the front of the stack and exits from the front of the stack. This is called first in, first out (FIFO) behavior because the latest element added to the stack is the first to leave.

Exercise: Stack 'em up

Create a complete stack ADT complete with a linked-based implementation. Create files:

  • stack.h
  • stack.c
  • main.c

Which define the interface, implementation, and tests for your stack implementation. Implement the following functions for your stack ADT:

// Returns true iff the stack is empty (contains no elements)
bool is_empty(stack *s);

// Adds element v to the front of stack s.
void push(int v, stack *s);

// Removes the element at the front of the (assumed to be non-empty) stack.
int pop(stack *s)

Note that these implementations should be straightforward adaptions of list functions you have already encountered!