Previously we studied the basic linked-based implementation of the list abstract data type. We can further augment this basic implementation in order to optimize common operations or specialize the list's behavior to domain-specific scenarios. In doing so, we begin making trade-offs between features, performance, and implementation complexity.
Memoizing the Size of a List
Recall that the size
operation over a linked list traverses the list, counting every node.
For small lists, this is not a problem in practice, but if the list is large, then traversing the list—perhaps multiple times in quick succession—is undesirable.
Rather than recomputing the size
of a list on the fly, we can, instead, remember the size as an additional field of the list
structure:
typedef struct list {
node *first;
int size;
} list;
size
now has a straightforward implementation.
int size(list *l) {
return l->size;
}
Note that we want to keep the size
function even though it is only accessing a field of list
.
This function still has the effect of hiding the implementation of the list
structure even though the function, itself, is very simple.
While size
is now simple, we must ensure that the list
's size
field is kept up-to-date by the other list
functions.
For example, make_list
should initialize the size
to be 0
:
list* make_list(void) {
list *ret = (list*) make_list();
ret->first = NULL;
ret->size = 0;
return ret;
}
Functions that insert into the list, e.g., insert_front
need to increment the size of the list:
void insert_front(int v, list *l) {
l->first = make_node(v, l->first);
l->size++;
}
And functions that remove from the list, e.g., remove_last
need to decrement the size:
bool remove_last(list *l) {
if (l->first == NULL) {
return false;
} else {
node *cur = l->first;
while (cur->next != NULL) {
cur = cur->next;
}
free(cur->next); // cur->next is NULL, so this doesn't leak memory!
cur->next = NULL;
l->size--;
}
}
We can capture this responsibility of the list
functions to keep the size
field updated as an invariant on the list
structure:
After the execution of every
list
function, thesize
field accurately records the size of the list.
This invariant induces extra burden on our implementation.
In particular, you can imagine that it is likely we'll introduce at least one bug into our code by forgetting to update the size
field.
However, if we are willing to take on this burden, we gain the benefit of a quick way to lookup the size of a list.
End Links
Similar to size
, we can also observe a similar sort of performance issue with insert_last
.
To insert onto the end of the list, we must traverse the entire list.
In contrast, we only need to access list->front
in order to insert into the front of the list.
To alleviate this issue, we can see that if we had a pointer to the last node of the list, we can use it to quickly append onto the end. This leads to the following variation on our list type:
typedef struct list {
node *first;
node *last;
} list;
Like with size
, our functions must now ensure that last
is always pointing to the last node of the list.
Unlike size
, keeping last
up-to-date requires some careful thought!
Let's consider updating insert_back
to utilize the last
pointer:
void insert_back(int v, list *l) {
// this code has a bug! do you see it?
// Hint: think about the possible cases of old_last...
node *old_last = l->last;
l->last = make_node(v, null);
old_last->next = l->last;
}
What's wrong with this code?
Consider the field access to old_last->next
, this requires that old_last->next
, the (old) last node of the list, is non-NULL
.
When might this not be the case?
When the list is empty, of course!
Thus, like the insert_front
function, we must perform a case analysis on l->last
:
void insert_back(int v, list *l) {
// this code still has a bug! do you see it?
// Hint: did we forget to update something?
if (l->last == NULL) {
l->last = make_node(v, null);
} else {
node *old_last = l->last;
l->last = make_node(v, null);
old_last->next = l->last;
}
}
As the code suggests, we forgot to do something.
What was it?
We needed to update l->first
as well because the list was empty!
This final code snippet is correct:
void insert_back(int v, list *l) {
// this code still has a bug! do you see it?
// Hint: did we forget to update something?
if (l->last == NULL) {
l->last = make_node(v, null);
l->first = l->last;
} else {
node *old_last = l->last;
l->last = make_node(v, null);
old_last->next = l->last;
}
}
In the one-element case, both l->first
and l->last
point to the same node.
Will this be a problem when we insert_back
an additional element?
Let's think through this case with diagrams!
In the one element scenario, our list l
looks as follows.
l [o]--->[0][/]
[o]-----^
Let's assume the first cell of l
is the first
pointer and the second cell is the last
pointer.
Now, let's modify l
so we insert 1
via insert_back
.
If we follow our code precisely, we:
- We hold a temporary pointer to the old last node of the list,
0
. - We make the last node of the list a new node containing
1
. - We update the old last node to point to this new node.
These three steps modify the diagram as follows:
-
Step 1:
l [o]--->[0][ ] [o]-----^ old_last [o]-----|
-
Step 2:
l [o]--->[0][o] [1][/] [o]-----^---------^ | old_last [o]-----|
-
Step 3:
l [o]--->[0][o]--->[1][/] [o]-----^---------^ | old_last [o]-----|
So our code works out in the end!
But it wasn't clear at all it would do that.
We really needed to resort to a diagram to check that our approach was correct.
This is the complexity introduced by adding a pointer to the end of the list: we must be very careful about maintaining the first
and last
pointers since they might be updated at times we don't expect, particularly, when the list is empty.
Other Variants of Linked Structures
So far we have considered modifications to the list
structure, but we might also consider modifying how our node
structures are represented.
For example, two common changes are:
- Making nodes doubly-linked, that is, equipping each node with a previous pointer. Previous pointers allow us to perform right-to-left traversals of the list and can simplify trickier operations such as deletion.
- Connecting the first and last nodes of the list, creating a circular structure. Circular structures arise naturally in a variety of contexts and it is sometimes useful to represent them directly in our code.
We will explore some of these structural variants in class.
Riffs on the List ADT
In addition to modifying the linked list implementation, we might also consider modifications of the list ADT itself. By modifications, we don't mean simply adding operations. Instead, we might consider other "list-like" ADTs that can also be implemented using linked lists.
Two of these ADTs are stacks and queues, list-like ADTs that put restrictions on how we can access elements in the list.
In the normal list ADT, there usually exists a get
-style operation that returns an element of a list by its index.
However, many situations involve "list-like" structures but explicitly forbid accessing elements in the middle of the list.
For example, with:
- A line of people for an activity, we should only add people to the back and remove people from the front.
- A stack of books, we should only add and remove books from the top of the stack.
These two scenarios give rise to two different variations of our list ADT:
- A queue, a sequential, homogenous, variable-sized data structure that allows for efficient addition to the front and removal from the end of the structure.
- A stack, a sequential, homogenous, variable-sized data structure that allows for efficient addition and removal from the front of the structure.
We can codify these interfaces with the following functions, enqueue/dequeue and push/pop, respectively:
// Adds element v to the front of queue q.
void enqueue(int v, queue *q);
// Removes the element at the back of the (assumed to be non-empty) queue.
int dequeue(queue *q);
// Adds element v to the front of stack s.
void push(int v, stack *s);
// Removes the element at the front of the (assumed to be non-empty) stack.
int pop(stack *s)
Note that these structures are precisely lists but with restrictions on how we access these elements. Consequently, there is less to say about the implementations of these functions with our linked implementation. We can implement each function using the standard list operations we've seen previously:
enqueue
isinsert_front
.dequeue
isremove_back
.push
isinsert_front
.pop
isremove_front
.
Another way of describing stacks and queues are in terms of how elements flow through the structure:
- An element in the queue starts at the front of the queue and exits the back of the queue. This is called first in, last out (FILO) behavior because the latest element added to the queue is the last to leave.
- An element in the stack starts at the front of the stack and exits from the front of the stack. This is called first in, first out (FIFO) behavior because the latest element added to the stack is the first to leave.
Exercise: Stack 'em up
Create a complete stack
ADT complete with a linked-based implementation.
Create files:
stack.h
stack.c
main.c
Which define the interface, implementation, and tests for your stack implementation.
Implement the following functions for your stack
ADT:
// Returns true iff the stack is empty (contains no elements)
bool is_empty(stack *s);
// Adds element v to the front of stack s.
void push(int v, stack *s);
// Removes the element at the front of the (assumed to be non-empty) stack.
int pop(stack *s)
Note that these implementations should be straightforward adaptions of list functions you have already encountered!