As we increase the size of our programs, we run the risk of making our programs too complex to understand. Consider a program that is broken up into eight components. Imagine if there were no restrictions on how these components interacted. In the worst case, every component could interact with every other component.

Now, imagine that a bug is found in, say, component 5. Trying to uncover the cause of the bug may require investigating how this component interacts with other components in the system. In this maximally connected case, we have to see how component 5 interacts with the other seven components of the system. Clearly such a situation is not ideal, especially if there were additional components, say hundreds of them!

But this situation can become worse. Suppose the bug is particularly tricky, and we have to investigate not just how component 5 interacts with every other component, but also how every component interacts with each component that interacts with component 5. That is a bit of a mouthful, but what we mean is that maybe the interactions of any triple of components is the issue. In other words, we have to examine how:

  • Components 5, 1, and 2,
  • Components 5, 1, and 3,
  • Components 5, 1, and 4,
  • Components 5, 1, and 6,
  • Components 5, 1, and 7,
  • Components 5, 2, and 3, ...

... and so forth all interact to find the bug!

Maybe any combination of components that could interact together in the system can cause the bug. In the worst case, we would need to examine all of these combinations to unearth the problem, an exponential number of possibilities in all. Consequently, to build large software artifacts that we have some hope of reasoning about, we need to limit these interactions between components in our system to the bare essentials.

Abstract Data Types

One of the key tools in our arsenal to limit unnecessary interaction between software components is abstraction. By abstraction, we mean that we:

  • Expose the essential parts of a software component required to interact with others and
  • Hide all the inessential parts.

This approach to building software helps minimize the surface area of the component we're writing, minimizing the ways it can interact with other components, thus reducing the complexity of our programs!

When our software components are best realized as types in our programs, this abstraction takes the form of abstract data types (ADTs). We break up our component into two parts:

  • An public interface that defines the ways that our program can interact with this type, i.e., what functions are available to create, use, and delete values of this type.
  • A private implementation of the interface functions, hidden from the outside world.

A Simple Example: A Counter

Let's examine a simple abstract data type to see this concept in action. Consider a humble counter that keeps track of an ever-increasing value. First, what operations should our counter ADT expose to the outside world? Keeping in mind our mantra of "creation-use-deletion" for types, we need to consider exposing functions for each of these categories. To keep things simple, we'll assume that our counter is only stack-allocated, so we'll have no need to consider the "deletion" category yet.

"Using" a counter amounts to two behaviors: incrementing the counter and fetching its current value. Combining these two behaviors with the need to make a counter in the first place yields three function signatures:

// Creates a new counter, initially with value 0
counter make_counter(void);

// Increments counter c by 1
void increment(counter *c);

// Retrives the value of counter c
int get_value(counter *c);

These three functions form the interface to our counter ADT. Note that we have not defined what the counter type is or how these functions are implemented. The point of the ADT is to separate interface from implementation so that we can expose the former and hide the latter.

Nevertheless, we need to actually provide an implementation, so let's do so. Critically, our choice of what the counter type is will guide our implementation of the interface. The simplest counter is simply an int, so let's go with that and see how everything pans out:

typedef struct {
    int value;
} counter;

counter make_counter(void) {
    counter c;
    c.value = 0;
    return c;
}

void increment(counter *c) {
    // x->f is convenient syntax that means (*x).f
    c->value++;
}

int get_value(counter *c) {
    return c->value;
}

Great! We have provided an implementation of our counter ADT. Presumably, all of this code sits in a single file, so we really haven't "hidden" anything from everyone. Next, let's see how we can organize our C programs to effectively hide implementations and expose interfaces to other parts of our program.

Organizing C Programs Around ADTs

Some programming languages provide strong support for abstractions like ADTs, e.g., classes in object-oriented languages. C, as you are probably used to by now, does not provide such support. This is because such support usually comes at a cost of performance that is disagreeable in the low-level contexts that C is used. Instead, C programs are organized in a particular way to achieve this effect of hiding implementation and exposing interface through the compilation process. This organization is not enforced by C, per se; it is simply a convention that all C programmers follow!

We break up an ADT like our counter into two files:

  • A header file that contains information about the interface to this ADT.
  • An implementation file that contains an appropriate implementation of this interface.

Applying this design pattern to our counter example, we create two files:

  • counter.h is our header file:

    // counter.h
    
    #pragma once
    
    // The counter type
    typedef struct 
        int value;
    } counter;
    
    // Creates a new counter, initially with value 0
    counter make_counter(void);
    
    // Increments counter c by 1
    void increment(counter *c);
    
    // Retrives the value of counter c
    int get_value(counter *c);
    
  • counter.c is our implementation file:

    // counter.c
    #include "counter.h"
    
    counter make_counter(void) {
        counter c;
        c.value = 0;
        return c;
    }
    
    void increment(counter *c) {
        // x->f is convenient syntax that means (*x).f
        c->value++;
    }
    
    int get_value(counter *c) {
        return c->value;
    }
    

Observe how we link the header and implementation files together. Recall that the #include pre-processor directive copies the contents of a file (more or less) verbatim into the #include's position in the file. Thus, counter.c stays consistent with counter.h by virtue of literal copying-pasting of the latter into the former during compilation!

Also observe the use of a new pre-processor directive, #pragma once, at the top of our header file. #pragma once ensures that only one copy of this file is included (via #include) in any compilation unit of our program. A compilation unit is a single .c file including the complete set of files it (transitively) includes via #include.

To see why multiple copies of our header might be included in a compilation unit, note that we #include our header file not just in our implementation but any file that will use the counter type. Because the header file contains types and function headers, it is precisely the information needed by the compiler to type-check any uses of a counter in our program! But what if a compilation unit includes (a) our counter header and (b) another header header that also #includes counter.h? We then arrive at the situation that #pragma once fixes—ensuring that only one copy of counter.h is included in the overall compilation unit!

To use our counter type, say for example, in our main program, we would have a separate file for main, call it main.c. We would then #include the header file in main. Then, any use of the counter type or its functions will type-check successfully:

// main.c

#include "counter.h"

int main(void) {
    counter c = make_counter();
    printf("Counter is initially %d\n", get_value(&c));
    increment(&c);
    increment(&c);
    increment(&c);
    printf("Counter is now %d\n", get_value(&c));
    return 0; 
}

Finally, when we go to compile our program, we need to include both compilation units in our clang invocation:

$> clang main.c counter.c -o prog
$> ./prog
Counter is initially 0
Counter is initially 3

Note that counter.h is "included" in the compilation by virtue of the #include directives found in main.c and counter.c!

Introduction to Sequences and Lists

Abstract data types can be used to capture many kinds of data. For example, the FILE* type exposed in stdio.h is an ADT over a stream of data that could be drawn from, e.g., a file on disk, the user's keyboard, or their screen. However, ADTs are commonly used to capture classes of data structures, data that is responsible for organizing and storing other data. There are four broad classes of such data structures, defined by the relationships they encode between the data they store:

  • Sequences capture sequential relationships between data.
  • Trees capture hierarchical relationships between data.
  • Maps capture function-like relationships between data.
  • Graphs capture arbitrary relationships between data.

The study of these structures is the primary topic for CSC 207. But of these, sequences are most common and prevalent in programming, so it is worth our time to explore them in detail now.

We've already seen an example of a particular kind of sequence in programming: the array. An array is a sequence that is:

  • Fixed-size and
  • Homogeneous, i.e., stores elements of same type.

We have run into the perils of the limitations of these fixed-size structures already. For example, if we use an array to store user input, how do we cope with the fact that the user enters more data than we have allotted space to the array?

A natural question, therefore, is what is a sequence that is instead a variable-size structure, one that can grow and shrink on demand? This leads us to the abstract data type that we will study for this next leg of the course, the list. We have already seen lists in Scheme/Racket, but we didn't think too hard about how they were implemented. Now that we are working in a systems context, this is a good time to go under the hood and explore the different ways lists can be implemented and the trade-offs between these approaches.

The List ADT

Before we dive into implementation, we must define the interface of this list abstract data type. What are the kinds of operations that we want to perform over some list type? And for simplicity's sake, let's say our list just holds int values. Again, let's appeal to the "creation-use-deletion" mantra for types:

  • We need functions to create and delete lists:

    list* make_list(void);
    void free_list(list* lst);
    
  • We need a function to add elements to a list, presumably at the end:

    void list_add(list* lst, int val);
    
  • We need a function to retrieve elements from the list. Like arrays, lists are sequential in nature, so it is natural to want to retrieve their elements by index:

    int list_get(list* lst, int index);
    
  • Since a list is variable-sized, it'll be useful to know the length of a list at any given point in time.

    int list_length(list* lst);
    
  • At this point, it might be nice to delete elements (by index) from our list, too:

    void list_remove(list* lst, int index);
    

With this interface defined, we can envision how operating over a list should work, even without knowing its implementation!

int main(void) {
    list *lst = make_list();
    printf("The size of the empty list is: %d\n", list_length(lst));      // 0
    lst_add(lst, 5);
    lst_add(lst, 9);
    lst_add(lst, 7);
    printf("The size of this list is now: %d\n", list_length(lst));       // 3
    printf("The element at index 1 is: %d\n", list_get(lst, 1));          // 9
    list_remove(lst, 1);
    printf("The element at index 1 now is: %d\n", list_get(lst, 1));      // 7
    printf("And the length of the list is now: %d\n", list_length(lst));  // 2
    free_list(lst);
    return 0;
}

In lab, we'll explore one of the two ways to implement this list ADT: arrays!

Lab Preparation: Reviewing Dynamic Memory Management

Before jumping into lab, we'll want to review the basics of dynamic memory management with malloc and free. To do this, review chapters 17.1–17.4 from King which covers dynamic allocation for both strings and arrays.

Exercise: Enhanced Interface

In our initial exploration of the list ADT, we scoped some basic functions that we might implement. However, there are many other functions to consider, some of which you may have seen in CSC 151! Take some time to brainstorm what other functions might be nice to perform over lists, write down their function signatures, and expand the example main function above with their use.