Abstract Data Types

As we increase the size of our programs, we run the risk of making our programs too complex to understand. Consider a program that is broken up into eight components. Imagine if there were no restrictions on how these components interacted. In the worst case, every component could interact with every other component.

Now, imagine that a bug is found in, say, component 5. Trying to uncover the cause of the bug may require investigating how this component interacts with other components in the system. In this maximally connected case, we have to see how component 5 interacts with the other seven components of the system. Clearly such a situation is not ideal, especially if there were additional components, say hundreds of them!

But this situation can become worse. Suppose the bug is particularly tricky, and we have to investigate not just how component 5 interacts with every other component, but also how every component interacts with each component that interacts with component 5. That is a bit of a mouthful, but what we mean is that maybe the interactions of any triple of components is the issue. In other words, we have to examine how:

Components 5, 1, and 2,
Components 5, 1, and 3,
Components 5, 1, and 4,
Components 5, 1, and 6,
Components 5, 1, and 7,
Components 5, 2, and 3, …

… and so forth all interact to find the bug!

Maybe any combination of components that could interact together in the system can cause the bug. In the worst case, we would need to examine all of these combinations to unearth the problem, an exponential number of possibilities in all. Consequently, to build large software artifacts that we have some hope of reasoning about, we need to limit these interactions between components in our system to the bare essentials.

Abstract Data Types

One of the key tools in our arsenal to limit unnecessary interaction between software components is abstraction. By abstraction, we mean that we:

Expose the essential parts of a software component required to interact with others and
Hide all the inessential parts.

This approach to building software helps minimize the surface area of the component we’re writing, minimizing the ways it can interact with other components, thus reducing the complexity of our programs.

When our software components are best realized as types in our programs, this abstraction takes the form of abstract data types (ADTs). We break up our component into two parts:

An public interface that defines the ways that our program can interact with this type, i.e., what functions are available to create, use, and delete values of this type.
A private implementation of the interface functions, hidden from the outside world.

A simple Example: A Counter

Let’s examine a simple abstract data type to see this concept in action. Consider a humble counter that keeps track of an ever-increasing value. First, what operations should our counter ADT expose to the outside world? Keeping in mind our mantra of “creation-use-deletion” for types, we need to consider exposing functions for each of these categories. To keep things simple, we’ll assume that our counter is only stack-allocated, so we’ll have no need to consider the “deletion” category yet.

“Using” a counter amounts to two behaviors: incrementing the counter and fetching its current value. Combining these two behaviors with the need to make a counter in the first place yields three function signatures:

              // Creates a new counter, initially with value 0
counter make_counter(void);

// Increments counter c by 1
void increment(counter *c);

// Retrives the value of counter c
int get_value(counter *c);

            

These three functions form the interface to our counter ADT. Note that we have not defined what the counter type is or how these functions are implemented. The point of the ADT is to separate interface from implementation so that we can expose the former and hide the latter.

Nevertheless, we need to actually provide an implementation, so let’s do so. Critically, our choice of what the counter type is will guide our implementation of the interface. The simplest counter is simply an int, so let’s go with that and see how everything pans out:

              typedef struct {
    int value;
} counter;

counter make_counter(void) {
    counter c;
    c.value = 0;
    return c;
}

void increment(counter *c) {
    // x->f is convenient syntax that means (*x).f
    c->value++;
}

int get_value(counter *c) {
    return c->value;
}

            

Great! We have provided an implementation of our counter ADT. Presumably, all of this code sits in a single file, so we really haven’t “hidden” anything from everyone. Next, let’s see how we can organize our C programs to effectively hide implementations and expose interfaces to other parts of our program.

Organizing C Programs around ADTs

Some programming languages provide strong support for abstractions like ADTs, e.g., classes in object-oriented languages. C does not provide such support. This is because such support usually comes at a cost of performance that is disagreeable in the low-level contexts that C is used. Instead, C programs are organized in a particular way to achieve this effect of hiding implementation and exposing interface through the compilation process. This organization is not enforced by C, per se; it is simply a convention that all C programmers follow!

We break up an ADT like our counter into two files:

A header file that contains information about the interface to this ADT.
An implementation file that contains an appropriate implementation of this interface.

Applying this design pattern to our counter example, we create two files:

counter.h is our header file:

              // counter.h

#ifndef COUNTER_H  
#define COUNTER_H

// The counter type
typedef struct 
    int value;
} counter;

// Creates a new counter, initially with value 0
counter make_counter(void);

// Increments counter c by 1
void increment(counter *c);

// Retrives the value of counter c
int get_value(counter *c);

#endif

            

counter.c is our implementation file:

              // counter.c
#include "counter.h"

counter make_counter(void) {
    counter c;
    c.value = 0;
    return c;
}

void increment(counter *c) {
    // x->f is convenient syntax that means (*x).f
    c->value++;
}

int get_value(counter *c) {
    return c->value;
}

            

Observe how we link the header and implementation files together. Recall that the #include pre-processor directive copies the contents of a file (more or less) verbatim into the #include’s position in the file. Thus, counter.c stays consistent with counter.h by virtue of literal copying-pasting of the latter into the former during compilation.

To use our counter type, say for example, in our main program, we would have a separate file for main, call it main.c. We would then #include the header file in main. Then, any use of the counter type or its functions will type-check successfully:

              // main.c

#include "counter.h"

int main(void) {
    counter c = make_counter();
    printf("Counter is initially %d\n", get_value(&c));
    increment(&c);
    increment(&c);
    increment(&c);
    printf("Counter is now %d\n", get_value(&c));
    return 0; 
}

            

Finally, when we go to compile our program, we need to include both compilation units in our clang invocation:

              $> clang main.c counter.c -o prog
$> ./prog
Counter is initially 0
Counter is initially 3

            

Note that counter.h is “included” in the compilation by virtue of the #include directives found in main.c and counter.c.

Sequences and Lists

Abstract data types can be used to capture many kinds of data. For example, the FILE* type exposed in stdio.h is an ADT over a stream of data that could be drawn from, e.g., a file on disk, the user’s keyboard, or their screen. However, ADTs are commonly used to capture classes of data structures, data that is responsible for organizing and storing other data. There are four broad classes of such data structures, defined by the relationships they encode between the data they store:

Sequences capture sequential relationships between data.
Trees capture hierarchical relationships between data.
Maps capture function-like relationships between data.
Graphs capture arbitrary relationships between data.

The study of these structures is the primary topic for CSC 207. But of these, sequences are most common and prevalent in programming, so it is worth our time to explore them in detail now.

Recently, we implemented a linked-list, which can be viewed as an implementation of a list ADT. We also used lists quite a bit in Scheme/Racket, but we didn’t think too hard about how they were implemented. Now that we are in a systems context, this is a good time to go under the hood and explore different ways lists can be implemented and the trade-offs between those approaches.

So if a linked-list is one way to implement a list ADT, another is with an array. We’ve used arrays aleady of course, but as review, an array has the following properties:

It is a fixed-size and
The values are homogeneous, i.e., stores elements of same type.

We have run into the perils of the limitations of these fixed-size structures already. For example, if we use an array to store user input, how do we cope with the fact that the user enters more data than we have allotted space to the array?

The list ADT

Before we dive into implementation with an array, let’s take a step back and define the interface of this list abstract data type. What are the kinds of operations that we want to perform over some list type? And for simplicity’s sake, let’s say our list just holds int values. Again, let’s appeal to the “creation-use-deletion” mantra for types:

We need functions to create and delete lists:

              list* make_list(void);
void free_list(list* lst);

We need a function to add elements to a list, presumably at the end:

              void list_add(list* lst, int val);

            

We need a function to retrieve elements from the list. Like arrays, lists are sequential in nature, so it is natural to want to retrieve their elements by index:

              int list_get(list* lst, int index);

            

Since a list is variable-sized, it’ll be useful to know the length of a list at any given point in time.

              int list_length(list* lst);

            

At this point, it might be nice to delete elements (by index) from our list, too:

              void list_remove(list* lst, int index);

            

With this interface defined, we can envision how operating over a list should work, even without knowing its implementation.

              int main(void) {
    list *lst = make_list();
    printf("The size of the empty list is: %d\n", list_length(lst));      // 0
    lst_add(lst, 5);
    lst_add(lst, 9);
    lst_add(lst, 7);
    printf("The size of this list is now: %d\n", list_length(lst));       // 3
    printf("The element at index 1 is: %d\n", list_get(lst, 1));          // 9
    list_remove(lst, 1);
    printf("The element at index 1 now is: %d\n", list_get(lst, 1));      // 7
    printf("And the length of the list is now: %d\n", list_length(lst));  // 2
    free_list(lst);
    return 0;
}

            

Implementing a list ADT with an array will be the topic of today’s lab.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This work is derived from materials by Charlie Curtsinger, Nicole Eikmeier, Fernanda Eliott, Priscill Jimenez, Barbara Johnson, Titus Klinge, Dawn Nye, Peter-Michael Osera, Sam Rebelsky, John Stone, Henry Walker, and Jerod Weinman.

This website was built using Jekyll, Twitter Bootstrap, and the Bootswatch Cosmo Theme.