Intermezzo: Build Tools

Ideally, we do not need to understand how our tools work in order to use them. They should be designed in such a way that their function and use are self-evident. Unfortunately, developer tools, e.g., compilers, editors, and IDEs, are rarely designed with this sort of usability in mind. Instead, developer tools are designed assuming that you, the programmer, know how their internals work.

This requirement is a double-edged sword. On one hand, their transparency makes them much more difficult to use. For example, consider the error message that arises when you forget the main function in a C program:

$> clang t.c
ld: Undefined symbols:
  _main, referenced from:
      <initial-undefines>
clang: error: linker command failed with exit code 1 (use -v to see invocation)

What is ld? What are "Undefined symbols?" Why can't clang simply say main cannot be found? You, as a programmer, are expected to be able to sift through the noise in this error message to understand the underlying problem. While this error message is rather informative once you get over the language it employs, you have certainly encountered errors in C, whether in your programs, clang, or your editor, that required you to just "know" how things worked in order to derive the actual problem.

The flip side of this transparency is that, with appropriate knowledge of these internals, we can deeply customize our tools to fit exactly the development scenarios we find ourselves in. For example, Emacs is a highly extensible text editor. With appropriate knowledge of how to configure it using its Lisp-based configuration language, you can create a rich editing experience with support for inline compiler error messages and auto-completion.

In this reading, we'll take a deeper look at some of the build tools you have been using throughout the course. We'll also give you resources to begin the journey of learning about, and ultimately customizing, your development tools to fit your exact needs.

Separate Compilation

When compiling our C programs, we've sent a single build command to clang, e.g., to compile our linked list programs:

$> clang -Wall -Werror -o prog -g -fsanitize=address list.c main.c

It turns out that this single command hides several steps in the compilation process that become more important to understand as our programs grow! For example, in large-scale C developments, we compile not just two or three C files, but hundreds or even thousands of C files totalling in the millions of lines of source code! In these situations, build times can take hours or even days. So it is important to understand and try to optimize our build times as much as possible.

One important optimization comes from the insight that when we modify a codebase, we typically only modify at most a few files at a time. We can dramatically speed up compilation by only recompiling the parts of the code we modified and integrating those recompiled bits with the rest of the already-compiled program. We call this process separate compilation where we can compile different components of our program independently of each other.

In C, each source file provided to the compiler is considered its own separately compiled unit, called a compilation unit in C parlance. With our standard clang invocation, we don't really see separate compilation at work. For example, if we had three .c files, x.c, y.c, and z.c, compiled with the command clang -o prog x.c y.c z.c, we observe the following process, pictured as a diagram:

-------
| x.c |-----\
-------      \   clang -o prog
              \    x.c y.c z.c
-------        \           --------
| y.c |---------o--------> | prog |
-------        /           --------
              /
-------      /
| z.c |-----/
-------

Our three source files do not seem to be separately compiled at all! To observe this separate compilation process at work, we need to use the -c flag which compiles an individual compilation unit into an intermediate file called an object file. For example, if we invoke clang -c x.c we generate an object file called x.o:

-------   clang -c x.c    -------
| x.c | ----------------> | x.o |
-------                   -------

An object file is a compiled file, i.e., machine code, with holes in it. These holes correspond to function definitions found in other compilation units. For example, suppose that x.c is defined thusly:

// x.c

#include <stdio.h>

// Declared here, defined in another compilation unit.
void bar(void);

void foo(void) {
    printf("In foo\n");
    bar();
}

bar's type signature is given in x.c but not its implementation. Thus, x.o will have a hole for bar's implementation since it is not provided within x.c directly, presumably provided by either y.c or z.c.

These holes enable each compilation unit to be compiled independently of each other as displayed in the diagram below.

-------   clang -c x.c    -------
| x.c | ----------------> | x.o |
-------                   -------

-------   clang -c y.c    -------
| y.c | ----------------> | y.o |
-------                   -------

-------   clang -c z.c    -------
| z.c | ----------------> | z.o |
-------                   -------

However, we still need to put together these independently-generated object files into a final program. This final step of the compilation process is called linking and is performed by the ld program, although it turns out that we can invoke ld through clang by simply passing in the object files instead of the C source files:

-------   clang -c x.c    -------
| x.c | ----------------> | x.o | ----\
-------                   -------      \   clang -o prog
                                        \    x.o y.o z.o
-------   clang -c y.c    -------        \           --------
| y.c | ----------------> | y.o | --------o--------> | prog |
-------                   -------        /           --------
                                        /
-------   clang -c z.c    -------      /
| z.c | ----------------> | z.o | ----/
-------                   -------
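
One way to picture what the linker does with these holes is the toy model below. It is only a sketch, not how ld is implemented: the struct object type, the count_unresolved function, and the symbols assigned to y.o and z.o are assumptions made for illustration, and library symbols such as printf are left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 4

/* Toy model of an object file: the symbols it defines and the holes it leaves. */
struct object {
    const char *name;
    const char *defined[MAX_SYMS];    /* NULL-terminated list */
    const char *undefined[MAX_SYMS];  /* NULL-terminated list */
};

/* Returns 1 if some object in objs[0..n) defines `symbol`, 0 otherwise. */
static int is_defined(const struct object *objs, size_t n, const char *symbol) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < MAX_SYMS && objs[i].defined[j] != NULL; j++) {
            if (strcmp(objs[i].defined[j], symbol) == 0) {
                return 1;
            }
        }
    }
    return 0;
}

/* Reports every hole that no object fills and returns how many there were. */
int count_unresolved(const struct object *objs, size_t n) {
    int missing = 0;

    assert(objs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < MAX_SYMS && objs[i].undefined[j] != NULL; j++) {
            const char *symbol = objs[i].undefined[j];
            if (!is_defined(objs, n, symbol)) {
                printf("undefined symbol %s, referenced from %s\n",
                       symbol, objs[i].name);
                missing++;
            }
        }
    }
    return missing;
}
```

If we model x.o as defining foo and needing bar, y.o as defining bar, and z.o as defining main and needing foo, count_unresolved finds no missing symbols when given all three objects. Leave y.o out and the hole for bar in x.o goes unfilled, which is the same kind of situation the "Undefined symbols" message from ld describes.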

clang hides many flags that it passes to ld to complete the compilation process. For example, here is the (decluttered) call to ld to make our final executable when we call clang on our three-file program. I ran this command on my Mac, so you see a number of Mac-specific flags and directories in the spew.

$> ld -demangle
      -lto_library /Library/Developer/CommandLineTools/usr/lib/libLTO.dylib
      -no_deduplicate
      -dynamic
      -arch arm64
      -platform_version macos 14.0.0 14.0
      -syslibroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk
      -o prog
      -L/usr/local/lib
      x.o
      y.o
      z.o
      -lSystem /Library/Developer/CommandLineTools/usr/lib/clang/15.0.0/lib/darwin/libclang_rt.osx.a

(You can view the exact set of commands that clang invokes by passing the verbose -v flag.)

Now suppose that I modify just one of my source files, say y.c. To see the effect of my change, I only need to recompile y.c to y.o via the clang -c y.c command. From there, I can re-link the new object file with the older object files. Note that re-linking requires that I process all object files in my program, even those object files I did not touch. But thankfully, I do not have to recompile the untouched compilation units!

Build Management with Makefiles

Separate compilation is a powerful feature of C. However, as you can tell from our running example, it is tedious to set up, requiring multiple, precise commands to avoid redundant work. make is a powerful command-line utility for automating multistep build processes. Notably, make can track dependencies between files in a build and issue commands to build a file only when one of its dependencies has been modified since the last build.

To use make, we create a file called Makefile (with a capital "M" and no file extension!) alongside our source files. We can then edit our Makefile with build recipes using our text editor. For example, in our previous lab on array lists, we introduced the following Makefile that you copied and used to build your multi-file program:

main : charlist.c main.c
	clang -Wall -Werror -g charlist.c main.c -o main

A Makefile is composed of a collection of recipes, each of which consists of three parts:

A build target.
The build target's dependencies.
Commands to invoke to build the target.

The Makefile given above is composed of a single recipe for building our main program. The first line of a recipe consists of the build target and its dependencies, separated by a colon. For this recipe, main is our build target and the files charlist.c and main.c are its dependencies. When invoking make on the command-line, we can pass the name of the build target and its corresponding recipe is considered. make invokes the command associated with the recipe, i.e., the subsequent lines of the recipe, if main does not exist or, if it does exist, any one of its dependencies has been modified since main was last modified. For this recipe, we give a call to clang to produce main given our source files.
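
The rebuild condition described above can be sketched in C. This is an illustration under stated assumptions, not make's actual implementation: needs_rebuild and report are hypothetical names, timestamps are compared at the one-second resolution of st_mtime, and a missing dependency is simply treated as a reason to rebuild.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if `target` should be rebuilt, 0 otherwise. */
int needs_rebuild(const char *target, const char *const deps[], size_t ndeps) {
    struct stat target_info;

    assert(target != NULL);

    /* Case 1: the target does not exist yet, so it must be built. */
    if (stat(target, &target_info) != 0) {
        return 1;
    }

    /* Case 2: some dependency was modified after the target was. */
    for (size_t i = 0; i < ndeps; i++) {
        struct stat dep_info;
        if (stat(deps[i], &dep_info) != 0) {
            /* Assumption of this sketch: a missing dependency forces a rebuild. */
            return 1;
        }
        if (dep_info.st_mtime > target_info.st_mtime) {
            return 1;
        }
    }

    /* The target exists and is at least as new as every dependency. */
    return 0;
}

/* Prints the decision for one target, e.g. "main: rebuild". */
void report(const char *target, const char *const deps[], size_t ndeps) {
    printf("%s: %s\n", target,
           needs_rebuild(target, deps, ndeps) ? "rebuild" : "up to date");
}
```

For the recipe above, calling report with the target "main" and the dependencies "charlist.c" and "main.c" prints whether the clang command would need to run.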

Importantly, note that the commands of the recipe are all indented one tab character. This indentation must be a tab character instead of multiple spaces. Depending on your editor and settings, you may need to either copy-paste a tab character from a website or search how to enter a tab character in your editor.

Our Makefile does not separately compile our sources. We need to add additional recipes to make this happen. Consider the following updated Makefile that achieves this effect:

main : charlist.o main.o
	clang -Wall -Werror -g charlist.o main.o -o main

charlist.o : charlist.c
	clang -Wall -Werror -g -c charlist.c

main.o : main.c
	clang -Wall -Werror -g -c main.c

The second and third recipes specify how we build object files from our two .c files. The first recipe updates our original recipe for main to depend on and use these object files to build our program. Now, when we invoke make, the program does the following:

We look at the first recipe to see what main depends on: charlist.o and main.o. We then discover these files' dependencies by looking for their respective recipes and their dependencies, and so forth. This search forms a dependency tree of files that depend on each other in this build process. For our Makefile, the following tree is formed:
```
     /----> charlist.o ----> charlist.c
    / 
main 
    \
     \----> main.o ----> main.c
```
Starting from the leaves of the tree, if any file's dependencies are newer than the file itself, we invoke that file's recipe command to update it. Updating a file can in turn make the files that depend on it out of date, in which case we invoke their recipe commands as well.
Finally, if any of main's dependencies were changed as a result of this process, we invoke main's recipe command to update it.

For example, if we updated only charlist.c (and not main.c), we would:

Update charlist.o by invoking clang -Wall -Werror -g -c charlist.c
Update main by invoking clang -Wall -Werror -g charlist.o main.o -o main.

Note that the process correctly detects that we don't need to recompile main.o since main.c has not been updated!
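
The traversal described above can be sketched as a small recursive function. This is a simplified model rather than how make is written: struct node, update, and the integer "logical timestamps" are assumptions for illustration, and commands are printed instead of executed.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_DEPS 4

/* Hypothetical model of one file in the build. */
struct node {
    const char *name;
    const char *command;          /* NULL for source files with no recipe */
    int mtime;                    /* logical timestamp; larger means newer */
    struct node *deps[MAX_DEPS];
    int ndeps;
};

/* Logical "current time" handed to files as they are rebuilt. */
static int clock_now = 100;

/* Brings `n` up to date and returns the number of commands "run". */
int update(struct node *n) {
    int ran = 0;
    int stale = 0;

    assert(n != NULL && n->ndeps <= MAX_DEPS);

    /* First bring every dependency up to date, then compare timestamps. */
    for (int i = 0; i < n->ndeps; i++) {
        ran += update(n->deps[i]);
        if (n->deps[i]->mtime > n->mtime) {
            stale = 1;
        }
    }

    /* Run the recipe only if some dependency is newer than this file. */
    if (stale && n->command != NULL) {
        printf("%s\n", n->command);
        n->mtime = ++clock_now;
        ran++;
    }
    return ran;
}
```

Modeling the tree above with charlist.c given a newer timestamp than charlist.o, calling update on the node for main prints the two commands listed in the example and leaves the node for main.o untouched. A second call prints nothing, since every file is then up to date.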

Learning More About Makefiles

Makefiles are highly configurable to handle a variety of building scenarios. One example of the capabilities of Makefiles is variables, which allow us to capture common build patterns in a single recipe. For example, we can concisely rewrite the Makefile above as a generic Makefile that works on any basic C project as follows:

FLAGS = -Wall -Werror -g

prog : charlist.o main.o
	clang $(FLAGS) $^ -o $@

%.o : %.c
	clang $(FLAGS) -c $^

.PHONY : clean

clean :
	rm -rf prog *.o

You can use this Makefile as a starting point for your C projects moving forward, substituting the appropriate object file dependencies for prog in its dependency list. This Makefile uses a number of variables to make things more general-purpose.

We declare a variable FLAGS that allows us to specify the flags to clang in one place. Variables of this sort are declared using the syntax <name> = <text>, and we use the syntax $(<name>) to reference a variable.
We use several automatic variables that are available in each recipe:
- $^ expands to the (space separated) dependencies of the recipe.
- $@ expands to the target of the recipe.
Finally, we use wildcard patterns in the second recipe to capture the common recipe for building object files. The recipe pattern %.o : %.c matches any object file (i.e., a file ending in the .o extension), making it depend on its corresponding C file of the same name, but with a .c extension.
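
To make the pattern recipe concrete, the sketch below mimics in C what happens when make is asked for an object file: it matches the target against %.o, derives the .c dependency, and builds the command that the recipe's variables expand to. The function names match_object_rule and expand_object_command are hypothetical, and this is an illustration of the idea rather than make's own code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the pattern "%.o : %.c".
 * If `target` ends in ".o", writes the matching ".c" name into `dep`
 * and returns 1; otherwise returns 0 and leaves `dep` untouched.
 */
int match_object_rule(const char *target, char *dep, size_t dep_size) {
    size_t len;

    assert(target != NULL && dep != NULL);
    len = strlen(target);

    /* The "%" must match a non-empty stem, so ".o" alone does not match. */
    if (len <= 2 || strcmp(target + len - 2, ".o") != 0) {
        return 0;
    }
    /* The dependency has the same length as the target, plus a terminator. */
    if (dep_size < len + 1) {
        return 0;
    }
    memcpy(dep, target, len - 1);   /* copy the stem and the dot, e.g. "main." */
    dep[len - 1] = 'c';
    dep[len] = '\0';
    return 1;
}

/* Builds the text that "clang $(FLAGS) -c $^" expands to for `target`. */
int expand_object_command(const char *flags, const char *target,
                          char *cmd, size_t cmd_size) {
    char dep[256];
    int written;

    if (!match_object_rule(target, dep, sizeof dep)) {
        return 0;
    }
    written = snprintf(cmd, cmd_size, "clang %s -c %s", flags, dep);
    return written >= 0 && (size_t)written < cmd_size;
}
```

Given the flags from FLAGS and the target main.o, expand_object_command produces the same clang -Wall -Werror -g -c main.c command that the earlier Makefile spelled out by hand. A target such as prog does not match the pattern, so make would look to a different recipe for it.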

Lastly, this Makefile also includes a phony recipe, clean, that doesn't correspond to a file but allows us to issue a command concisely. Here, the command make clean deletes prog and any object files via the rm command, effectively cleaning our build products from the directory.

For more information on using make, check out the GNU Make manual:

GNU Make Manual

Editors and the Language Server Protocol

Another important aspect of our build tools is our choice of editor. In the old days, there was a stark difference between lightweight text editors and heavyweight integrated development environments (IDEs). As a developer, you would need to make a choice between using a lightweight editor that was quick and responsive or a heavyweight editor that provided lots of tools for code introspection, refactoring, and build management.

Today, we get to enjoy the best of both worlds with highly extensible text editors that have a lightweight core, but feature plugins/packages that allow you to add in functionality that quickly approaches the support found in the most comprehensive of IDEs. Three of the most popular of these highly extensible editors are:

Emacs
Neovim
Visual Studio Code

The first two of these editors, Emacs and Neovim, are terminal editors that exist entirely within the terminal. Visual Studio Code, in contrast, is a graphical, multi-platform application. While Code is easier to set up and use because it is graphical, it is worth it to invest time in setting up and using a terminal editor like Emacs or Neovim because there are often situations where you need to edit code exclusively in the terminal.

In all three cases, you will want to install appropriate packages for the Language Server Protocol (LSP), which is a standard interface that lets programming tools provide services to a text editor, including syntax highlighting, error highlighting, code hints, and autocompletion.

In Emacs, you will need to install the Emacs-lsp package.
In Neovim, there are several choices to install, the easiest of which is Coc.
Visual Studio Code has built-in support for LSP! For our purposes, you just need to install Microsoft's C/C++ Extension Pack.

Sculpting a Development Environment

Just as a tradesperson takes the time to curate and customize their tools, developers do the same with their development tools. Setting up a build system and editor is just the beginning of your journey of customization! As you continue in the curriculum, you should actively look for ways to refine your development environment. This includes:

Investing in learning Git, a version control system integral for developing larger projects. GitHub is a Git-hosting service that is a de facto standard in the industry.
Mastering navigating and using the terminal for everyday work. One step towards doing this is using a terminal multiplexing tool that allows you to have multiple virtual terminal windows within a single window. Tmux is the canonical tool in this space.
Further customization of your terminal environment. "Awesome" lists like this denenv one can provide further inspiration for customization!