Computational Pipelines

Summary
We introduce the design strategy of realizing a computation as a computational pipeline, a composition of functions.

Imagine the problem of taking a list of temperature readings, e.g.,

(define iowa-july-temperatures
  (list 83 89 88 90 90 91 92 95 85 91 91 86 88 89
        79 84 91 93 87 80 82 85 87 93 93 90 87 88 90))

And performing the following operations on the data:

  • Normalizing the data, e.g., subtracting 5 from each temperature since we know the sensors were faulty,
  • Keeping the temperatures that are 90 degrees or above (the “scorchers”), and
  • Finding the minimum scorcher temperature.

We know from our discussion about lists that we can realize this computation as a combination of map, filter, and reduce calls bound with define:

(define normalized-temperatures (map (lambda (n) (- n 5)) iowa-july-temperatures))
(define scorchers (filter (lambda (n) (>= n 90)) normalized-temperatures))
(define min (reduce (lambda (x y) (if (< x y) x y)) scorchers))
> min
90

We can see that the defines are unnecessary since they only bind the intermediate values of the computation. We can define min directly by nesting the calls to map, filter, and reduce:

(define min
  (reduce (lambda (x y) (if (< x y) x y))
          (filter (lambda (n) (>= n 90))
                  (map (lambda (n) (- n 5))
                       iowa-july-temperatures))))
> min
90

Oof! While we got rid of the unnecessary bindings, our expression became quite tedious to write and read. However, observe the pattern of behavior in this expression: we repeatedly pass the output of one function on to the next, ultimately producing the final result. We call such a computation a computational pipeline, inspired by the fact that we can represent such a computation pictorially like a pipeline:

iowa-july-temperatures
         |
        map
         |
         V
(normalized-temperatures)
         |
      filter
         |
         V
    (scorchers)
         |
       reduce
         |
         V
        min

In this reading, we introduce additional higher-order functions (functions that take other functions as input) to make computational pipelines easier to write. Most of these functions are provided by the csc151 library, but they are prevalent in other functional languages as well.

Pipelining and Composition

Recall that the goal of functions is to abstract away repetition. For example, a function that converts Celsius to Fahrenheit abstracts away, or is parameterized by, the temperature in Celsius. With our computational pipeline, it is not data that is being repeated but an action. What is that action? It is the act of (a) calling a function, (b) passing that function’s output to a second function, and (c) outputting the result of the second function!

How might we write a function to capture this abstract behavior? Let’s take a look again at two computational pipelines:

> (reduce (lambda (x y) (if (< x y) x y))
          (filter (lambda (n) (>= n 90))
                  (map (lambda (n) (- n 5))
                        iowa-july-temperatures)))
90
> (filter (lambda (n) (> n 0))
          (map (lambda (n) (+ n 5))
               (map (lambda (n) (- n 10))
                    (list 8 7 1 3 2 6 5 1 1))))
'(3 2 1)

The first is our computational pipeline from before. The second is a computational pipeline that subtracts 10 from every element, adds 5 to every element, and then keeps only those that are positive.

These expressions seem completely different, but if we pay attention to the pattern of function application, we see that:

  • The two expressions take an initial value (iowa-july-temperatures and (list 8 7 1 3 2 6 5 1 1)).
  • They both feed that initial value to a function (map (lambda (n) (- n 5)) and map (lambda (n) (- n 10))).
  • They feed the output from the first function to a second function (filter (lambda (n) (>= n 90)) and map (lambda (n) (+ n 5))).
  • They finally feed the output from the second function to a third function (reduce (lambda (x y) (if (< x y) x y)) and filter (lambda (n) (> n 0))) and then output the result of the third function.

Thus, we need to abstract away or make parameters of the initial value and the three functions in question! The resulting function then performs the pattern of function application we just observed:

(define pipeline-3
  (lambda (x f1 f2 f3)
    (f3 (f2 (f1 x)))))
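As a quick sanity check, here is pipeline-3 applied to simple arithmetic steps (a sketch: add1 is built into Racket, while double is a helper defined just for this example):

```racket
;; double is a hypothetical helper defined only for illustration.
(define double (lambda (n) (* n 2)))

;; pipeline-3 applies its function arguments left to right:
;; (add1 (double (add1 7))) = (add1 (double 8)) = (add1 16) = 17
(pipeline-3 7 add1 double add1)
```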

pipeline-3 is a higher-order function because f1, f2, and f3 are themselves functions. We can now use pipeline-3 to cut down the verbosity of our two expressions:

> (pipeline-3 iowa-july-temperatures
              (lambda (l) (map (lambda (n) (- n 5)) l))
              (lambda (l) (filter (lambda (n) (>= n 90)) l))
              (lambda (l) (reduce (lambda (x y) (if (< x y) x y)) l)))
90

> (pipeline-3 (list 8 7 1 3 2 6 5 1 1)
              (lambda (l) (map (lambda (n) (- n 10)) l))
              (lambda (l) (map (lambda (n) (+ n 5)) l))
              (lambda (l) (filter (lambda (n) (> n 0)) l)))
'(3 2 1)

pipeline-3 has a number of nice benefits!

  1. We now write the computational pipeline in “top-down” order rather than “bottom-up”. This makes the code eminently more readable.
  2. The function calls are no longer nested, which also makes it clearer what the code is trying to do.

The only downside is that each successive argument to pipeline-3 must be a function, but the calls to map, filter, and reduce that we wrote in the nested version of the code were not functions by themselves; they were partially applied functions, each still waiting for its final argument. To fix this, we have to wrap each partially applied function in a lambda that awaits that final argument.

Of course, we shouldn’t restrict ourselves to computational pipelines involving three functions! The general version of this function is called the pipeline operator, (:>). (N.B., in other languages, you’ll see this operator written as |>, which is meant to evoke a right triangle (▷).) The implementation of (:>) requires that we use Racket’s variable-arity feature to create a function that accepts an arbitrary number of arguments. These arguments are given to us in a list, which we can process in a left-to-right fashion with foldl!

Here is the implementation of :> as it is given in the csc151 package:

;;; (:> x . fs) -> any/c
;;; x : any/c
;;; fs : procedure? (zero or more)
;;; The pipeline operator. Chains invocations of the functions fs
;;; in a left-to-right fashion, starting with the value x.
(define :>
  (lambda (x . fs)
    (foldl (lambda (g acc) (g acc)) x fs)))
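To see how foldl threads the value through the chain, here is a small sketch using the definition above (add1 is built into Racket; double is a helper defined only for illustration):

```racket
;; double is a hypothetical helper for this example.
(define double (lambda (n) (* n 2)))

;; foldl applies each function to the accumulated value in turn:
;; start with 5, then (add1 5) = 6, then (double 6) = 12.
(:> 5 add1 double)
```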

And here is how we would use it to express the two computational pipelines we’ve studied in this section:

> (:> iowa-july-temperatures
      (lambda (l) (map (lambda (n) (- n 5)) l))
      (lambda (l) (filter (lambda (n) (>= n 90)) l))
      (lambda (l) (reduce (lambda (x y) (if (< x y) x y)) l)))
90

> (:> (list 8 7 1 3 2 6 5 1 1)
      (lambda (l) (map (lambda (n) (- n 10)) l))
      (lambda (l) (map (lambda (n) (+ n 5)) l))
      (lambda (l) (filter (lambda (n) (> n 0)) l)))
'(3 2 1)

(:>) is a powerful tool for expressing computational pipelines succinctly, even when they only involve two functions!
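For instance, here is a sketch of a two-step pipeline that keeps the even numbers in a list and then counts them (even? and length are standard Racket; :> is as defined above):

```racket
;; Keep the evens, then count them: (2 4 6) has length 3.
(:> (list 1 2 3 4 5 6)
    (lambda (l) (filter even? l))
    (lambda (l) (length l)))
```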

Function Composition

Note that the pipeline operator runs a collection of functions starting with some initial value. What if we instead wanted to create a new function that is the result of running that collection on some arbitrary initial value? We could wrap the pipeline in a lambda that is expecting the initial value:

(define min-of-temperatures
  (lambda (temperatures)
    (:> temperatures
        (lambda (l) (map (lambda (n) (- n 5)) l))
        (lambda (l) (filter (lambda (n) (>= n 90)) l))
        (lambda (l) (reduce (lambda (x y) (if (< x y) x y)) l)))))

However, the Racket standard library provides a higher-order helper function, compose, for creating a reusable function from a computational pipeline:

(define min-of-temperatures
  (compose (lambda (l) (reduce (lambda (x y) (if (< x y) x y)) l))
           (lambda (l) (filter (lambda (n) (>= n 90)) l))
           (lambda (l) (map (lambda (n) (- n 5)) l))))

compose is great for creating reusable pipelines. However, we have to be aware of a quirk of compose: the collection of functions is given in reverse order! In other words, the first function to fire in the chain is given last. This is because compose is meant to emulate the mathematical composition operator (\( \circ \)), which takes its arguments in this order, i.e., \( g \circ f \) is a function that runs f first and then runs g on the output of f.
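A small sketch makes this ordering concrete (compose and add1 are standard Racket; double is a helper defined only for illustration):

```racket
;; (compose g f) runs f first, then g.
(define double (lambda (n) (* n 2)))
(define add1-then-double (compose double add1))

;; (double (add1 5)) = (double 6) = 12
(add1-then-double 5)
```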

In fact, the csc151 library provides a shorthand function for compose called o (lowercase ‘o’) that is meant to be evocative of this operator:

(define min-of-temperatures
  (o (lambda (l) (reduce (lambda (x y) (if (< x y) x y)) l))
     (lambda (l) (filter (lambda (n) (>= n 90)) l))
     (lambda (l) (map (lambda (n) (- n 5)) l))))

Sectioning

Let’s reexamine our Iowa temperature computational pipeline created with the pipeline operator:

> (:> iowa-july-temperatures
      (lambda (l) (map (lambda (n) (- n 5)) l))
      (lambda (l) (filter (lambda (n) (>= n 90)) l))
      (lambda (l) (reduce (lambda (x y) (if (< x y) x y)) l)))
90

We noted that a downside of this approach was the need to wrap the calls to map, filter, and reduce in lambdas because :> needs to be given functions (of a single argument). For example, in the original pipeline, we had the expression:

(filter (lambda (n) (>= n 90)) ...)

where the ... was the call to map. We create a function from this partially applied function by replacing the ... with a parameter:

(lambda (l) (filter (lambda (n) (>= n 90)) l))

However, it is easy to mess up this transformation, e.g., by using the argument to the outer lambda in the wrong position of the contained function call. To help construct these lambdas, the csc151 library introduces the section function. Sectioning a function involves providing concrete values for only some of the function’s arguments. The remaining arguments then become parameters to a new function that calls the original function with said arguments.

Here is how we would use section to more concisely write the partially applied filter function:

(section filter (lambda (n) (>= n 90)) <>)

section takes as input:

  • A function to partially apply (filter).
  • The arguments to that function ((lambda (n) (>= n 90))). We represent arguments that should remain unfilled with <>.

Here’s our newly sectioned filter function in action:

(define example-sectioning
  (section filter (lambda (n) (>= n 90)) <>))

> (example-sectioning (list 45 100 75 95 90 80))
'(100 95 90)

Sectioning allows us to concisely write partially applied functions, which is useful for writing computational pipelines. Here’s our Iowa pipeline rewritten to use section:

> (:> iowa-july-temperatures
      (section map (lambda (n) (- n 5)) <>)
      (section filter (lambda (n) (>= n 90)) <>)
      (section reduce (lambda (x y) (if (< x y) x y)) <>))
90

Note that we’ve removed some of the lambdas from the pipeline. In particular, in all three cases the final argument of the function is sectioned out, as it is the expected input supplied by the previous stage of the pipeline. However, we have not removed all of the lambdas! We can use section to also simplify some of the higher-order functions passed to map, filter, and reduce!

> (:> iowa-july-temperatures
      (section map (section - <> 5) <>)
      (section filter (section >= <> 90) <>)
      (section reduce (lambda (x y) (if (< x y) x y)) <>))
90

Note that we can’t simplify the lambda passed to reduce, but not because it has two parameters: we can use section to create functions of multiple arguments, e.g., (section + 1 <> <> <> 5) is a function that awaits three numbers that will then be added to 1 and 5. Rather, we can’t simplify the lambda because its arguments are each mentioned in multiple places in the body, and section does not provide a mechanism for specifying the repeated occurrence of a hole in a function. (We could use a let-expression to get around this restriction, but then the section would become longer than the lambda itself!)
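To make the multi-hole case concrete, here is that section in action (a sketch assuming the csc151 library’s section):

```racket
;; Three holes await three arguments; 1 and 5 are fixed.
(define add-between (section + 1 <> <> <> 5))

;; (+ 1 2 3 4 5) = 15
(add-between 2 3 4)
```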

However, note that the two new sections we wrote have common behavior: they both provide a value for the second (right-hand) argument of a binary function, (-) and (>=), respectively. The csc151 package includes additional functions to make these cases more concise:

> (:> iowa-july-temperatures
      (section map (r-s - 5) <>)
      (section filter (r-s >= 90) <>)
      (section reduce (lambda (x y) (if (< x y) x y)) <>))
90

r-s, short for right-section, is the special case of sectioning where (a) we are sectioning a binary (two-argument) function and (b) we are providing a value for the second (right-hand) argument. l-s, short for left-section, is the special case of sectioning where we instead provide the first (left-hand) argument. Depending on the function involved, the choice between l-s and r-s matters, e.g.,

> (define f1
    (l-s >= 5))
> (define f2
    (r-s >= 5))
> (f1 2) ; 5 >= 2
#t
> (f2 2) ; 2 >= 5
#f
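The distinction is especially visible with a non-commutative operation like string-append; here is a sketch assuming the csc151 library’s l-s:

```racket
;; Fix the left-hand argument of string-append.
(define greet (l-s string-append "Hello, "))

;; (string-append "Hello, " "Iowa") = "Hello, Iowa"
(greet "Iowa")
```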

Computational Pipelines in Context

Recall that algorithmic decomposition is one of the cornerstones of computer programming. Computational pipelines are a beautiful special case of decomposition where the computation in question is a pipeline of operations.

There are several benefits of realizing a computation in this manner:

  • The code does not introduce extraneous names that can lead to subtle bugs.
  • The code most directly expresses the “pipeline nature” of the computation. Such code is declarative in nature: we are expressing what the computation does, not how it ought to be carried out.

The downside is that writing programs in this style is highly idiosyncratic and takes a bit of getting used to. But it is worth getting right now, as this style will pay dividends later in other languages and settings!

Self-Check (‡)

Design a computational pipeline for the following computation:

Given a list of strings:

  • Keeps the words that do not start with “the”, case-insensitive.
  • Transforms the words into their lengths.
  • Counts the number of odd-length words left in the list.

(Hint: you might find it easier to use (:>) to design the pipeline with a concrete input to test. Afterwards, convert your pipeline to use (o) instead to make it into a function expecting a list of strings as input.)