Dictionaries, maps, and hash tables

Summary
We consider the dictionary data type, which provides a way to associate information with words. We then explore Racket’s hash tables, which provide one implementation of dictionaries.

Introduction

As you’ve seen, we often find it useful to collect data into structures. We’ve collected values in lists or vectors, which are useful for processing sequentially. We will soon explore trees, which are useful for processing hierarchically. Since different structures might support different kinds of operations, the design of structured types like lists and trees is a core part of algorithm design. Computer scientists have defined a wide range of data types designed to collect data.

Consider, for example, the problem of representing a simple phone book. A phone book associates phone numbers with names. When you look up a name in a phone book, you should either get the corresponding phone number or a warning that no phone number is available for that name. There are many other instances in which you want to associate information with a “name” or something equivalent. For example, a physical dictionary associates definitions with words.

In referring to these kinds of structures, computer scientists often call the thing used to look up information the key and the associated information the value. In a phone book, the name is the key and the phone number is a value. In a traditional dictionary, the word is the key and the definition is the value.

Computer scientists use a variety of names for structures that associate keys and values, including Dictionary, Map, Keyed Table, and even Hash. Generally, you can think of these structures as collections of key/value pairs that make it easy to reference values by key. What operations would you expect such structures to provide? It should be easy to create a new structure, to add or update a key/value pair, and to look up a value by key. It might also be useful to remove a key/value pair and to check whether a key is in the structure.

Over the years, computer scientists have come up with a large number of ways to represent dictionaries, including association lists, binary search trees, skip lists, and hash tables. Don’t worry; you don’t need to know the underlying details of any of them, at least not at this stage of your career. You do, however, need to understand how and why to use such structures.

For now, we will focus on how you use one of the most common implementations of dictionaries, known as Hash Tables. In many situations, hash tables are one of the most efficient implementations of dictionaries. Most modern languages include hash tables as a basic type. And, because hash tables are so popular, many programmers don’t distinguish hash tables from the broader concept of dictionaries. (That’s why some people refer to the general concept as a “hash”.) But you should remember that hash tables are just one way to implement dictionaries.

Getting started with hash tables

We’re exploring a new type. Some of us make it a point to ask five or so questions when we encounter a new type: What is its name? What is its purpose? How do I express values in the type? How does DrRacket display values in the type? What procedures are available for working with the type?

We’ve already answered the first two questions: The type is named “hash table” and its purpose is to store key/value pairs. We rarely write hash tables directly and we rarely ask DrRacket to display them (they are often fairly large), so we’ll leave those questions to after our coverage of the procedures associated with hash tables.

You may recall that we said that there are three basic operations we use with hash tables: we create them, we add key/value pairs, and we look up values by keys.

You create a new hash table with (make-hash). You look up a value in a hash table with (hash-ref table key). You add a key/value pair to the table with (hash-set! table key value). Note that hash-set! ends with an exclamation point. That reminds us that the procedure actually modifies the underlying hash table.

Let’s explore those basic operations, creating a map of book titles to authors. For now, we’ll assume that each book has a single author; in a subsequent section, we’ll consider how to deal with multiple authors.

> (define book-authors (make-hash))
> (hash-set! book-authors "The Princess Bride" "William Goldman")
> (hash-set! book-authors "Homegoing" "Yaa Gasi")
> (hash-set! book-authors "Moo" "Jane Smiley")
> (hash-set! book-authors "Moo, Baa, La La La!" "Sanda Boynton")
> (hash-ref book-authors "Homegoing")
"Yaa Gasi"
> (hash-ref book-authors "Moo")
"Jane Smiley"

What happens when the book title is not in the table? Let’s check.

> (hash-ref book-authors "FunDHum")
hash-ref: no value found for key
  key: "FunDHum"

It appears that the hash table issues an error. That seems like a reasonable choice. Racket needs to signal to the user that the value is not available. But an error message is not something many of us want to have happen in the middle of our program. Hence, the Racket hash library also provides a (hash-has-key? table key) procedure that checks whether the hash table contains a particular key.

> (hash-has-key? book-authors "Homegoing")
#t
> (hash-has-key? book-authors "FunDHum")
#f

At times, you may realize that you want to update a key/value pair that you have stored in the hash table. You can use the hash-set! operation to update the table.

> (hash-ref book-authors "The Princess Bride")
"William Goldman"
> (hash-set! book-authors "The Princess Bride" "S. Morgenstern")
> (hash-ref book-authors "The Princess Bride")
"S. Morgenstern"

You’ve learned four basic hash table operations: make-hash, hash-set!, hash-ref, and hash-has-key?. While the Racket hash table implementation provides a host of other operations, those four basic operations should be all that you need at this point. In the next few sections, we’ll explore some other design issues in making and using hash tables.

Displaying hash tables

As hash tables exist primarily for you to use to extract information, You will rarely ask DrRacket to display hash tables. Also, hash tables are generally large enough that you won’t want to see all of the information on your screen. But DrRacket will display a hash table if you ask it nicely.

> book-authors
'#hash(("Homegoing" . "Yaa Gasi") ("Moo, Baa, La La La!" . "Sanda
Boynton") ("The Princess Bride" . "S. Morgenstern") ("Moo" . "Jane
Smiley"))

Not very pretty, is it? But, if we look carefully, it’s fairly straightforward. It begins with a tick, an octothorpe, and the word hash. Following that is a list of key/value pairs. And they are literally pairs, as indicated by the dot between the two values.

Can we create a hash in the same way? Let’s try.

> (define animals '#hash(("A" . "Aaardvark")
                         ("B" . "Badger")
                         ("C" . "Chinchilla")
                         ("D" . "Dingo")
                         ("E" . "Emu")
                         ("F" . "Fennec Fox")))
> animals
'#hash(("B" . "Badger") ("F" . "Fennec Fox") ("D" . "Dingo") ("A" .
"Aaardvark") ("E" . "Emu") ("C" . "Chinchilla"))
> (hash-ref animals "B")
"Badger"
> (hash-has-key? animals "Z")
#f

It looks like this does give us a hash table, but with the key/value pairs in a slightly different order than we started with. That’s okay: We should not care about the ordering because our focus is on using hash tables rather than understanding the underlying technology. (It turns out that the ordering is one of the things that helps keep hash tables efficient. You’ll study how to construct hash tables in a subsequent course.) Let’s add something to the hash table.

> (hash-set! animals "Z" "Zebra")
hash-set!: contract violation
  expected: (and/c hash? (not/c immutable?))
  given: '#hash(("A" . "Aaardvark") ("B" . "Badger") ("C" .
"Chinchilla") ("D" . "Dingo") ("E" . "Emu") ("F" . "Fennec
Fox"))
  argument position: 1st
  other arguments...:
   "Z"
   "Zebra"

Hmmm. That didn’t work. Why not? When we present a hash table to Racket using the '#hash notation, it treats is as an immutable table. You can use it, but you cannot modify it. That makes sense for some cases. For example, you might have assumptions about the contents of the hash table and you would not want another part of the program (perhaps written by another programmer who did not understand those expectations) to “mess with” the table.

But what if you want to create a hash table with a sequence of key/value pairs and you want it to be mutable? In that case, you can use a variant of the make-hash operation in which you present it with a list of key/value pairs.

> (define animals
    (make-hash (list (cons "A" "Anteater")
                     (cons "B" "Baboon")
                     (cons "C" "Civet")
                     (cons "D" "Dormouse")
                     (cons "E" "Echidna")
                     (cons "F" "Flamingo"))))
> animals
'#hash(("B" . "Baboon") ("F" . "Flamingo") ("D" . "Dormouse") ("A" .
"Anteater") ("E" . "Echidna") ("C" . "Civet"))
> (hash-ref animals "B")
"Baboon"
> (hash-has-key? animals "Z")
#f
> (hash-set! animals "Z" "Zebra")
> animals
'#hash(("B" . "Baboon") ("F" . "Flamingo") ("D" . "Dormouse") ("Z" .
"Zebra") ("A" . "Anteater") ("E" . "Echidna") ("C" . "Civet"))
> (hash-ref animals "Z")
"Zebra"

Connecting keys

What happens if we try to use something that’s close to an existing key, but not quite the same? Let’s try and find out.

> (hash-has-key? book-authors "Homegoing")
#t
> (hash-has-key? book-authors "Home going")
#f
> (hash-has-key? book-authors "homegoing")
#f

As these two examples suggest, hash tables are not particularly smart. They expect exact matches of the hash key. Programmers who want more accommodating hash tables may want to write procedures that convert keys to a common form before hashing. Let’s consider how we might do that with strings.

First, let’s agree on a common form. Say, for example, that we decide that the common form should be all lowercase and should contain only letters and digits. We can use string-downcase to convert the string to lowercase and some filtering to extract only the letters and digits.

;;; (common-form str) -> string?
;;;   str : string?
;;; Convert `str` to a common form for use as, say, a key in
;;; a hash table.
;;;   If provided with a non-string, just returns the value.
(define common-form
  (lambda (str)
    (if (string? str)
        (string-downcase (list->string (filter (lambda (ch)
                                                 (or (char-numeric? ch)
                                                     (char-alphabetic? ch)))
                                               (string->list str))))
        str)))
> (common-form "Homegoing")
"homegoing"
> (common-form "Home Going")
"homegoing"
> (common-form "Home going!")
"homegoing"
> (common-form "Going home?")
"goinghome"
> (common-form (list "Home" "Going"))
'("Home" "Going")

Now that we have this helper procedure, we can write our own versions of the basic hash table operations.

;;; (new-make-hash pairs) -> hash?
;;;   pairs : list-of pair?
;;; Create a hash table that uses the common form of the keys in pairs.
(define new-make-hash
  (lambda (pairs)
    (make-hash (map (lambda (pair)
                      (cons (common-form (car pair))
                            (cdr pair)))
                    pairs))))

;;; (new-hash-set! hash key value) -> void?
;;;   hash : hash?
;;;   key : any?
;;;   value : any?
;;; Sets a value in a hash using the common form of string keys.
(define new-hash-set!
  (lambda (hash key value)
        (hash-set! hash (common-form key) value)))

;;; (new-hash-ref! hash key) -> any?
;;;   hash : hash?
;;;   key : any?
;;; Look up a value using the common form of key.
(define new-hash-ref
  (lambda (hash key)
    (hash-ref hash (common-form key))))

;;; (new-hash-has-key? hash key) -> boolean?
;;;   hash : hash
;;;   key : any?
;;; Determine if hash has the common form of key.
(define new-hash-has-key?
  (lambda (hash key)
    (hash-has-key? hash (common-form key))))
> (define book-authors
    (new-make-hash
     (list (cons "The Princess Bride" "William Goldman")
           (cons "Homegoing" "Yaa Gasi")
           (cons "Moo" "Jane Smiley")
           (cons "Moo, Baa, La La La!" "Sanda Boynton"))))
> (new-hash-ref book-authors "Homegoing")
"Yaa Gasi"
> (new-hash-ref book-authors "Home Going")
"Yaa Gasi"
> (new-hash-ref book-authors "moo!")
"Jane Smiley"
> (new-hash-set! book-authors "Friday Black" "Nana Kwame Adjei-Brenyah")
> (new-hash-ref book-authors "fri day black")
"Nana Kwame Adjei-Brenyah"
> book-authors
'#hash(("moobaalalala" . "Sanda Boynton") ("homegoing" . "Yaa Gasi")
("theprincessbride" . "William Goldman") ("fridayblack" . "Nana Kwame
Adjei-Brenyah") ("moo" . "Jane Smiley"))

What happens if we fail to use our methods? Let’s see.

> (hash-ref book-authors "Homegoing")
hash-ref: no value found for key
  key: "Homegoing"
> (hash-set! book-authors "The Princess Bride" "S. Morgenstern")
> (new-hash-ref book-authors "The Princess Bride")
"William Goldman"
> (hash-ref book-authors "The Princess Bride")
"S. Morgenstern"
> book-authors
'#hash(("moobaalalala" . "Sanda Boynton") ("homegoing" . "Yaa Gasi")
("theprincessbride" . "William Goldman") ("The Princess Bride" . "S.
Morgenstern") ("fridayblack" . "Nana Kwame Adjei-Brenyah") ("moo" .
"Jane Smiley"))

As these examples suggest, if we decide to use our alternate procedures, we need to be uniform in our use of those procedures.

Storing compound data

What happens if we want to store more than one value with a key? For example, some books have more than one author. For example, How to Design Programs is by Matthew Felleisen, Robert Bruce Findler, Matthew Flatt, and Shriram Krishnamurthi. One possibility, at least in this case, is to represent the authors not as a string, but as a list.

> (new-hash-set! book-authors
                "How to Design Programs"
                (list "Matthias Felleisen" "Robert Bruce Findler"
                      "Matthew Flatt" "Shriram Krishnamurthi"))
> (new-hash-ref book-authors "How To Design Programs")
'("Matthias Felleisen" "Robert Bruce Findler" "Matthew Flatt" "Shriram
Krishnamurthi")
> (new-hash-ref book-authors "Friday Black")
"Nana Kwame Adjei-Brenyah"

What happens when we want to associated different kinds of values with a single key? For example, a college phone directory might contain not only a phone number, but also a username and an address.

Consider the following table.

**name**   **phone**      **username**   **address**
Avery      555-555-1212   aviary         54 Main Hall
Riley      555-555-8888   lifeof         12 James B. Mary
Reese      555-555-1234   pbcups         N. Dibble
Jordan     555-555-4321   air            23 Center Dorm

How might we represent that table?

One possibility is to set up multiple hash tables, one that maps the name to the phone number, a second of which maps the name to the username, the third of which maps to an address, and so on and so forth.

;;; phones : hash?
;;; A table of phone numbers
(define phones (make-hash))

;;; usernames : hash?
;;; A table of usernames
(define usernames (make-hash))

;;; address : hash?
;;; A table of addresses
(define addresses (make-hash))

;;; (add-entry! name phone username address) -> void?
;;;   name : string?
;;;   phone : string?
;;;   address : string?
;;; Add an entry to our global student directory.
(define add-entry!
  (lambda (name phone username address)
    (new-hash-set! phones name phone)
    (new-hash-set! usernames name username)
    (new-hash-set! addresses name address)))

;;; (student-exists? name) -> boolean?
;;;   name : string?
;;; Determine if our student exists in the global student directory.
(define student-exists?
  (lambda (name)
    (and (hash-has-key? phones name)
         (hash-has-key? usernames name)
         (hash-has-key? addresses name))))

;;; (lookup-phone name) -> string?
;;;   name : string?
;;; Look up the phone number associated with a student in our global 
;;; student directory.
(define lookup-phone
  (lambda (name)
    (and (new-hash-has-key? phones name)
         (new-hash-ref phones name))))

;;; (lookup-username name) -> string?
;;;   name : string?
;;; Look up the name associated with a student in our global student 
;;; directory.
(define lookup-username
  (lambda (name)
    (and (new-hash-has-key? usernames name)
         (new-hash-ref usernames name))))

;;; (lookup-address name) -> string?
;;;   name : string?
;;; Look up the name associated with a student in our global student 
;;; directory.
(define lookup-address
  (lambda (name)
    (and (new-hash-has-key? addresses name)
         (new-hash-ref addresses name))))
> (add-entry! "Avery" "555-555-1212" "aviary" "54 Main Hall")
> (add-entry! "Riley" "555-555-8888" "lifeof" "12 James B. Mary")
> (add-entry! "Reese" "555-555-1234" "pbcups" "N. Dibble")
> (add-entry! "Jordan" "555-555-4321" "air" "23 Center Dorm")
> (lookup-phone "Reese")
"555-555-1234"
> (lookup-address "Avery")
"54 Main Hall"
> (lookup-username "Quinn")
#f
> (add-entry! "Quinn" "555-555-5555" "eskimo" "11Q")
> (lookup-username "Quinn")
"eskimo"

Unfortunately, the design means that we have only one set of phones, usernames, and addresses. It’s our custom to make the table the parameter to a procedure like lookup-phone; we want to look up someone’s phone number in a particular directory, rather than in a global directory.

An alternative: Vectors of data

One alternative is to group data together into a single structure. At this point in your career, your best bet would probably be a vector in which you include the values in a fixed order: First the phone number, then the username, then the address.

;;; (new-directory) -> hash?
;;; Creates a directory of students usable by the following procedures.
;;; (DrRacket has a `make-directory` procedure with a different meaning,
;;; so we call this `new-directory` instead.
(define new-directory
  make-hash)

;;; (directory-add-entry! dir name phone username address) -> void?
;;;   dir : hash?
;;;   name : string?
;;;   phone : string?
;;;   address : string?
;;; Add an entry to our global student directory.
(define directory-add-entry!
  (lambda (dir name phone username address)
    (new-hash-set! dir name (vector phone username address))))

;;; (directory-lookup-phone dir name) -> string?
;;;   name : string?
;;; Look up the phone number associated with a student in `dir`.
(define directory-lookup-phone
  (lambda (dir name)
    (and (new-hash-has-key? dir name)
         (vector-ref (new-hash-ref dir name) 0))))

;;; (directory-lookup-username dir name) -> string?
;;;   name : string?
;;; Look up the username associated with a student in `dir`.
(define directory-lookup-username
  (lambda (dir name)
    (and (new-hash-has-key? dir name)
         (vector-ref (new-hash-ref dir name) 1))))

;;; (directory-lookup-address dir name) -> string?
;;;   name : string?
;;; Look up the address associated with a student in `dir`.
(define directory-lookup-address
  (lambda (dir name)
    (and (new-hash-has-key? dir name)
         (vector-ref (new-hash-ref dir name) 2))))
> (define egelloc (new-directory))
> (define ytisevinu (new-directory))
> (directory-add-entry! egelloc "Avery" "555-555-1212" "aviary" "54 Main Hall")
> (directory-add-entry! ytisevinu "Riley" "555-555-8888" "lifeof" "12 James B.
Mary")
> (directory-add-entry! egelloc "Reese" "555-555-1234" "pbcups" "N. Dibble")
> (directory-add-entry! ytisevinu "Jordan" "555-555-4321" "air" "23 Center Dorm")
> (directory-lookup-address egelloc "Reese")
"N. Dibble"
> (directory-lookup-address ytisevinu "Reese")
#f
> (directory-add-entry! ytisevinu "Reese" "555-555-0000" "seer" "Off campus")
> (directory-lookup-address egelloc "Reese")
"N. Dibble"
> (directory-lookup-address ytisevinu "Reese")
"Off campus"

Another alternative: Extended keys

Is the strategy we just considered the only way to use hash tables to store compound data? Of course not. As you’ve likely learned by now, there are multiple ways to design solutions to most problems. It’s worthwhile to consider alternate designs and then choose one based on factors most important to you, such as speed, adaptability, clarity, or ease of implemention.

If you really like having four separate hash tables, you could make a directory a vector of four hash tables. But we’ll leave that as an exercise for you to explore on your own time.

We could also construct “extended keys” that take incorporate both the real key and an additional string that indicates the kind of information we’re storing. For example, we would store Reese’s phone number with the key "reese-phone" and Jordan’s address with the key "jordan-address". Let’s see what that might look like.

;;; (new-table) -> hash?
;;; Creates a table of student contact info usable by the following 
;;; procedures.
(define new-table
  make-hash)

;;; (table-add-entry! table name phone username address) -> void?
;;;   table : hash?
;;;   name : string?
;;;   phone : string?
;;;   address : string?
;;; Add an entry to our global student table.
(define table-add-entry!
  (lambda (table name phone username address)
    (new-hash-set! table (string-append name "000" "phone") phone)
    (new-hash-set! table (string-append name "000" "username") username)
    (new-hash-set! table (string-append name "000" "address") address)))

;;; (table-lookup table name attribute) -> string?
;;;   table : hash?
;;;   name : string?
;;;   attribute : string?
;;; Look up the attribute (column name) for a particular name.
;;; Returns the attribute, if found, or #f if not found.
(define table-lookup 
  (lambda (table name attribute)
    (let ([key (string-append name "000" attribute)])
      (and (new-hash-has-key? table key)
           (new-hash-ref table key)))))

;;; (table-lookup-phone table name) -> string?
;;;   name : string?
;;; Look up the phone number associated with a student in table.
(define table-lookup-phone
  (section table-lookup <> <> "phone"))

;;; (table-lookup-username dir name) -> string?
;;;   name : string?
;;; Look up the username associated with a student in our global 
;;; student table.
(define table-lookup-username
  (section table-lookup <> <> "username"))

;;; (table-lookup-address dir name) -> string?
;;;   name : string?
;;; Look up the address associated with a student in our global 
;;; student table.
(define table-lookup-address
  (section table-lookup <> <> "address"))

Let’s try it out.

> (define nellgrin (new-table))
> (define nellcorn (new-table))
> (table-add-entry! nellgrin "Avery" "555-555-1212" "aviary" "54 Main Hall")
> (table-add-entry! nellcorn "Riley" "555-555-8888" "lifeof" "12 James B.
Mary")
> (table-add-entry! nellgrin "Reese" "555-555-1234" "pbcups" "N. Dibble")
> (table-add-entry! nellcorn "Jordan" "555-555-4321" "air" "23 Center Dorm")
> (table-lookup-address nellgrin "Reese")
"N. Dibble"
> (table-lookup-address nellcorn "Reese")
#f
> (table-add-entry! nellcorn "Reese" "555-555-0000" "seer" "Off campus")
> (table-lookup-address nellgrin "Reese")
"N. Dibble"
> (table-lookup-address nellcorn "Reese")
"Off campus"

It looks like it’s working well.

Why didn’t we look at the contents of any of the hash tables? Because we care more about how they behave than how they are structured. If you use procedures without needed to know the underlying structure, you’ve likely managed to abstract away some of the complexity.

Self Checks

Check 1: Using hash tables

We’ve listed three uses of hash tables: dictionaries, phone books, and information about books. Come up with two others.

Check 2: Comparing models

Explain why we created three variations of add-entry!.

Check 3: Comparing models, revisited (‡)

Suppose we wanted to be able to change someone’s phone number. Indicate how you might write set-phone! for each of the three models. In the first case, it will likely take the form (set-phone! name phone). In the second case, it is likely to take the form (directory-set-phone! dir name phone). In the third case, it is likely to take the form (table-set-phone! table name phone).

(define set-phone!
  (lambda (name phone)
    undefined))

(define directory-set-phone!
  (lambda (dir name phone)
    undefined))

(define table-set-phone!
  (lambda (table name phone)
    undefined))

Acknowledgements

In writing this section, we drew upon the discussion of hash tables in the Racket language documentation and the discussion of hash tables in the Racket Guide.

This section also draws upon a reading entitled “Association Lists” from Grinnell College’s CSC 151.