Coding Challenge 5: Data science

Assigned: Monday, 23 February 2026
Summary: In this coding challenge we will work with imported data to solve more complex problems with lists.
Collaboration: Normal collaboration policies for coding challenges apply to this assignment. See the syllabus for specifics.

Instructions

In this coding challenge we are providing you with starter code, CC5-template.scm. You should download this file and upload the given code to a new .scm file named LASTNAME-data-science.scm, where you replace LASTNAME with your last name.

At the top of your file, make sure to include your name, and any acknowledgements according to our collaboration policy. Now is a great time to review those collaboration policies.

You are required to document every procedure that you write using the documentation style of our course. Tests are optional, but likely helpful in determining if your code is working as desired.

Data

In this coding challenge, we’re going to work with a file of data called iowa-voter-registration-2018-02.csv. The full data set contains voter registration numbers from January 2000 to February 2018, and was downloaded from data.iowa.gov on February 14, 2018. The version we’re giving you here is slightly smaller. Each row of data in the file corresponds to a county in Iowa and has the following columns.

Columns

Date (in MM/DD/YYYY HH:MM:SS xM format)
FIPS code
County Name
Active Democrats
Active Republicans
Active Libertarians
Active No-Party
Active Other
Active Total
Inactive Democrats
Inactive Republicans
Inactive Libertarians
Inactive No-Party
Inactive Other
Inactive Total
Grand Total
Latitude
Longitude
Coordinates

Here is an example row from the data set:

02/01/2018 12:00:00 AM,19085,Harrison,2021,3684,36,2640,13,8394,50,105,1,123,2,281,8675,41.6828528,-95.8169209,"(41.6828528, -95.8169209)"
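
Because each row reaches your procedures as a list of strings in the column order above, a field can be read by its zero-based position. The following is a minimal sketch of that idea, assuming the standard Scheme procedures list-ref and string->number behave the same way in Scamper. The names sample-row, grand-total-index, and row-grand-total are hypothetical and are not part of the starter code.

```scheme
;; The example row above, written as a list of strings in the same
;; form as the rows of test-list.
(define sample-row
  (list "02/01/2018 12:00:00 AM" "19085" "Harrison" "2021" "3684" "36"
        "2640" "13" "8394" "50" "105" "1" "123" "2" "281" "8675"
        "41.6828528" "-95.8169209" "(41.6828528, -95.8169209)"))

;; Zero-based position of the Grand Total column (hypothetical name).
;; Date is position 0, FIPS code is 1, County Name is 2, and so on.
(define grand-total-index 15)

;;; (row-grand-total row) -> number?
;;;   row : list? of string?, one row of the data set
;;; Returns the Grand Total field of row as a number.
(define row-grand-total
  (lambda (row)
    (string->number (list-ref row grand-total-index))))

(row-grand-total sample-row)
```

For the Harrison row, the final expression should produce 8675. Naming the column position once keeps the "magic number" out of the rest of your code.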

Access the file here: iowa-voter-registration-2018-02.csv

Problem 1: Basics about the data

In this problem we are going to learn some basic summary information about the data. Before tackling the file (using with-file), let’s write procedures that work with a list of the following sort. This list, test-list, is already included in the starter code.

(define test-list
(list (list "02/01/2018 12:00:00 AM" "19103" "Johnson" "42139" "18684" "644" "30401" "221" "92089" "5893" "2768" "99" "6592" "55" "15407" "107496" "41.6715511" "-91.5880849" "(41.6715511, -91.5880849)") 
       (list "02/01/2018 12:00:00 AM" "19111" "Lee" "8756" "4352" "85" "8022" "22" "21237" "651" "349" "12" "1067" "4" "2083" "23320" "40.6419764" "-91.479264" "(40.6419764, -91.479264)") 
       (list "02/01/2018 12:00:00 AM" "19057" "Des Moines" "10089" "6505" "107" "8822" "38" "25561" "881" "463" "13" "1117" "5" "2479" "28040" "40.9231829" "-91.1814707" "(40.9231829, -91.1814707)") 
       (list "02/01/2018 12:00:00 AM" "19061" "Dubuque" "24735" "15986" "276" "22244" "87" "63328" "1777" "1081" "30" "2113" "18" "5019" "68347" "42.468832" "-90.8824564" "(42.468832, -90.8824564)") 
       (list "02/01/2018 12:00:00 AM" "19179" "Wapello" "7812" "5257" "62" "6818" "37" "19986" "679" "379" "7" "965" "0" "2030" "22016" "41.0305845" "-92.4094499" "(41.0305845, -92.4094499)") 
       (list "02/01/2018 12:00:00 AM" "19153" "Polk" "106466" "82595" "1989" "87024" "526" "278600" "8506" "5778" "238" "8536" "59" "23117" "301717" "41.6855048" "-93.5735335" "(41.6855048, -93.5735335)") 
       (list "02/01/2018 12:00:00 AM" "19101" "Jefferson" "3795" "3249" "46" "3006" "44" "10140" "351" "180" "10" "416" "10" "967" "11107" "41.0317596" "-91.9488774" "(41.0317596, -91.9488774)") 
       (list "02/01/2018 12:00:00 AM" "19013" "Black Hawk" "28562" "20817" "403" "29878" "168" "79828" "2658" "1651" "55" "3426" "21" "7811" "87639" "42.4700957" "-92.3088197" "(42.4700957, -92.3088197)") 
       (list "02/01/2018 12:00:00 AM" "19113" "Linn" "48961" "37798" "1054" "51591" "323" "139727" "3722" "2332" "103" "4586" "33" "10776" "150503" "42.0789478" "-91.5989646" "(42.0789478, -91.5989646)") 
       (list "02/01/2018 12:00:00 AM" "19097" "Jackson" "4730" "3094" "25" "5786" "8" "13643" "269" "164" "2" "513" "1" "949" "14592" "42.1717426" "-90.5742294" "(42.1717426, -90.5742294)")))

Part A

Write procedures called extract-FIPS and extract-county, which return the FIPS code and the county name, respectively, of a given row of data. The FIPS code should be returned as a number, and the county name should be returned as a string.

> (extract-FIPS (list "02/01/2018 12:00:00 AM" "19113" "Linn" "48961" "37798" "1054" "51591" "323" "139727" "3722" "2332" "103" "4586" "33" "10776" "150503" "42.0789478" "-91.5989646" "(42.0789478, -91.5989646)"))
19113

> (extract-FIPS (list "02/01/2018 12:00:00 AM" "19179" "Wapello" "7812" "5257" "62" "6818" "37" "19986" "679" "379" "7" "965" "0" "2030" "22016" "41.0305845" "-92.4094499" "(41.0305845, -92.4094499)") )
19179

> (extract-county (list "02/01/2018 12:00:00 AM" "19113" "Linn" "48961" "37798" "1054" "51591" "323" "139727" "3722" "2332" "103" "4586" "33" "10776" "150503" "42.0789478" "-91.5989646" "(42.0789478, -91.5989646)"))
"Linn"

> (extract-county (list "02/01/2018 12:00:00 AM" "19179" "Wapello" "7812" "5257" "62" "6818" "37" "19986" "679" "379" "7" "965" "0" "2030" "22016" "41.0305845" "-92.4094499" "(41.0305845, -92.4094499)") )
"Wapello"

Part B

Write a procedure called summary-info, which takes in a list such as test-list as input, and returns a list with the following items: the total number of pieces (rows) of data, the smallest FIPS number from the data, the largest FIPS number from the data, the county which comes alphabetically first, and the county that comes alphabetically last.

> (summary-info test-list)
(10 19013 19179 "Black Hawk" "Wapello")

One strategy to solve this problem would involve using the sort procedure.
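
Sorting is not the only route. As an alternative, the sketch below walks a list once and keeps whichever element comes first under a given comparison. It is only an illustration, written with standard Scheme procedures that we assume are also available in Scamper (null?, car, cdr, string<?, string>?). The name extreme-by is hypothetical.

```scheme
;;; (extreme-by before? lst) -> any
;;;   before? : procedure?, takes two values and returns a boolean
;;;   lst : list?, non-empty
;;; Returns the element of lst that comes before every other
;;; element according to before?.
(define extreme-by
  (lambda (before? lst)
    (if (null? (cdr lst))
        (car lst)
        (let ((best (extreme-by before? (cdr lst))))
          (if (before? (car lst) best)
              (car lst)
              best)))))

(extreme-by < (list 42 7 19))
(extreme-by > (list 42 7 19))
(extreme-by string<? (list "pear" "apple" "fig"))
(extreme-by string>? (list "pear" "apple" "fig"))
```

The four expressions should produce 7, 42, "apple", and "pear". Note that string<? compares character by character, so uppercase letters come before lowercase ones.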

Part C

Calculate the summary info for the Iowa voter registration file. This should be a fairly straightforward application of with-file combined with your solution from Part B. It may be helpful to review how with-file works in our reading on reading data from files.

Problem 2: Republican Voters

Part A

Write a procedure, percent-rep, that takes in a row of data and returns the percentage of all registered voters (both active and inactive) who are registered as Republicans. Your answer should be a number rounded to two decimal places (the form XX.XX).

> (percent-rep (list "02/01/2018 12:00:00 AM" "19113" "Linn" "48961" "37798" "1054" "51591" "323" "139727" "3722" "2332" "103" "4586" "33" "10776" "150503" "42.0789478" "-91.5989646" "(42.0789478, -91.5989646)"))
26.66

> (percent-rep (list "02/01/2018 12:00:00 AM" "19179" "Wapello" "7812" "5257" "62" "6818" "37" "19986" "679" "379" "7" "965" "0" "2030" "22016" "41.0305845" "-92.4094499" "(41.0305845, -92.4094499)") )
25.6
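
One common way to round to two decimal places is to scale by 100, round to the nearest integer, and scale back. The sketch below shows only that rounding step on made-up values, not the percentage calculation itself. The name round-to-hundredths is hypothetical, and we assume round behaves in Scamper as it does in standard Scheme (the treatment of exact ties such as 0.125 may differ between implementations).

```scheme
;;; (round-to-hundredths x) -> number?
;;;   x : number?
;;; Returns x rounded to two decimal places.
(define round-to-hundredths
  (lambda (x)
    (/ (round (* x 100)) 100)))

(round-to-hundredths 3.14159)
(round-to-hundredths 25.5996)
(round-to-hundredths (* 100.0 (/ 1 3)))
```

These should produce 3.14, 25.6, and 33.33. The second result illustrates why the Wapello example above prints as 25.6 rather than 25.60: the value is a number, so a trailing zero is not displayed.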

Part B

In this problem, we will find the entries in the data with the largest percentage of Republican voters, using your procedure from Part A. Write a procedure, highest-reps, that takes in a list such as test-list and returns a list of the 5 entries with the largest values computed in Part A, ordered from largest to smallest. Rather than returning the full row of data for each entry (which contains more information than we care about), return the name of the county, the year the data was collected as a number, and the percentage you calculated in Part A.

> (highest-reps test-list)
(list (list "Jefferson" 2018 30.87) 
       (list "Polk" 2018 29.29) 
       (list "Linn" 2018 26.66) 
       (list "Black Hawk" 2018 25.64) 
       (list "Wapello" 2018 25.6))

Note that in the dataset you were given, all of the entries are from the year 2018, so expect to see that year in all of your answers. However, we will test your procedure on datasets which have other years, and it should work in those cases as well.
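
Since every date follows the MM/DD/YYYY HH:MM:SS xM format, the year always occupies the same character positions. The sketch below is one possible way to pull it out, assuming substring and string->number work in Scamper as they do in standard Scheme and that the month and day are always two digits. The name date->year is hypothetical, and the 2004 date is a made-up input used only to show a different year.

```scheme
;;; (date->year date) -> integer?
;;;   date : string?, in MM/DD/YYYY HH:MM:SS xM format
;;; Returns the four-digit year of date as a number.
(define date->year
  (lambda (date)
    ;; Characters 6 through 9 hold YYYY; the end index 10 is exclusive.
    (string->number (substring date 6 10))))

(date->year "02/01/2018 12:00:00 AM")
(date->year "11/01/2004 12:00:00 AM")
```

These should produce 2018 and 2004.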

Part C

Finally, find the top 5 Republican percentage entries for the Iowa voter registration file. As in Problem 1, this should be fairly straightforward.

Problem 3: Your choice

This is your opportunity to be creative. Ask a question about this dataset, and then answer it. In order to answer your question, you should need to access at least 2 entries (columns) of the dataset.

As with the previous problems, it might be a good strategy to first write procedures that can work on test-list, before experimenting with importing the file.

Your submission for this problem should include:

The question you asked, as a comment.
Procedures as needed to answer your question. Procedures need documentation as always.
Demonstration of running the code on the file using with-file.
A brief comment which explains what you found from the data.

Submission guidelines

Submit your .scm file to Gradescope, using the file name specified at the top of these instructions.

Grading rubric

In grading your submission, we will look for the following at each level. Note that if a criterion at a lower level is not met, we will likely not check the criteria at the higher levels. We may also identify other characteristics that move your work between levels.

You should read through the rubric and verify that your submission meets the rubric.

Redo or above

Submissions that lack any of these characteristics will get an N.

[] Includes the specified file (correctly named).
[] Includes an appropriate header on the file that indicates the course, author, acknowledgements, etc.
[] Acknowledges appropriately.
[] Code runs in Scamper.

Meets expectations or above

Submissions that lack any of these characteristics but have all of the prior characteristics will get an R.

[] Code is well-formatted, following the style of our class. File is organized with comments to indicate the start of new problems. 
[] All grader tests pass for Problem 1A and 1B.
[] All grader tests pass for Problem 2A and 2B.
[] Code works as expected, reading data using `with-file`, in Problems 1C and 2C. 
[] Problem 3 asks a new question about the data, and successfully answers it with code and a brief explanation. 
[] Documentation in the 151 style is included for all code, and contains correct information.

Exemplary / Exceeds expectations

Submissions that lack any of these characteristics but have all of the prior characteristics will get an M.

[] All code is exceptionally organized and easy to follow, through the use of comments (to explain the purpose of different pieces of the code), decomposition, and highly intuitive naming choices.
[] Helper functions are used in a way to avoid replicating code in multiple places. 
[] Tests are included for problems 1A, 1B, 2A, and 2B. 
[] The question posed in Problem 3 is particularly interesting or complex to answer.

Copyright © Eric Autry, Charlie Curtsinger, Sarah Dahlby Albright, Janet Davis, Nicole Eikmeier, Fahmida Hamid, Priscilla Jiménez, Barbara Johnson, Titus Klinge, Peter-Michael Osera, Leah Perlmutter, Samuel A. Rebelsky, William Rebelsky, John David Stone, Anya Vostinar, Henry Walker, and Jerod Weinman.

Unless specified otherwise elsewhere on this page, this work is licensed under a Creative Commons Attribution 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ or send a letter to Creative Commons, 543 Howard Street, 5th Floor, San Francisco, California, 94105, USA.
