In this coding challenge we are providing you with starter code, CC5-template.scm. You should download this file and copy the given code into a new .scm file named LASTNAME-data-science.scm, where you replace LASTNAME with your last name.
At the top of your file, make sure to include your name, and any acknowledgements according to our collaboration policy. Now is a great time to review those collaboration policies.
You are required to document every procedure that you write using the documentation style of our course. Tests are optional, but likely helpful in determining if your code is working as desired.
In this coding challenge, we’re going to work with a file of data called iowa-voter-registration-2018-02.csv. The full data set contains voter registration numbers from January 2000 to February 2018, and was downloaded from data.iowa.gov on February 14, 2018. The version we’re giving you here is slightly smaller. A row of data in the file corresponds to a county in Iowa, and each row has the following columns.
Columns
0: Date (in MM/DD/YYYY HH:MM:SS xM format)
1: FIPS code
2: County Name
3: Active Democrats
4: Active Republicans
5: Active Libertarians
6: Active No-Party
7: Active Other
8: Active Total
9: Inactive Democrats
10: Inactive Republicans
11: Inactive Libertarians
12: Inactive No-Party
13: Inactive Other
14: Inactive Total
15: Grand Total
16: Latitude
17: Longitude
18: Coordinates
Here is an example row from the data set:
02/01/2018 12:00:00 AM,19085,Harrison,2021,3684,36,2640,13,8394,50,105,1,123,2,281,8675,41.6828528,-95.8169209,"(41.6828528, -95.8169209)"
Access the file here: iowa-voter-registration-2018-02.csv
In this problem we are going to learn some basic summary information about the data. Before tackling the file (using with-file), let’s write procedures that work with a list of the following sort. This list, test-list, is already included in the starter code.
(define test-list
(list (list "02/01/2018 12:00:00 AM" "19103" "Johnson" "42139" "18684" "644" "30401" "221" "92089" "5893" "2768" "99" "6592" "55" "15407" "107496" "41.6715511" "-91.5880849" "(41.6715511, -91.5880849)")
(list "02/01/2018 12:00:00 AM" "19111" "Lee" "8756" "4352" "85" "8022" "22" "21237" "651" "349" "12" "1067" "4" "2083" "23320" "40.6419764" "-91.479264" "(40.6419764, -91.479264)")
(list "02/01/2018 12:00:00 AM" "19057" "Des Moines" "10089" "6505" "107" "8822" "38" "25561" "881" "463" "13" "1117" "5" "2479" "28040" "40.9231829" "-91.1814707" "(40.9231829, -91.1814707)")
(list "02/01/2018 12:00:00 AM" "19061" "Dubuque" "24735" "15986" "276" "22244" "87" "63328" "1777" "1081" "30" "2113" "18" "5019" "68347" "42.468832" "-90.8824564" "(42.468832, -90.8824564)")
(list "02/01/2018 12:00:00 AM" "19179" "Wapello" "7812" "5257" "62" "6818" "37" "19986" "679" "379" "7" "965" "0" "2030" "22016" "41.0305845" "-92.4094499" "(41.0305845, -92.4094499)")
(list "02/01/2018 12:00:00 AM" "19153" "Polk" "106466" "82595" "1989" "87024" "526" "278600" "8506" "5778" "238" "8536" "59" "23117" "301717" "41.6855048" "-93.5735335" "(41.6855048, -93.5735335)")
(list "02/01/2018 12:00:00 AM" "19101" "Jefferson" "3795" "3249" "46" "3006" "44" "10140" "351" "180" "10" "416" "10" "967" "11107" "41.0317596" "-91.9488774" "(41.0317596, -91.9488774)")
(list "02/01/2018 12:00:00 AM" "19013" "Black Hawk" "28562" "20817" "403" "29878" "168" "79828" "2658" "1651" "55" "3426" "21" "7811" "87639" "42.4700957" "-92.3088197" "(42.4700957, -92.3088197)")
(list "02/01/2018 12:00:00 AM" "19113" "Linn" "48961" "37798" "1054" "51591" "323" "139727" "3722" "2332" "103" "4586" "33" "10776" "150503" "42.0789478" "-91.5989646" "(42.0789478, -91.5989646)")
(list "02/01/2018 12:00:00 AM" "19097" "Jackson" "4730" "3094" "25" "5786" "8" "13643" "269" "164" "2" "513" "1" "949" "14592" "42.1717426" "-90.5742294" "(42.1717426, -90.5742294)")))
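Every field in a row is a string, so numeric columns have to be converted before you can compare them or do arithmetic with them. The following is a minimal sketch of that idea, assuming list-ref and string->number behave in Scamper as they do in standard Scheme. The names example-row and grand-total are hypothetical and are not part of the starter code. The sketch reads column 15 (Grand Total), not the columns the problems below ask about.

```scheme
;; One row copied from test-list (the Jackson county entry).
(define example-row
  (list "02/01/2018 12:00:00 AM" "19097" "Jackson" "4730" "3094" "25"
        "5786" "8" "13643" "269" "164" "2" "513" "1" "949" "14592"
        "42.1717426" "-90.5742294" "(42.1717426, -90.5742294)"))

;; (grand-total row) -> number
;;   row : a list of strings laid out like the columns described above
;; Hypothetical helper: returns the Grand Total (column 15) as a number.
(define grand-total
  (lambda (row)
    (string->number (list-ref row 15))))

(grand-total example-row) ; 14592
```

Columns are counted from 0, so (list-ref row 15) picks out the sixteenth element of the row.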
Write procedures called extract-FIPS and extract-county, which return the FIPS code and the county name, respectively, of a given row of data. The FIPS code should be returned as a number, and the county name should be returned as a string.
> (extract-FIPS (list "02/01/2018 12:00:00 AM" "19113" "Linn" "48961" "37798" "1054" "51591" "323" "139727" "3722" "2332" "103" "4586" "33" "10776" "150503" "42.0789478" "-91.5989646" "(42.0789478, -91.5989646)"))
19113
> (extract-FIPS (list "02/01/2018 12:00:00 AM" "19179" "Wapello" "7812" "5257" "62" "6818" "37" "19986" "679" "379" "7" "965" "0" "2030" "22016" "41.0305845" "-92.4094499" "(41.0305845, -92.4094499)") )
19179
> (extract-county (list "02/01/2018 12:00:00 AM" "19113" "Linn" "48961" "37798" "1054" "51591" "323" "139727" "3722" "2332" "103" "4586" "33" "10776" "150503" "42.0789478" "-91.5989646" "(42.0789478, -91.5989646)"))
"Linn"
> (extract-county (list "02/01/2018 12:00:00 AM" "19179" "Wapello" "7812" "5257" "62" "6818" "37" "19986" "679" "379" "7" "965" "0" "2030" "22016" "41.0305845" "-92.4094499" "(41.0305845, -92.4094499)") )
"Wapello"
Write a procedure called summary-info, which takes in a list such as test-list as input, and returns a list with the following items: the total number of pieces (rows) of data, the smallest FIPS number from the data, the largest FIPS number from the data, the county which comes alphabetically first, and the county that comes alphabetically last.
> (summary-info test-list)
(10 19013 19179 "Black Hawk" "Wapello")
One strategy to solve this problem would involve using the sort procedure.
Calculate the summary info for the Iowa voter registration file. This should be a fairly straightforward application of with-file combined with your summary-info procedure from part B. It may help to review how with-file works in our reading on reading data from files.
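Write a procedure percent-rep which takes in a row of data, and returns the percentage of all registered voters in that row who are registered as Republicans, counting both active and inactive voters (that is, active plus inactive Republicans as a share of the grand total). Your answer should be a number rounded to two decimal places. Trailing zeros are not shown, so 25.60 appears as 25.6.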
> (percent-rep (list "02/01/2018 12:00:00 AM" "19113" "Linn" "48961" "37798" "1054" "51591" "323" "139727" "3722" "2332" "103" "4586" "33" "10776" "150503" "42.0789478" "-91.5989646" "(42.0789478, -91.5989646)"))
26.66
> (percent-rep (list "02/01/2018 12:00:00 AM" "19179" "Wapello" "7812" "5257" "62" "6818" "37" "19986" "679" "379" "7" "965" "0" "2030" "22016" "41.0305845" "-92.4094499" "(41.0305845, -92.4094499)") )
25.6
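The sample outputs above are consistent with dividing the combined active and inactive Republican counts by the grand total: for Linn, (37798 + 2332) / 150503 is about 0.2666. One common way to round to two decimal places is to scale by 100, round to the nearest integer, and scale back. The sketch below shows only that rounding step, assuming round behaves as in standard Scheme. The names round-to-hundredths and linn-percent are hypothetical and are not part of the starter code.

```scheme
;; (round-to-hundredths x) -> number
;;   x : a real number
;; Hypothetical helper: rounds x to two decimal places by scaling up,
;; rounding to the nearest integer, and scaling back down.
(define round-to-hundredths
  (lambda (x)
    (/ (round (* x 100.0)) 100.0)))

;; The unrounded percentage for the Linn row, computed from the raw counts.
(define linn-percent
  (* 100.0 (/ (+ 37798 2332) 150503)))

(round-to-hundredths linn-percent) ; 26.66
(round-to-hundredths 25.5996)      ; 25.6
```

Multiplying by 100.0 instead of 100 keeps the arithmetic in decimal form in Scheme implementations that would otherwise produce an exact fraction.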
In this problem, we will find the entries in the data with the largest percentage of Republican voters, using your percent-rep procedure from part A. Write a procedure highest-reps which takes in a list such as test-list and returns a list of the 5 entries with the largest percentages, ordered from largest to smallest. Rather than returning the full row of data for each entry (which contains more information than we care about), return the name of the county, the year the data was collected as a number, and the percentage you calculated in part A.
> (highest-reps test-list)
(list (list "Jefferson" 2018 30.87)
(list "Polk" 2018 29.29)
(list "Linn" 2018 26.66)
(list "Black Hawk" 2018 25.64)
(list "Wapello" 2018 25.6))
Note that in the dataset you were given, all of the entries are from the year 2018, so expect to see that year in all of your answers. However, we will test your procedure on datasets which have other years, and it should work in those cases as well.
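Because the date column uses a fixed MM/DD/YYYY layout, the year sits at character positions 6 through 9 of the date string. The following is a minimal sketch of pulling it out, assuming substring takes a start index and an exclusive end index as in standard Scheme, and assuming the month and day are always zero-padded to two digits as in the example row. The name date->year is hypothetical and is not part of the starter code.

```scheme
;; (date->year date) -> number
;;   date : a string in "MM/DD/YYYY HH:MM:SS xM" format
;; Hypothetical helper: returns the four-digit year as a number.
(define date->year
  (lambda (date)
    (string->number (substring date 6 10))))

(date->year "02/01/2018 12:00:00 AM") ; 2018
(date->year "11/01/2004 12:00:00 AM") ; 2004
```

Reading the year from the string, not hard-coding 2018, is what lets the same procedure work on datasets that cover other years.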
Finally, find the top 5 Republican percentage entries for the Iowa voter registration file. As in Problem 1, this should be fairly straightforward.
This is your opportunity to be creative. Ask a question about this dataset, and then answer it. In order to answer your question, you should need to access at least 2 entries (columns) of the dataset.
As with the previous problems, it might be a good strategy to first write procedures that can work on test-list, before experimenting with importing the file.
Your submission for this problem should include:
with-file.
Submit your .scm file to Gradescope, using the file name given at the top of these instructions.
In grading your submission, we will look for the following at each level. Note that if your submission does not meet the criteria at a lower level, we will likely not check the criteria at the higher levels. We may also identify other characteristics that move your work between levels.
You should read through the rubric and verify that your submission meets the rubric.
Submissions that lack any of these characteristics will get an N.
[] Includes the specified file (correctly named).
[] Includes an appropriate header on the file that indicates the course, author, acknowledgements, etc.
[] Acknowledges appropriately.
[] Code runs in Scamper.
Submissions that lack any of these characteristics but have all of the prior characteristics will get an R.
[] Code is well-formatted, following the style of our class. File is organized with comments to indicate the start of new problems.
[] All grader tests pass for Problem 1A and 1B.
[] All grader tests pass for Problem 2A and 2B.
[] Code works as expected, reading data using `with-file`, in Problems 1C and 2C.
[] Problem 3 asks a new question about the data, and successfully answers it with code and a brief explanation.
[] Documentation in the 151 style is included for all code, and contains correct information.
Submissions that lack any of these characteristics but have all of the prior characteristics will get an M.
[] All code is exceptionally organized and easy to follow, through the use of comments (to explain the purpose of different pieces of the code), decomposition, and highly intuitive naming choices.
[] Helper functions are used in a way to avoid replicating code in multiple places.
[] Tests are included for problems 1A, 1B, 2A, and 2B.
[] Question posed in Problem 3 is particularly interesting or complex to answer.