Machine Bias (https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing)
How we analyzed the COMPAS Recidivism Algorithm (https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm)
Can you make AI fairer than a judge? (https://www.technologyreview.com/2019/10/17/75285/ai-fairer-than-judge-criminal-risk-assessment-algorithm/)
Optional: Watch this video on fairness (https://www.youtube.com/watch?v=jIXIuYdnyyk&t=0s&ab_channel=ArvindNarayanan)
In the ProPublica article titled Machine Bias, the authors explain the racial biases that result from algorithmic decision making in the American judicial system. In a number of jurisdictions across the United States, judges are given access to results from an algorithm that assesses the risk posed by defendants. To assess that risk, Northpointe, the company that created the algorithm, uses questionnaire data collected from defendants and inmates as well as criminal records. Because of civil rights law, these questionnaires cannot vary their risk assessments explicitly based on race. However, the questionnaire asks questions that reflect racial disparities in the United States, resulting in unequal outcomes for defendants scored by the Northpointe algorithm.
While this issue can certainly be discussed further, I believe an issue that deserves consideration is how differences in technological and mathematical literacy can influence design choices. In the statistics course sequence at Grinnell College, we discuss the underlying theory behind statistical inference. In this discussion, much attention is paid to the limitations of approaches, the assumptions that are made when an approach is chosen, and how the validity of approximations depends on those assumptions. In this process, it is made clear that some assumptions rest on subjective questions, like "when is a sample large enough?" or "does this data appear normally distributed?" Moreover, notions like "statistical significance" are often based on arbitrary cutoffs for when something is deemed too unlikely to have happened by random chance, given the assumptions made. As a result, when someone discusses their statistical findings, they explain their assumptions, why they were made, the limitations of the study, and their conclusions. However, according to David Scharf at the Broward County Sheriff's Office in Fort Lauderdale, they decided on Northpointe's software over other tools because it was "easy to use and produced 'simple yet effective charts and graphs for judicial review.'" Because of a faith in technology and mathematics, those unfamiliar with statistical inference or technology can choose tools based on their simplicity. After all, the quality of a piece of technology is often judged on whether it works simply and immediately out of the packaging. That same faith in mathematics and technology can allow corporations like Northpointe to avoid disclosing the limitations or methodology of their tools.
Probability describes the behavior of a large population. It finds potential trends in a big data set given the variables selected for the analysis. However, a trend only describes the likelihood that someone will do something; it is not a firm argument that someone will definitely do it.
In Machine Bias, the author Julia Angwin describes the bias that surfaced when two defendants who committed comparable offenses received very different scores. Risk assessment, in this case, is trying to use a population-level trend to predict an individual, which involves a lot of variation. In particular, some traits change as people experience the world; a person's mindset shifts based on the people they meet and the projects they take on. Humans are hard to predict, and I don't think machine learning can do it directly.
The article "Can you make AI fairer than a judge?" makes a similar point: machine decision-making cannot be perfect because its errors can never be fully eliminated, and the definition of "high risk" is a vague thing to pin down in the first place.
Today's readings about COMPAS, a "risk assessment" tool used in the US criminal legal system, were fascinating. The analysis of a non-human tool used to set human penalties is a compelling look at what happens when we take technology too far. Beyond sentencing, I think it is important to remember that the whole process of criminal justice in the US is problematic. Police brutality, the implicit biases of juries and judges, and a history of laws in this country set up to discriminate against minority groups already make the justice system extremely error-prone and biased. The COMPAS system not only mirrors many of these facets but, as we learned, also extends them. As discussed in previous classes, treating human matters as objective can lead to inaccurate readings and results, without any way to evaluate the human elements that often persist. The COMPAS system is unable, as explicitly described, to measure human change and circumstance. Additionally, the COMPAS system is unable to see things through a human lens and have the compassion to make a proper ruling.
While the accuracy numbers may be above 50%, there was evidence of racial bias, and even a tenth of a percentage point of inaccuracy means someone's life was wrongfully changed by a machine. So while we might traditionally view ~60% accuracy as quite good, adding the context of where and under which circumstances these inaccuracies occur in sentencing and bond decisions makes this system seem horribly wrong.
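To make that concrete, here is a minimal hypothetical sketch (the counts below are invented for illustration, not ProPublica's figures) of how two groups can share the same overall accuracy while one group absorbs far more false positives:

```python
# Hypothetical confusion-matrix counts for two groups (illustrative only,
# not ProPublica's actual numbers).
groups = {
    # (true positives, false positives, true negatives, false negatives)
    "Group A": (300, 200, 350, 150),
    "Group B": (250, 100, 400, 250),
}

for name, (tp, fp, tn, fn) in groups.items():
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    fpr = fp / (fp + tn)   # labeled high risk but did not reoffend
    fnr = fn / (fn + tp)   # labeled low risk but did reoffend
    print(f"{name}: accuracy={accuracy:.0%}, FPR={fpr:.0%}, FNR={fnr:.0%}")

# Both groups come out at 65% accuracy, yet Group A's false positive rate
# is roughly double Group B's: "equal accuracy" can hide unequal harm.
```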
The COMPAS system is an excellent example of perceived machine superiority. We choose to blindly trust machines, but, taking the justice system as an example, removing context from the areas we automate can lead to extremely human problems.
I think the whole idea of the COMPAS algorithm is weird. So many factors contribute to whether or not someone is going to reoffend that I don't think it's possible to get an accurate picture of what makes everyone reoffend, because everyone is so different. For example, one person might have a supportive family and find faith in religion that inspires them not to reoffend; another person might have mental health issues that push them to reoffend. Since it's so hard to incorporate ethics into the code, I don't think it should be used, especially since this algorithm is clearly biased and rates Black people as higher risk than any other race. It just gives legal officials another reason to support their racial biases, because the algorithm is treated as justification for someone being convicted or not. This code is literally encoded bias built on a history of racial inequality. The only people gaining from COMPAS are non-Black groups, which makes sense because white people own the prisons and make money off of Black people who go to prison. This algorithm is another way to ensure that Black people will always be in prison while white people profit from their suffering.
The readings all prove that the COMPAS algorithm is extremely flawed. Although human error in the justice system produces a lot of the same mistakes when it comes to incarceration, the algorithm should be there to reduce human error, not reinforce it. Once again, the flawed nature of this algorithm has a lot to do with how the coding professionals in the company that made it did not approach their work from the proper intersectional lens: they purposely omitted race from their algorithm to make it "fair," but by doing so, they made it unfair. The algorithm needs to take race into account because race issues often work in tandem with other factors the algorithm considers, such as social class and educational background. While the encoded bias in the COMPAS algorithm may be unintentional, the company that made it failed to broaden its scope of perspectives when testing the algorithm. The three articles state that testing was performed by those who made it, which is not enough. How can you ever produce an algorithm that is "unbiased" and effective for the people it affects if you do not get it tested by those very people first? The answer is that you cannot.
In his paper 'This Is Not A Minority Report', Joshua Scannell, Assistant Professor of Digital Media Theory at The New School, posits that crime only exists in relation to policing. In the USA, poor communities have been shown to be overpoliced in comparison to richer neighbourhoods. Since the lower socio-economic strata in the USA are disproportionately composed of Black and brown individuals, it follows that these individuals are overpoliced in comparison to their white counterparts. Obviously, if a system looks for crime in one place more than another, it will find more incidences of crime in the former location. This data, then, is what is used to train technology like COMPAS. So such predictive technologies are only capable of predicting the policing of recidivism, not of predicting who would actually reoffend. As stated in the MIT Technology Review article, not only does this system codify historical systemic issues in policing, but it also privatizes the justice process and makes its encoded biases harder to interrogate.
One of the more interesting aspects of these algorithms is the ethical dilemma of internal review, which we talked about in class last week. The analysis of COMPAS from ProPublica mentions that when these algorithms were first made available and adopted by local justice systems across the country, the data to support their efficacy was severely limited, or in some cases non-existent. That much of the research on the effectiveness and bias of these systems was conducted by their own authors offers a clear example of conflict of interest. Even if the authors of such algorithms maintained a strictly unbiased implementation, it would seem unbecoming to avoid external validation. We see this a lot in software development: the authors likely worked hard developing this algorithm; they likely felt it was fair and would improve society; but in that way they would be inevitably blinded. It's the same reason we don't allow doctors to operate on their own children: proximity to the product fundamentally alters the ability to criticize it and find fault. Until a product like a sentencing algorithm has been rigorously reviewed by individuals and organizations not involved with its production, it should not be available for adoption. Unfortunately, because this seemingly obvious procedure was not followed, the software has gained some popularity, and despite new evidence of its ineffectiveness, it may be too late to undo or fix the harm it has undoubtedly caused.
The use of algorithmic risk assessments in the criminal justice system presents a deeply concerning conundrum. Algorithms, by design, are programmed to identify patterns in vast amounts of data; however, the fairness and impartiality of their output is fundamentally tied to the quality and neutrality of the data they are trained on. The inherent biases in the data—stemming from years of systemic racial and socio-economic discrimination—can inadvertently be amplified by these algorithms. Thus, rather than rectifying the existing prejudices in the system, there is a risk of entrenching them further. Northpointe’s refusal to publicly disclose their algorithm’s inner workings is particularly troubling.
Furthermore, while the promise of automating certain aspects of the judicial process might seem like an appealing solution to human biases and the overburdened penal system, it is essential to treat these tools with caution. The story of Brisha Borden and Vernon Prater underscores the dangerous implications of algorithmic errors. As computer scientists, we must be wary of oversimplifying complex human behaviors and decisions into mere numbers or categories. Instead, there should be a continuous feedback mechanism where the outcomes of algorithmic decisions are constantly evaluated and the algorithms refined. It is crucial to remember that while algorithms can aid the process, the human element—understanding, compassion, and judgment—should not be sidelined.
I think COMPAS exemplifies how the black box effect described in last week's readings trickles down to wider society. The people who design these models may not fully understand how they work in their entirety, but conclude that they work anyway. They then sell their model as a product, and those even less equipped to understand the intricacies and dangers of what they are buying trust those who supposedly understand them best. We also witness the same "code your way out of it" attitude in the application of COMPAS in Broward County: to address jail overcrowding, a systemic issue, an algorithm that is clearly biased itself is introduced. The more research I did, the more articles I found from recent years claiming that such algorithms had exceeded human ability in predicting whether someone would reoffend. An article from UC Berkeley cited a California study claiming COMPAS correctly predicted recidivism 89% of the time, a figure far closer to the Blackstone ratio. Predictably, no mention was made of the racial biases present (at least at one time) in the algorithm.
Underneath all of this, I have strong feelings of anxiety toward the parameterization and mathematization of human action and behavior. We trust humans (mostly) to make these decisions, and it isn't unreasonable for me to believe an algorithm may be just as good as or better than a human at predicting recidivism, as long as it is held accountable for its biases. Even people who are very good at such predictions make errors with serious consequences and are stumped by the irrational and complex decision-making of someone other than themselves. It is deeply unnerving to me that the complexities and tells of humans may be lost on other humans, but not on something made by humans.
Article link: https://news.berkeley.edu/2020/02/14/algorithms-are-better-than-people-in-predicting-recidivism-study-says
I think COMPAS has a serious functional problem. The program should help humans streamline the work they do, but it should not reinforce biases and discrimination toward Black people. Human judgment should be weighted first, and then the outcomes of programs should be evaluated. To address the serious issue of bias and discrimination within COMPAS, it is imperative that a comprehensive review and revision of its algorithms and decision-making processes take place. Furthermore, we need to understand that it is impossible to create "perfect" programs that can replace human judgment. Programs are just tools; they help humans, but they only do what humans order them to do. Over-reliance on programs can lead to a disconnect from the nuanced, context-dependent decisions that humans are often better equipped to make.
I feel as though this is far too subjective a situation to leave to artificial intelligence. It is cool to think that we could have a robot so well-trained that it outperforms a judge, but that is clearly not the case here. It feels like we're trying to make life easier for the judge while making life a lot harder for minorities who already have the deck stacked against them in court. I would love it if there were a way to actually have a robot eliminate the societal bias present in the courtroom, but I don't see that happening with something like this that is trained on data collected from our society, data which reflects the very biases we want to free ourselves of. Yet if we want to accurately predict recidivism, we need to look at what actually happens in our society in terms of recidivism. So we're kind of in an impossible situation here; the solution is probably not an algorithmic one. I think it's possible that the problem isn't really even with the algorithm. It's also not a problem with the data; it's a problem with what the data is showing us: a society with a lot of issues that often push marginalized groups into very difficult situations.
Reading the ProPublica article had me wondering whether anyone has done a similar analysis of how well parole boards and other human-based institutions predict repeat offending. These models are pretty much as bad as you can get (I'm genuinely confused about how they can perform so poorly), and I wonder how that compares to human bias, especially since the model must have "gotten it" from somewhere, and that seems likely to be historical sentencing information laced with systemic racism and, as the MIT article mentioned, the racially influenced likelihood of rearrest. Ideally, removing opportunities for judicial bias seems like a good thing, although that's what mandatory minimums do and they're horrid. Plus, these systems are just relearning our bias, so even a perfectly "unbiased" system would simply be in line with the general bias of society and reflect America's unequal rates of policing and arrests. And even if we theoretically had a model that predicts "perfectly," it would just be a self-fulfilling prophecy, as those who are "guaranteed" to reoffend are locked up and denied what limited rehabilitation exists.
I'm also wondering about the questions. How do you even measure "ties to the community"? I would guess the questionnaire was written with a particular "upstanding citizen" in mind, one who probably looked quite similar to the creators of these tools. This also relates to what they mention at the end of the ProPublica article: jobs often have a "check here" type question about criminal history, which allows no room for nuance.
My general thoughts are that companies not releasing their models is incredibly worrisome, and that if we're going to do machine-determined ethics at all (which I'm not convinced we should), it needs to be as transparent as possible, with input from as widespread a group as possible. Ideally it also wouldn't be a company, because I really don't trust the motives there. People being told what they or their voters want to hear is a powerful force.
I think it is difficult to use statistics in this type of situation. At first glance, the idea that some sort of algorithm could reduce the personal biases judges hold seems somewhat worth investigating. But this is based on a view that holds crime statistics, which, as Tanmaie mentioned, are really statistics about policing, as somehow morally neutral or unbiased. This idea that data is somehow free of the biases and systems under which it is obtained appears fairly frequently across discussions of ethics and computing. It furthers the false idea of the morally neutral, unbiased researcher, and depends on the supposed "truth" or "reality" that these statistics represent. Higher rates of policing in Black communities, along with countless other systemic factors, inform these statistics. Crime statistics do not show who is really doing "wrong"; they show something more like whom the system would like to punish.
I have read the first article before as part of a statistics course, so it's really interesting to read more about the context and the analysis of the algorithm from different perspectives. I think there are circumstances where technology does more harm than it helps people, and this is a perfect example. As discussed in the last few readings, computer science, machine learning, and data science are known to be very rigid, rational, and inflexible. In these fields, optimization, efficiency, and accuracy are highly prioritized, and computer programmers and scientists constantly have to make trade-offs between those goals and equity. In this case, the creators of the algorithm opted for high predictive accuracy, but that meant training the model on historical datasets that are problematic and deeply rooted in systemic racism. Furthermore, the lack of checks and balances when it comes to testing these algorithms raises a lot of concerns about their validity and ethics. I think that all algorithms, especially ones with social and ethical implications, should be tested on and by the same groups of people they are bound to affect. Whatever testing is conducted, the process should be supervised and documented; otherwise, the algorithm should not be used in a legal system where people's lives are at risk.
These articles are a depressing reminder of what we have discussed before: our algorithms carry internalized biases within them, and those result in very real consequences. At first it’s difficult to imagine why we would even use predictive crime algorithms that consistently got things wrong. But our pre-existing justice system already did that, so instead of handing the work over to a program that is unbiased and doesn’t carry the weight of America’s institutionalized racism, we’re handing the work over to a program that also internalized that racism. It doesn’t need to change the justice system, rather it just needs to help maintain it, as flawed as it is. It’s incredibly unsurprising that Northpointe was co-founded by someone who ran a prison, who would absolutely benefit from an algorithm that determined people’s jail time and parole eligibility. I understand the difficulties of creating an absolutely unbiased algorithm, but the fact that Northpointe doesn’t reveal how COMPAS operates, and that it would be implemented in multiple states anyways, is inexcusable. If the technology isn’t ready, it shouldn’t be used, as it’s affecting lives in very serious ways. The justice system is already defective with people in charge, but that doesn’t mean that we should start implementing algorithms from private institutions that maintain those defects.
While reading the articles, I couldn't stop thinking about how much we depend on technology. It goes far beyond simply using our phones and navigating the internet; algorithms are present in almost every aspect of our lives. What's unsettling is how flawed they can be. The COMPAS algorithm serves as a perfect example of this. To the best of my knowledge, in my country we don't have any risk assessment algorithms that are admissible in court, so it was a big surprise to come across an algorithm that can influence a judge's decision on whether a defendant should be released until trial or held in custody. The algorithm operates on a series of questions, but it is incredibly challenging to predict something solely from personal answers, especially when utilizing data generated during times of discrimination, as noted by Princeton professor Ruha Benjamin. Furthermore, it's essential to recognize that fairness depends on the context of a situation, yet the algorithm employs generic questions to assess how potentially dangerous an individual might be, disregarding much of that individual's background. It's disconcerting to contemplate how many flawed algorithms are in use, but what's even more alarming is how often we unquestioningly trust their outcomes.
I personally have no faith in the criminal justice system, and I definitely do not have faith in artificial intelligence to dictate complicated phenomena like recidivism. I think when the legitimacy of the highest court in the country has been called into question for the past 2 decades and several states still implement the death penalty despite dozens of false convictions, the law has become a tool to segregate, discriminate against, and penalize marginalized groups. Machine Bias aptly highlights that Black people are viewed as more dangerous, or “high risk,” despite having “lesser” offenses. This is not surprising to me at all.
The correlation between violence and crime has more to do with class (https://revisesociology.com/2016/10/30/social-class-and-crime/), but race and class are inextricably linked because of a variety of factors (e.g., racial capitalism). I think it's also worth unpacking what we consider "violent" crime, and what kind of violence we're talking about. Most people consider direct violence, like assault or burglary, to be more violent than tax evasion or fraud. But it might be helpful to think about violence in terms of scale rather than subjective severity.
Overall, I think the very threat of recidivism could be mitigated if the prisons and jails earnestly valued rehabilitation.
Risk assessment programs just extend the bias and failure already present in our justice system, and on top of that they fail to consider how various circumstances and variables weigh differently depending on one's situation. Not only are the data fed into the programs already biased, because police disproportionately arrest Black people, but the questions that help determine the risk assessment scores are also quite vague. The survey directions seem to downplay how important these answers are to your score—even though some judges wrongly use it in sentencing—as if it were just some personality test you can take on the internet. Questions that ask someone to rate a statement like "A hungry person has a right to steal" are strange to me because of their wide range of interpretation. Yet answers that can be misinterpreted, or given without knowing the gravity of the situation, are fed into some calculation—one we don't even know, because the companies don't want to show us—disregarding all of that. Risk assessment programs feel like one more way to enforce the racism embedded in the justice system under the guise of some "unbiased" computer calculation.
There is quite a lot of statistical information within these readings, but truthfully, I had no idea that there was an algorithm that produced a risk score for defendants. I noticed that these articles are from 2016, and I am interested in whether there has been any change in the use of these algorithms despite the analysis that has been done to understand the issue. I also want to question the focus on the "truthfulness" of these algorithms. The "Machine Bias" article mentions how having a job can drastically impact the outcome of these algorithms, and I find it funny that these articles seem to focus more on the algorithms' poor ability to identify re-offenders than on the possibility that Black life experience is more likely to be criminalized in the U.S. The U.S. has prevented the growth of wealth in Black communities through a variety of factors, and we have also defunded and restricted access to support for Black communities in many cases. If we restrict access to support, create policies that criminalize specific types of behavior more severely, and police these communities more heavily, we are bound to see an algorithm built to predict re-arrests perpetuate these stereotypes. It is interesting that in the MIT data visualizations, white defendants have a much lower re-arrest rate at a risk score of 3 than at 7, whereas Black defendants' rates are about equal at 3 and 7. From this data, it seems the COMPAS algorithm has a very difficult time determining the rationale behind the re-arrest of Black people while it has a much clearer picture of why a white person might be re-arrested. It feels almost like a commentary on our own criminalization of Black life.
ProPublica's analysis of racial bias in the Northpointe COMPAS algorithm is a damning result for the use of predictive algorithms in the criminal justice system. It's clear from their results that although the model has moderate predictive accuracy for recidivism, it disproportionately rates Black defendants as higher risk than white defendants when controlling for other factors.
The analysis also mentions that women are more likely to receive a high risk score than men when controlling for other factors. What I felt was missing was an intersectional analysis: I wonder whether Black women like Brisha Borden are given disproportionately higher scores beyond what would be expected for Black defendants or women defendants alone.
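For anyone curious what "controlling for other factors" looks like in practice, ProPublica released their data and used logistic regression; below is a simplified sketch in that spirit (not their exact pipeline, which also filters the records), with a race-by-sex interaction added as one way to probe the intersectional question above. The URL and column names are assumptions based on their public repository and worth verifying:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed location and column names of ProPublica's released dataset.
URL = ("https://raw.githubusercontent.com/propublica/"
       "compas-analysis/master/compas-scores-two-years.csv")
df = pd.read_csv(URL)

# Binary outcome: was the defendant scored medium/high risk?
df["high_score"] = (df["score_text"] != "Low").astype(int)

# Logistic regression "controlling for other factors" (age, sex, priors,
# charge degree, actual two-year recidivism), with race as a predictor.
main_effects = smf.logit(
    "high_score ~ C(race) + C(sex) + C(age_cat) + priors_count "
    "+ C(c_charge_degree) + two_year_recid",
    data=df,
).fit()
print(main_effects.summary())

# A race x sex interaction is one way to ask the intersectional question,
# e.g. whether Black women are scored higher than the separate race and
# sex effects alone would predict.
interaction = smf.logit(
    "high_score ~ C(race) * C(sex) + C(age_cat) + priors_count "
    "+ C(c_charge_degree) + two_year_recid",
    data=df,
).fit()
print(interaction.summary())
```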
However, the most surprising part of ProPublica’s article was the quote from Northpointe founder Tim Brennan on allowing his software to be used in courts:
“‘I wanted to stay away from the courts,’ Brennan said, explaining that his focus was on reducing crime rather than punishment. ‘But as time went on I started realizing that so many decisions are made, you know, in the courts. So I gradually softened on whether this could be used in the courts or not.'”
To me, it sounds like Brennan is admitting here that he originally did not want his software to determine punishment since it could be inaccurate, but then changed his mind when he realized there was a business opportunity to expand COMPAS into the courtroom.
From ProPublica’s analysis and Brennan’s admission, it seems clear to me that the main question we should pose to lawmakers is not whether this particular algorithm is effective or not, but whether it is ethical for any scoring algorithm to be used in this domain at all.
These readings, which centered on COMPAS, provide further evidence that, as appealing as the idea of letting machines make decisions because humans are biased may be, in the end machines reflect and magnify the biases fed into them. COMPAS asserted that it did not explicitly use race as a factor, but many of the markers it did use—for example, having a degree—are closely linked to broader systems of oppression that are absolutely racialized; in this case, people of color, including the Black people the articles focus on, were historically excluded from higher education in the US. This is not to say no Black people attend college, nor that all white people do, only that systemic oppression makes this task easier for one group than another. Similar logic applies to the criterion of having a job, which, especially in the age of automating away "low-skill" labor, is increasingly tied to the already-mentioned criterion of degree-holding. Therefore, even though race isn't explicitly collected, it still has a deep impact on COMPAS' output.
The assessment's creator claims that "it is difficult to construct a score that doesn't include items that can be correlated with race — such as poverty, joblessness and social marginalization [and] 'If those are omitted from your risk assessment, accuracy goes down.'" But it is naive at best and malicious at worst to assume this absolves you, both as the creator and as the tool, of racist outcomes that treat the social conditions surrounding non-whiteness (more explicitly, Blackness, as discussed in the reading) as proxies for criminality. To acknowledge this difficulty and claim that the decisions were made for accuracy's sake, even with evidence that the tool as it stands is not particularly accurate and is as a result causing undue harm to Black individuals and undue leniency toward white ones, is not a way to explain away the problem and mark it as inevitable; it is a shameless, remorseless admission of guilt.
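One way to make the proxy argument concrete is an audit that checks how well the supposedly race-blind inputs can reconstruct race. This is a hypothetical sketch (the file name and feature names are invented for illustration), not anything the articles describe:

```python
# Hypothetical audit: if race can be predicted from the "race-blind" features,
# those features leak racial information into the risk score.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("questionnaire_responses.csv")   # hypothetical file

# Hypothetical "race-blind" inputs of the kind a questionnaire might collect.
race_blind_features = ["priors_count", "employment_status",
                       "education_level", "zip_code_poverty_rate"]
X = pd.get_dummies(df[race_blind_features], drop_first=True)
y = (df["race"] == "African-American").astype(int)

# If this accuracy sits well above the base rate, the "race-blind" features
# are acting as proxies for race.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(f"Race recoverable from race-blind features: "
      f"mean CV accuracy = {scores.mean():.2f}")
```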
These three readings are closely tied together and raise an interesting question: with so many different combinations of intersectionality, is it even possible to make an unbiased AI or technology that everyone can be satisfied with? There are so many different levels to understanding a person and their concerns that it seems nearly impossible to judge whether these biases are captured correctly. In "Machine Bias," it makes sense to try to figure out why Black people are rated as higher risk compared to white people. However, when we think about the bigger picture of racism, sexism, ableism, and so on, I find it much harder to determine what is biased and unbiased when so many people have different identities. Rather than focusing on how we could use technology to improve life right now, I feel we should focus more on what we should do in society and how we could help it. We should address the problems without technology first and then implement technology afterward. When we focus on working with technology first, problems like this are exactly what we get, so I would rather think the problem through before building AI of any sort for it.
The article raises important concerns about bias in algorithmic risk assessment tools used in criminal justice. The analysis of COMPAS exposes troubling racial disparities – Black defendants face higher risk scores and more false positives compared to similar white defendants. This compounds existing inequalities. Even controlling for criminal history, the algorithm is more likely to label Black defendants higher risk.
The core issue is that these tools rely on flawed data that reflects racial bias in policing, arrests, and incarceration rates, so the output perpetuates that bias. The algorithm does not explicitly factor in race, but it is still discriminatory. The company's defense about overall accuracy misses the point: equality under the law is a higher standard. Even moderately higher error rates for Black defendants have serious consequences in sentencing and parole.
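A minimal sketch of the per-group error-rate check this point implies, using ProPublica's released CSV (the URL and column names are assumptions from their public repository and should be verified, and ProPublica's own analysis also applies data-cleaning filters omitted here):

```python
import pandas as pd

# Assumed location and column names of ProPublica's released dataset.
URL = ("https://raw.githubusercontent.com/propublica/"
       "compas-analysis/master/compas-scores-two-years.csv")
df = pd.read_csv(URL)

df["predicted_high"] = df["score_text"] != "Low"   # medium/high score
df["reoffended"] = df["two_year_recid"] == 1       # rearrested within two years

# Compare error rates across racial groups rather than overall accuracy.
for race, group in df.groupby("race"):
    did_not = group[~group["reoffended"]]
    did = group[group["reoffended"]]
    fpr = did_not["predicted_high"].mean()   # labeled risky, did not reoffend
    fnr = (~did["predicted_high"]).mean()    # labeled safe, did reoffend
    print(f"{race:25s} FPR={fpr:.0%}  FNR={fnr:.0%}  n={len(group)}")
```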
More transparency and external auditing of proprietary tools like COMPAS are needed. But we must be cautious of technological solutions to social problems. Removing bias requires addressing systemic racism and the over-policing of minority communities. While risk assessment tools are not inherently unjust, current implementations reinforce inequality. Careful oversight of, and suspicion toward, automated decision-making is warranted. We cannot outsource moral choices to black-box algorithms. Accountability and due process should come first.