In Scramble to respond to Covid-19, hospitals turned to models with high risk of bias, by Elise Reuter. (https://medcitynews.com/2021/04/in-scramble-to-respond-to-covid-19-hospitals-turned-to-models-with-high-risk-of-bias/)
More than a Glitch: confronting race, gender, and ability bias in tech by Meredith Broussard. Chapter 8 (https://web.s.ebscohost.com/ehost/ebookviewer/ebook/bmxlYmtfXzMzMDkwNzNfX0FO0?sid=7cc4f64f-efb3-4910-bfa8-1d874c2f538f@redis&vid=0&format=EB&lpid=lp_135&rid=0)
In Elise Reuter’s article on the use of biased or untested algorithms in hospitals, the author discusses a number of the concerns regarding algorithms that have been analyzed throughout this class. From data that is inherently biased to assumptions that these algorithms will work, there are a number of issues with how algorithms are implemented, perceived, and deployed. Many of the healthcare facilities discussed in the article implemented patient deterioration prediction algorithms as a result of the COVID-19 pandemic, when health resources were strained. Particularly in the beginning of the pandemic, little was known about the specifics of the disease, how different populations would react to it, or the typical course of care for infected patients. Thus, a number of healthcare facilities opted to use technology to assist a decision-making process that had become particularly difficult for providers during the pandemic.
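To make "patient deterioration prediction" concrete: tools of this kind typically map routine vital signs to a risk score that triggers escalation. Below is a minimal, hypothetical sketch in the spirit of rule-based early-warning scores; the bins, weights, and threshold are invented for illustration and are not any vendor's actual (proprietary) model.

```python
# Illustrative sketch of a rule-based patient-deterioration score.
# Bins and weights are invented for illustration; real vendor models
# are proprietary and considerably more elaborate.
def deterioration_score(heart_rate, resp_rate, spo2):
    score = 0
    if heart_rate > 110: score += 2
    elif heart_rate > 90: score += 1
    if resp_rate > 24: score += 2
    elif resp_rate > 20: score += 1
    if spo2 < 90: score += 3
    elif spo2 < 94: score += 1
    return score

# Escalate if the score crosses a (similarly illustrative) threshold.
patient = dict(heart_rate=118, resp_rate=26, spo2=91)
score = deterioration_score(**patient)
print(score, "-> escalate" if score >= 4 else "-> routine monitoring")
```

Even in this toy form, the bias problem is visible: if a sensor reads systematically worse for some patients (as pulse oximeters have been reported to do on darker skin), the same cutoffs produce systematically different scores.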
As already discussed, human biases toward machines exist in a number of forms, from a readiness to accept machines as conscious (the Eliza effect) to a tendency to view machines as objective (automation bias). These biases can encourage those with good intentions to use biased algorithms that produce worse results for the people they serve. However, the use of algorithms by those with overtly negative intentions should be considered as well. For example, institutions faced with difficult decisions may turn to algorithms to help the decision-making process, like the hospitals discussed in the article. But these institutions may also use algorithms as a way of intentionally outsourcing decisions, or worse, as a means of obtaining plausible deniability. With the plausible deniability of an algorithm deciding instead of themselves, an institution may find it easier to make decisions that carry stark negative consequences.
There are some connections to be drawn between these two readings. The discussion of biased data in the MedCity News article is reflected in Broussard's struggle to even have her race properly represented on her chart. Some bias in the data comes simply from the methods through which it is collected, though there are many other sources of bias as well. The effect of skin color on detecting skin cancer is similar to the effect skin color had on wearable tech: physically defined (as opposed to socially defined) differences that arise in handling people with different skin tones. In both cases, however, the resulting difference in outcomes, in skin cancer detection rates and in the ability to read heart rate with wearables, comes from how doctors and developers handle these physically defined differences, so that part could be considered socially or behaviorally defined. Doctors should simply know what skin cancer looks like in different demographic groups; that's their job. I, as a person with white skin who receives lots of sun exposure, pay my dermatologist to pay attention to what’s happening on my skin and identify what’s going on with it. The same service should be expected for any client regardless of skin tone. If a doctor can only identify skin cancer on white skin, then their education cannot be considered complete.
The first reading discusses the risk model used to determine who can keep an ICU bed when beds are scarce. Starting from there, the article mainly asks whether we can distribute medical resources without bias. The algorithm might become an excuse for turning patients away and a way for people and hospitals to shed responsibility. In addition, the predictions may create anxiety for doctors and nurses, as well as for potential patients who do not yet have symptoms; as the author quotes, “you could see how quickly the nurses and care managers who were running this program were overwhelmed. When you had thousands of patients who tested positive, how could you contact all of them?” It is hard to determine which factors matter most; in other words, it is hard to design the system fairly, since fairness means different things to different people.
The second article focuses on racial problems in health informatics. In the reading, the author is both Black and white because of her parents, yet the doctor's office asks her to choose only one race. This further illustrates the racial issue: the treatment of Black and white patients differs, and sometimes the machine or AI limits the options available, which creates trouble for patients of various races.
During the COVID-19 pandemic, many hospitals turned to automated tools and machine learning to help make decisions due to limited resources. However, experts raised concerns due to the absence of peer-reviewed research and regulatory oversight. Despite the urgency during the pandemic, experts cautioned against relying solely on algorithms for critical decisions. They emphasized the importance of careful consideration and evaluation, pointing out that models could be unreliable and biased based on input data.
Concerns about bias in healthcare systems are not new, as highlighted by historical examples discussed by Broussard in her reading.
Establishing a standard for implementing a non-biased healthcare system, which includes machine learning tools in hospitals, is crucial to preventing harm to individuals caused by biased outputs.
A pandemic naturally could cause organizations to adopt new procedures haphazardly, and clearly some hospitals felt the pressure to adapt too quickly. Once again we see an example of the need for constant ethical consideration, and the potential issues when this is neglected. The article also hinted at the conflicting ideas of proprietary technology and peer review, which has come up in some of our discussions. Given that many of the algorithms that rule our society are proprietary, it is impossible to know for sure how they function, and often we have very limited information on the inner workings. This, combined with loose regulation on the application of technology, especially in medicine, opens the door for buggy, biased, and poorly tested tech to become a central figure in important decision making.
There is a quote from More Than a Glitch chapter 8 that sticks out to me as something that well encapsulates a lot of our discussions in this course as they relate to algorithms, to systems, and now, to models: “It is clear that reproducing existing diagnostic processes in the digital world is not the answer. But how could we build tech that is both ethical and accurate and takes advantage of any digital gains while not perpetuating racist or sexist or ableist consequences?” The chapter goes on to discuss some general propositions for how we might resolve this, which, of course, feature an examination of different types of biases and how they might be corrected. Here, there is another quote that sticks: “Kadambi recommends looking at different types of bias and deciding what is an acceptable threshold, which is a popular idea among computer scientists but not among the people affected”. This is a pithy yet poignant way to articulate that we don’t magically become ethical by encoding a little less racism, a little less sexism, a little less ableism, a little less bias: while we knowingly and intentionally allow any “acceptable threshold” for this, it’s really just performative and still treats certain lives as less worthy of justice and of joy; there is no “acceptable threshold” for oppression. What Broussard instead proposes is breaking our collective technocentrism: if the technology cannot be made fair, it should not be made at all. One final point I really liked is part and parcel of academic elitism: confusing expertise somewhere with expertise everywhere. Academics, and individuals in general, often have great expertise in one thing, but not in all things; the presumption that knowledge always extends is how we end up approaching problems that don’t exist with solutions no one wanted, and sometimes these solutions cause direct harm. It is this idea that informs my personal research philosophy of codesign whenever possible, and, when that cannot be done, at least consultation with the true experts along the way.
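To make Kadambi's "acceptable threshold" idea concrete, here is a minimal sketch, with entirely hypothetical data and names, of what operationalizing such a threshold looks like: compute an error rate per demographic group, take the disparity, and compare it to a tolerance someone chose. The sketch is the point: whoever sets ACCEPTABLE_GAP is deciding whose missed diagnoses are tolerable, which is exactly the objection above.

```python
# Sketch: what "deciding an acceptable threshold" for bias looks like in
# practice. All data and names here are hypothetical illustrations.
from collections import defaultdict

def false_negative_rate_by_group(records):
    """records: iterable of (group, y_true, y_pred) with binary labels."""
    fn = defaultdict(int)   # missed positives per group
    pos = defaultdict(int)  # actual positives per group
    for group, y_true, y_pred in records:
        if y_true == 1:
            pos[group] += 1
            if y_pred == 0:
                fn[group] += 1
    return {g: fn[g] / pos[g] for g in pos if pos[g] > 0}

# Hypothetical predictions from a diagnostic model.
records = [
    ("group_a", 1, 1), ("group_a", 1, 1), ("group_a", 1, 0), ("group_a", 1, 1),
    ("group_b", 1, 0), ("group_b", 1, 0), ("group_b", 1, 1), ("group_b", 1, 1),
]
rates = false_negative_rate_by_group(records)
gap = max(rates.values()) - min(rates.values())

ACCEPTABLE_GAP = 0.10  # the contested "acceptable threshold"
print(rates, "disparity:", gap, "within threshold?", gap <= ACCEPTABLE_GAP)
```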
After the readings we’ve done in this class, everything about the first article seemed like a recipe for disaster: a bunch of new, unregulated algorithms running in an already biased system, affecting real-world decisions that can certainly lead to death. All of this also happened at a time when the healthcare system was completely overloaded and panicky, which can only serve to bury the issues caused by implementing such systems. I can definitely see how such algorithms could help, especially when healthcare systems are stressed during exceptional events such as pandemics, and even though another one is unlikely to strike soon, I think these tools are worth investigating to discover their effectiveness. They’re useless, though, if they perform differently across demographics and make ethnicity a comorbidity in itself. Unfortunately, we just flat out don’t know whether they are biased in this way.
I was a little disappointed that Broussard felt that their encounter with the obstetric receptionist was not their “best moment.” Maybe it was uncomfortable at that very moment, but people should definitely not feel bad for insisting that the systems designed to help them reflect the real world. I was also struck by the importance of history in evaluating the systems around me. I never really have to consider historical failures of the healthcare system for people like me when visiting a hospital or doctor. People of color must reckon with the fact that only fifty or so years ago, horrible experiments like the Tuskegee Experiment were running, treating Black people as expendable test subjects and never offering them treatment that was readily available.
Today’s readings focused on how automated systems implemented by the American medical industry are extremely biased against people with darker skin, particularly Black people. A point that was particularly interesting was when the first reading, “In scramble to respond to Covid-19, hospitals turned to models with high risk of bias”, noted that any technological system the American medical industry implements will be biased, because “how people access and interact with health systems in the U.S. is fundamentally unequal” (Reuter).
There has been consideration of ‘corrective’ measures to be applied when calculating certain data for Black patients. But what this does is perpetuate the white supremacist mindset that Black people are unequal and less-than. This ‘solution’ is extremely harmful and reinforces the claim I mentioned earlier from the first reading. Demographics are not properly and accurately measured in healthcare, whether due to mistrust in the system or a lack of insurance on the patient’s side. On the health system’s side, there is often a lack of resources to properly log a person’s identity, as when the author of More than a Glitch had to go through extra steps to get her chart to reflect that she was mixed-race.
After these two readings, I feel like the material is becoming more familiar and less surprising to me. We see a lack of data that matches society as it is right now, as well as a lack of concern for several groups, with attention going only to the main group affected by the problems. People are still not looking into the problems of those in society who are affected more subtly, and are more focused on those who are more visible. This is a real concern: we as a society have little awareness of the problems that lie within it and little will to change that. These things really show in our data, as we still have biases and tend to ignore disadvantaged groups consistently. Even in the second reading, ‘More Than a Glitch’, we see that the author, coming from a Black and white background, still faces disadvantages and a lack of care in hospitals. The staff do not genuinely care about the author’s dual ethnicity; it seems too complicated to them, and they don’t want to put in the extra effort. It is also very disappointing to see how “black men with syphilis were deliberately denied life-saving medical treatment so researchers could observe exactly how the disease destroys a person’s body”. I find this extremely shocking and disgusting, as these men were essentially treated like lab rats. This is not a matter of the data being biased or algorithms not working, but purely terrible treatment: researchers ignored the care of disadvantaged groups and used them for their own advantage, treating the study as an opportunity to learn for their own benefit rather than actually helping those in need.
In the article written by Elise Reuter, several key points resonate with me. Firstly, the high risk of bias in the COVID-19 risk models points to a fundamental challenge in machine learning: ensuring data quality and representativeness. In healthcare, where decisions can be life-altering, the importance of diverse, comprehensive, and accurate datasets is paramount.
Secondly, the potential biases in these algorithms, especially when not tested across diverse demographics, reflect the ongoing challenge in AI of preventing and mitigating biases. It highlights the ethical considerations in AI development, necessitating transparency, accountability, and inclusivity in algorithm design and implementation. Thirdly, the retrospective nature of most model evaluations raises concerns about their efficacy in real-time decision-making. This emphasizes the need for ongoing, real-time evaluation and adjustment of models to ensure their relevance and accuracy.
This follows very consistently from a lot of the things we’ve already discussed quite frequently in this class: removing context and humanity from the decision-making process is a bad idea, especially for important decisions (healthcare, jailing, housing) where entrenched, systematic biases are already going to be reflected in the data. I was really astounded by how Stanford’s medicine department could write an algorithm that did the exact opposite of what was intended. It seems like just making a randomized list in Excel and then sorting it by “in-person” vs. “remote” would have been much better, so this is another situation where I would pay good money to see the people in the room where the model was developed and read the code they wrote. I’m glad some hospital administrators are becoming more aware of this issue, but the problem remains that in unexpected and stressful situations people are forced to turn to untested systems to dole out their limited resources. The core issue here is that America as a society doesn’t view investing in healthcare and housing as “national defense,” when I’d argue it helps Americans much more than floating a 100,000-ton city of death outside a developing country so that they give us better deals on oil.
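For what it's worth, the randomized alternative really is only a few lines. Here is a hedged sketch (hypothetical names, not Stanford's actual code) of a seeded lottery that ignores job title entirely:

```python
# Minimal sketch of the randomized alternative: a simple vaccine lottery
# that ignores rank and role entirely. Names are hypothetical.
import random

def vaccine_lottery(staff, num_doses, seed=None):
    """Return a random subset of staff to receive the available doses."""
    rng = random.Random(seed)  # a fixed seed makes the draw auditable
    return rng.sample(staff, min(num_doses, len(staff)))

staff = [
    {"name": "A", "role": "in-person"},
    {"name": "B", "role": "remote"},
    {"name": "C", "role": "in-person"},
    {"name": "D", "role": "in-person"},
]
# Optionally restrict the pool first, e.g. to in-person workers only,
# then draw at random within it.
in_person = [s for s in staff if s["role"] == "in-person"]
print(vaccine_lottery(in_person, num_doses=2, seed=42))
```

Restricting the pool first and then drawing at random keeps one transparent, defensible criterion while avoiding the kind of opaque weighting that can quietly invert a model's intent.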
The main topic of today’s two articles concerns what has already been brought up a lot in previous class discussions and readings. Training a model or an algorithm is very complicated and problematic, especially when there’s a high likelihood of the input data containing biases. Beyond that, there are also many sources of bias during the data collection and processing stages that might get in the way of efficiency and equity. It’s also hard to encode aspects of identity into a computer program or a machine learning model, because the categorization of variable values rarely reflects reality; for example, a handful of race categories does not necessarily accommodate dual race or ethnicity, as the sketch below illustrates.
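A small illustration of that encoding point, assuming nothing about any real medical records system: a single-select race field is structurally incapable of representing Broussard's case, while a set-valued field represents it trivially.

```python
# Sketch of the encoding problem: a single-select race field cannot
# represent a patient who is both Black and white, while a set-valued
# field can. Categories here are illustrative, not a recommended schema.
from dataclasses import dataclass, field

@dataclass
class SingleSelectRecord:
    race: str  # forced to pick exactly one category

@dataclass
class MultiSelectRecord:
    races: set = field(default_factory=set)  # any number of categories

# The single-select form forces a lossy choice:
lossy = SingleSelectRecord(race="Black")
# The multi-select form can represent Broussard's case directly:
faithful = MultiSelectRecord(races={"Black", "white"})
print(lossy, faithful)
```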
In the first reading, I believe that the situation was a lot more complicated because the medical industry was in a very precarious position with a severe lack of medical resources. We as a society were faced with a disease that we knew little about. There was a lack of applicable, representative training data, and a lack of knowledge about the virus and how differently it impacted different groups of people. I believe that this is a situation where relying on technology to help might end up costing more time and financial resources.
Today’s readings discussed the use of AI tools and algorithms in the medical system and how they are prone to racial bias. The first reading talks about how different medical institutions used models to determine who could get an ICU bed during the COVID pandemic. In the article, Reuter argues that algorithms cannot be used to make tough decisions because, as is well known, the data those models would use contains biases, so their results would inherently end up biased. Reuter also discusses the lack of regulation when it comes to developing machine learning tools, even describing the field as the ‘wild, wild west.’ She points out that currently, the FDA doesn’t require software that provides diagnosis or treatment recommendations to be regulated as a medical device. Additionally, Reuter mentions that developers should ask themselves whether the communities they are serving have a say in how the system is built, or whether the system is even necessary.
This final idea connects well to some of the arguments made by Broussard in chapter 8 of ‘More than a Glitch.’ In this chapter, Broussard discusses different scenarios where racial bias and discrimination appear in the medical system, especially in systems built to diagnose. For example, Google’s skin-examination tool is largely trained on images of light-toned skin, while images of darker skin tones are scarce. At one point in the reading, Broussard talks about how computer scientists think they know everything and ignore the context behind many of the problems they try to solve. In reality, their priority should be to listen and act based on the needs of the community they are serving.
Elise Reuter’s article addresses concerns about biased or untested algorithms in healthcare, echoing class discussions on issues from biased data to assumptions about algorithmic effectiveness. The adoption of patient deterioration prediction algorithms during the COVID-19 pandemic, driven by resource constraints, is highlighted. As discussed, human biases towards machines, from the Eliza effect to automation bias, can lead well-intentioned people to adopt biased algorithms with unintended consequences. Moreover, institutions facing tough decisions might strategically use algorithms to outsource decision-making or gain plausible deniability.
Connecting to Andrew Nickeson’s comment, parallels emerge between the medcitynews article and Broussard’s struggles with racial representation in health charts. Both underscore biases in data collection and the impact of physical differences, like skin color, on healthcare outcomes. Wearable tech faces similar challenges, with skin color influencing accuracy. The resulting disparities stem not only from physically defined differences but also from how doctors and developers handle them. The expectation is for comprehensive education to ensure doctors can identify health issues across diverse demographic groups, emphasizing the need for equitable healthcare for all.
The chapter from “More Than A Glitch” reminded me of a podcast I recently listened to called “The Retrievals”. The podcast follows the experiences of fertility patients who underwent egg retrieval procedures without effective anesthetic, because a nurse at the facility was stealing the fentanyl and replacing it with saline. Most of the women described the ways in which their doctors and nurses failed to recognize their extreme (abnormal) pain, and projected a cruel disbelief of the pain they were desperately trying to assert. In this podcast, there was little discussion of the ways in which race and other aspects of identity could or did affect this experience. Based on the chapter, it is highly likely that Black women faced especially cruel treatment and denial of their symptoms. The bias of medical staff has dangerous, potentially fatal consequences for all women, but the effect is particularly harmful to Black women and other groups of women (for example, fat women) who are less likely to be believed about their own health, pain, and feelings.
The chapter also touched on the training data used for AI tools like Google’s skin/hair/nails diagnostic tool and how it disproportionately represented lighter skin tones, with only 3.5% of images coming from folks with darker skin. Overall, this chapter reiterated some of the issues with the implementation and use of AI technologies that we have seen in our previous readings.
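Here is a sketch of the kind of representation audit that would surface such an imbalance before release. The counts below are invented to loosely echo the roughly 3.5% share of darker (Fitzpatrick V-VI) skin that the chapter reports, and the 10% flag threshold is an arbitrary choice for illustration.

```python
# Sketch: auditing per-group representation in a training set before
# release. Counts are illustrative, loosely echoing the reported ~3.5%
# share of darker (Fitzpatrick V-VI) skin in the dataset Broussard cites.
from collections import Counter

labels = (["I-II"] * 500) + (["III-IV"] * 465) + (["V-VI"] * 35)
counts = Counter(labels)
total = sum(counts.values())

for group, n in counts.items():
    share = n / total
    flag = "  <-- underrepresented" if share < 0.10 else ""
    print(f"Fitzpatrick {group}: {n} images ({share:.1%}){flag}")
```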
I just find it crazy how organizations will use newly developed algorithms that weren’t built equitably or failed to execute what they were built for (Stanford), or produce applications with large public health implications but don’t prioritize those implications. An example of the latter is Google stating they built the skin app to get more searches (and thus more money) rather than making accurate diagnoses across all people the primary goal. At the end of the “Diagnosing Racism” chapter, Broussard reiterated what we’ve often discussed in class: are machines and algorithms always really needed in certain situations (especially if they’re hindering more than helping)? Furthermore, even when they help make our jobs easier, they shouldn’t be a “proxy for tough decisions” (as Reuter says); this just leads to a lack of accountability. Somewhat unrelated, the line in Reuter’s article that reads “Among the hospitals that published efficacy data, some models were only evaluated through retrospective studies” reminds me of pm’s point about trying to have a pre-mortem rather than just a post-mortem approach when designing algorithms or any technology. That is especially crucial in healthcare, if algorithms are here to stay, since you’re dealing with people’s lives.
I think the algorithms that were used to determine care for COVID-19 patients are such an interesting topic because we all, having experienced the stress and seen the impact on healthcare workers, can see how such systems were adopted without thorough vetting. One of the considerations we have brought up a couple of times is that a temporary decision in an algorithm needs to be just that: temporary. These kinds of systems can easily become ingrained in an institution without much oversight and have devastating consequences. This isn’t something as simple as a change in policy, because these systems are ultimately difficult for the people using them to understand. At least with a protocol change, many eyes would be on the process, but with an algorithm the workings are obscured from the majority of the people utilizing it.
Speaking also of medical biases in general and proper treatment: the historical mistreatment, lack of transparency, and dishonest medical studies conducted on the Black population of America have left biases that pervade our treatment and diagnosis of diseases. I remember hearing my dad talk about having to override nurses’ decisions about giving someone dialysis, because the recommendation sat at a higher threshold for Black people, who were assumed to have denser muscle fibers. These decisions and biases kill people.
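That anecdote sounds like the race "correction" long embedded in kidney-function estimates. As a hedged illustration (this assumes the CKD-EPI 2009 eGFR equation, since replaced by a race-free 2021 refit, and may not be the exact rule those nurses followed), the coefficient works like this: identical labs yield a roughly 16% higher estimate for a Black patient, which can push them to the healthier side of a treatment cutoff.

```python
# Sketch of the race coefficient in the CKD-EPI 2009 eGFR equation
# (since replaced by a race-free 2021 refit). Identical labs, different
# estimated kidney function -- which shifts who crosses treatment cutoffs.
def egfr_ckd_epi_2009(scr, age, female, black):
    """Estimated GFR in mL/min/1.73m^2 from serum creatinine (mg/dL)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    egfr = (141
            * min(scr / kappa, 1) ** alpha
            * max(scr / kappa, 1) ** -1.209
            * 0.993 ** age)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159  # the race "correction": same labs, higher estimate
    return egfr

same_labs = dict(scr=1.8, age=55, female=False)
print(egfr_ckd_epi_2009(**same_labs, black=False))  # lower eGFR
print(egfr_ckd_epi_2009(**same_labs, black=True))   # ~16% higher estimate
```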
Like so many other examples in this class, major institutions are choosing to implement algorithms that are largely under-tested and unfinished for the sake of efficiency and “fairness”. It’s difficult to wrap my head around why so many are so quick to implement these algorithms when they aren’t even sure if they’re more effective than previous methods. In a situation as delicate as risk assessment during a pandemic, we are putting the responsibility on a program that carries the same cultural biases through its data that humans do, and can read none of the nuance of any individual situation. I understand how stressful, time-consuming, and serious these decisions are, but swapping out the imperfect solution of human decision-making with a potentially worse solution is incredibly confusing. And the lack of regulation from the FDA (or really any large government/medical institution) with regards to machine learning usage in medicine is deeply concerning.
The Broussard reading did an incredibly effective job of showing the cracks in the implementation of AI. Specifically, I found it deeply concerning how often tech companies (and individuals in the industry) insert themselves into worlds they’ve never belonged to. Google rolled out its skin AI project with major flaws, especially the racial bias of the cases it was trained on, and didn’t reflect on the larger implications of releasing it. The quotes from Hinton are infuriating; I don’t understand how this guy can just make wild claims about radiology when he doesn’t have the background to say anything about it. What happens when we keep letting tech companies, and individuals in tech who clearly hold cultural biases, work in fields they have no right to?
I thought that the anecdote in Broussard’s book about Geoffrey Hinton perfectly encapsulates the arrogance of computer scientists in the domain of healthcare. Hinton, a leading researcher of deep learning, famously predicted that radiologists would be completely replaced by machine learning models within five years. The basis for this prediction was the advancement of machine learning computer vision models and the idea that a radiologist’s skill at analyzing x-ray scans would eventually be surpassed by algorithms, just as computers have long since surpassed humans at chess.
The problem with this prediction is that the work of radiology, though on the surface it may seem solely concerned with bone and tumor scans, is ultimately about human beings. The data fed to a radiology model is therefore deeply connected to the sociology of our healthcare system and all the racial baggage that Broussard details alongside it. That means any attempt to automate radiology with machine learning will have to address the limitations and racism of the healthcare system head on.
The problem presented in Elise Reuter’s article is an interesting one. In a vacuum, I would argue that any attempt to use machine learning to inform healthcare decisions is a TERRIBLE idea, for all the reasons she listed and more. Not only is the data biased, meaning any ML algorithm built on it will be biased, but we also need to consider what such algorithms actually optimize, which may be something other than patient health.
However, in the face of a shortage of healthcare workers and an extreme surplus of patients that might need critical care, I can see how ML algorithms would be tempting, given how slow consultations can be. When healthcare workers can’t make intelligent decisions about who needs treatment in a reasonable amount of time, one wonders whether it is justified to use flawed algorithms to at least make those decisions quickly so that healthcare workers can focus on treatment.
My intuition is that there is no solution to this problem within the current healthcare system. Perhaps using biased ML algorithms to make healthcare decisions quickly is the lesser of two evils. For the time being, I think that should be the last resort, when hospitals are otherwise unequipped to make enough decisions in a reasonable time-frame, because the technology is too new and too experimental to use in any other situation.
In ‘More than a Glitch: confronting race, gender, and ability bias in tech’, Broussard mentions the idea of expertise and how Kadambi, a computer vision researcher, does not have expertise in the realm of race and bias. This makes me think of how academia, while it may be slowly changing, is not built in a manner conducive to addressing these types of interdisciplinary issues. While academia is built upon the idea of expertise within a subfield of a subfield, modern-day issues require expertise in many different areas of knowledge production. Funding is also often (but not always, yay!) more easily available for domains of knowledge more deeply entrenched in academia. This is a structural issue within the entire enterprise, one that is often almost used as an excuse for not dealing with issues where one’s expertise does not lie (because no one’s expertise lies there at all). The fact that many reputable journals are specific to a single domain of knowledge production also disincentivizes interdisciplinary collaboration (though this is changing). I also think of the fact that Grinnell College does not want as many double majors. I realise that this is a result of the lack of infrastructural resources to support students who may want to take this path, but the focus on dissuading students instead of (tenure-track) hiring and pay for faculty (and also lower-level staff) leads to an academic culture wherein we are encouraged to become siloed in the understanding of a single discipline and are, as a result, inadequately prepared to solve the issues that plague us.
I didn’t really have anything new to add for the other article, but I appreciate them saying that we cannot use technology to make hard decisions.
I think this idea of the “greater good” pervades a lot of decision making and almost always disproportionally impacts marginalized groups. Efficacy or efficiency should not depend on cutting corners to the point where a substantial amount of the population is neglected.
Reuter’s article reminded me of Liz Rodriguez’s presentation about data, and how there is no such thing as truly raw data, because social structures influence how that data gets to you and is presented to you. Further contextualized by Broussard’s chapter on racism, marginalized people harboring negative sentiment towards the healthcare industry is pretty justified. Historically, the healthcare system has exploited racial minorities and refused to consider genetic and epigenetic factors that might change the care they receive.
After our conversation about wearable tech earlier this week, I understand why there is such a huge market for alternative ways to understand one’s health and respond to potential issues. So it feels really sinister that these potentially productive ways for racial minorities to reclaim some sort of agency are thwarted by a failure to consider how darker skin tones might interact with this technology.
Overall, these readings affirm that the reliance on algorithms to make tough decisions to avoid human bias is counterproductive.
The second reading, More than a Glitch: Confronting Race, Gender, and Ability Bias in Tech by Meredith Broussard, brings up a question we have been attempting to answer this whole semester and a question that occurs again in the first reading. “How could we build tech that is both ethical and accurate and takes advantage of any digital gains while not perpetuating racist or sexist or ableist consequences?” (p.131).
Broussard, of course, does not have all the answers to the previous question, but she provides some good ideas. When reading about a field such as US healthcare, built upon decades and decades of racism, eugenics, and false assumptions of biological difference based on race, and now integrating technology that replicates these biases, there seems to be an endless feedback loop. However, Broussard offers an idea that I have been thinking about all semester; she “want[s] to normalize not using technology when the technology is impossible to make fair” (p.132). She simultaneously highlights that experts are not experts in a sweeping fashion, despite their efforts to market themselves as such. Broussard simply describes that sometimes we need to stop. Computer scientists and other engineers are taught that there is always a way to solve and fix, but within a society featuring hundreds of years of racism, sexism, and more, this is not always possible. Humans are not perfect, but they are capable of being taught in ways that technology currently cannot be taught through data. As Broussard states, “In a lot of these [healthcare] cases, we must ask: Why use inferior technology to replace capable humans when the humans are already doing a good job?” (p.133).