The Tyranny of Metrics by Jerry Z. Muller. Chapters 1,2, and 9 (https://ebookcentral.proquest.com/lib/grinnell-ebooks/detail.action?docID=5214923)
In Jerry Z. Muller’s Tyranny of Metrics, the author discusses how a number of institutions across society, from public to private and non-profit to for-profit, have fixated on the collection of information as a means of improving accountability and their performance in pursuit of well-defined goals. After defining this notion of metric fixation, the author proceeds to discuss its typical downfalls and how they may look in particular fields, like the practice of medicine. When developing software, the pitfalls of metric fixation are fairly pertinent. After all, the basics of binary computation encourage the collection of discrete data, not continuous data. Not only does this limit the use of computers to instances where quantifiable analysis is possible, but it can also encourage the culture of metric fixation Muller discusses.
In previous classes, we have analyzed a number of biases toward the use of computers and their results, like automation bias and the Eliza effect. With these biases, people have shown a tendency to favor the use of computers in studying and analyzing a wide range of scenarios. While computers allow for the storage of subjective information through text, some may view the speed at which computers can perform complex mathematical analyses as an invitation to take advantage of that capability. Thus, it is important, as computer scientists, to recognize our training to favor the digitization of information, which can contribute to a culture of metric fixation.
Metric fixation may be an apt description of the college ranking discussion we had earlier in the semester, and the effects associated with it line up. In the reading, Muller mentions the tendency of workers whose jobs become enumerated to manipulate their performance in order to satisfy the measurement process. In the realm of college rankings by the US News algorithm, this behavior was seen in institutions gaming the algorithm, like those that inflated their SAT scores or admitted fewer students, or those that increased fundraising and improved their athletic teams. Many schools worked hard to improve only the proxies they were evaluated on, which, in some cases, caused other issues to fall through the cracks. Metric fixation is the general effect seen in this scenario. We like to trust numbers because we feel they are objective, but in most cases, numbers can be manipulated. They may not tell the full story. They may be flat-out wrong. They almost certainly cannot measure every element of value of whatever they are set to measure. What they do tell us may be incredibly valuable, but like any other data, they must be contextualized and examined; otherwise, they may not be as reliable as we would like them to be.
Very early on in this class, it became apparent how numbers and data are often misleading. We learned how numeric measures of improvement often abstract so far from what is really going on behind the scenes that they obscure it. This simplified view of the larger issues and problems plaguing our society has extremely harmful consequences. Today’s reading about the healthcare system and the issues of data collection and metric-based improvement added to this notion. Healthcare is always a hot topic of discussion in the United States, especially when comparing the American healthcare system to our European counterparts’ universal systems. Healthcare capitalism is continuously at the foundation of severe discrepancies in care between those who can afford the top care and those who cannot. It was really fascinating to read how metric-based improvement is layered onto an already problematic system. Specifically, it was striking to learn that certain patients are turned away from important surgeries and that “gaming” these metrics has certainly led to the loss of life. Incentivizing doctors to do something they should already take pride in doing to the best of their ability feels dirty and wrong. Obviously, metric-based evaluation is not the answer in healthcare, but to me, it points to the greater problems that exist. As healthcare is a luxury good, publicly available metrics can drive America’s richest to the ‘top’ hospitals and the poorest to the worst. Therefore, metrics not only may decrease overall care but also further widen the already large wealth-based discrepancy in healthcare.
Today we learned how data regulation leads the corporations and institutions subject to it to game the system to meet data requirements rather than actually making changes that would improve their results. This has led to a larger disparity between areas of different social classes: less-fortunate schools and hospitals, for example, are not meeting the numbers as effectively, and so they are forced to resort to more tactics to manipulate the system than their well-off counterparts. This has created a vicious cycle where the less fortunate continue to struggle while the rich get richer. The use of these tactics also leads to a lack of transparency and accountability, making it difficult to identify genuine progress.
It is clear that the use of tactics to meet regulatory requirements is not sustainable in the long run. Instead, organizations should focus on making genuine improvements to their practices to achieve better outcomes. This would benefit not only the organizations themselves but also society as a whole. Ultimately, organizations need to prioritize ethical and transparent practices to achieve long-term success. The use of tactics to manipulate data is not only unfair but also undermines the integrity of the entire system.
The three chapters provide a useful overview of the promises and pitfalls of performance metrics across different fields. A key takeaway is that metrics are not inherently good or bad – their value depends on how they are used. When used for internal feedback and diagnosis, metrics can provide helpful insights for improvement. However, metrics often fail or backfire when used for external accountability and rewards. Though well-intentioned, applying high-stakes consequences to metrics distorts behavior in counterproductive ways.
A recurring theme is matching metrics to institutional culture and professional values. Metrics are most constructive when developed organically from within a professional community to align with its goals and norms. In contrast, metrics imposed hierarchically from above or outside tend to meet resistance and gaming. Though some standardization is necessary for comparison, metrics should not pursue standardization at the expense of meaning and context. Overall, the merits of metrics depend on recognizing their limits – what they can and cannot measure validly. Metrics provide useful data, but should inform professional judgment rather than seek to replace it. Numbers offer valuable evidence, but in human services, counting does not automatically equal accountability.
In this reading, we see the concept of “metric fixation” and how it affects various aspects of society, including education, healthcare, business, and government. We notice that reliance on metrics increases significantly as metrics become a means of controlling and optimizing complex systems, and we lose sight of the fact that not everything important can be quantified, and not everything that can be quantified is important. We notice that metrics have negative effects in the education system, such as narrowing the curriculum, gaming of the system, and increased stress on students and teachers. This causes distortion in educational goals and values. We see that this applies not only to education but to other parts of society as well. The idea of “metric fixation” is very similar to how a computer scientist would approach a problem: by focusing on efficiency and optimizing the situation, metric fixation disregards many of the side effects it could have. This is very concerning, as progress toward solving society’s problems stalls while the problems only continue to get harder and harder. I am very curious as to when we will be able to stop focusing on optimizing the problem and focus more on dealing with every concern that is raised.
Taking a quantitative approach to quantitative situations is reasonable. We run into issues when we try to analyze a qualitative situation via numerical means. The quality of a doctor’s care is not fully represented by whether or not a single patient overcomes their condition, though the doctor’s overall success rate comes closer. But even then, we aren’t considering cases in which a doctor gets a patient who has no possible positive health outcomes, so we’d need an even wider array of statistics to evaluate a doctor’s success rate. If we took a more qualitative approach, we could get a sense of the quality of a doctor’s care via exit interviews with their patients. The quantitative approach is simply more accessible given our current computational resources, and that biases us toward treating anything we track digitally as a quantitative issue. Soon, with sentiment analysis and Natural Language Processing, we might be able to better address widespread qualitative issues.
I see this issue come up very commonly in the running world as well, specifically in relation to training logs. I personally used to have a spreadsheet in which I tracked every metric I could think of: miles run, time spent running, average pace, peak pace, average heart rate, peak heart rate, perceived exertion, VDOT score (a scale taken from a book), and HRV. I spent a ridiculous amount of time planning out my mileage builds and what kind of efforts different days might require (on a numerical scale), and I gathered A LOT of data. I felt that I could ensure I was undertaking the highest possible quality of training this way. This was not the case: these numbers couldn’t account for the quality of my training from day to day, and the sensation of running and its effects on my body could not be represented by numbers (though HRV actually showed some promise in tracking my health from day to day). I made it through a couple months of my plan; near the end, I found that I was blindly throwing myself through 60-mile weeks, completely ignoring all the signs my body was giving me to slow down. I got injured and had to drastically reduce my training load for a significant portion of time. This past summer, I based my training on much simpler metrics: I kept my training priorities in mind from day to day and made day-of decisions about how I could best pursue those priorities. This allowed me to put together several months of high-quality training that accounted for the state of my body from day to day.
Truly nuanced issues, whose decisions must account for real-life qualitative factors that change from day to day, cannot be solved via quantitative analysis alone.
I think I tend to be more of a metric fixator than the average Grinnellian, especially the second component and the first to a lesser extent. That being said, I really can’t argue with their point that forcing people to work towards pre-established goals stifles creativity. I’ve felt that in my own work and have seen how people teaching to or only studying for the test can cause a lack of real understanding. Although I’m tired of hearing stats classes remind me that you can’t just set up your hypothesis test to get desired p-values if you want to get any actual results, they’ve got a point.
I don’t know how I feel about their arguing that the U.S. health system’s 37th-place ranking isn’t valid. I’ll concede that it’s not strictly objective, but honestly, I don’t think anyone thinks that metric means something like “how likely you are to die if you had unlimited money to spend on doctors from this country,” which seems to be how they’re portraying it. Like clearly a significant part of “healthcare” is that it helps people stay healthy. Not rich people. People. So having some sort of reflection of that doesn’t seem too ideological to me (although I’ll concede that the system might disadvantage universal healthcare, but that, unfortunately, is not a problem we have to worry about rn). And arguing that obesity or having a heterogeneous society are detached from healthcare is kinda absurd. Ignoring that preventative care should definitely be considered part of “healthcare,” if your healthcare system could treat a disease with 100% efficiency, but only when the patient was born under a blue moon and named “Trent,” that would still be a pretty fucking bad system. Plus, we’d probably get a lot of people with douchey names whose parents held shitty beer over them while being born.
Chapters 1 and 2 served primarily as intros to what we would read in, presumably, the rest of the book, but in our case, specifically chapter 9. However, the way they talked about metrics reminded me a lot of how we’ve talked about data in general in this class. In chapter 1, Muller writes, in defining metric fixation, “the belief that it is possible and desirable to replace judgment, acquired by personal experience and talent, with numerical indicators of comparative performance based upon standardized data (metrics)”. We’ve discussed this time and time again: how algorithms can reduce us, miss pieces, fail to be holistic. However, he also wrote in that same definition, “the belief that making such metrics public (transparent) assures that institutions are actually carrying out their purposes (accountability)”, which I feel like our discussions have sometimes tended to distort. We have not historically been of the opinion that transparency fixes all, not even close, but I distinctly remember that with social media ads, we (and the article) noted that being very clear about how the data is used and manipulated could stop giving bad actors cover for using discriminatory practices. Perhaps this is all we can do short of hard-to-come-by legislation, or perhaps it echoes the reading’s conclusion, but transparency would not, and does not, mean we’ve found justice or necessarily made much improvement.
Back to being reminded in a “yeah, we’ve talked about this” way: chapter 2 mentions that information can be distorted in 8 key ways, including by “measuring what is easiest to measure”, “measuring the simple when the desired outcome is complex”, and “improving numbers through the omission or distortion of data”. All three of these resonated with the discussions we’ve had: the use of proxy variables in algorithms like COMPAS for the first point; the 1–17 discrete value scale in the algorithm that decides who gets temporary housing for the second; and COMPAS again, along with most algorithms that fail to be holistic, for the third (even if this omission isn’t intentional, it’s just that people are way too complex to optimize over like that). Still, that left 5 ways we hadn’t talked about, like “degrading info quality by standardization”, which reads like a lossy conversion issue but also like trying to treat all cases as the same at some level, losing nuance; or moving the goalposts by lowering standards for success, cheapening what success means; or even changing who or what you’re looking at by finding easier targets for whom it’s less challenging to be accurate (gaming through creaming). This demonstrated to me that while we’ve had many discussions about what are essentially the downfalls of being so metric-centric, and harming real lives in the process, we’ve yet to intentionally consider everything that may imply.
Now onto chapter 9, which was the bulk of the reading. It was, of course, about healthcare, and pretty early on it gets into some ways that, in healthcare, metrics can be very valuable and be “solutions” of a kind, which disrupts the deep pessimism set out in chapters 1 and 2. He discussed how metrics, for example, were harnessed to cut the rate of central line infections by 66%, as evidence that metrics do have a place in healthcare, because “Rapid improvement in any field requires measuring results—a familiar principle in management. Teams improve and excel by tracking progress over time and comparing their performance to that of peers inside and outside their organization. Indeed, rigorous measurement of value (outcomes and costs) is perhaps the single most important step in improving health care. Wherever we see systematic measurement of results in health care—no matter what the country—we see those results improve”. However, the end of this section gets us under halfway through the chapter’s 22 pages, and the rest is spent exploring how these metric successes are the exception rather than the rule. Metrics and metric-centrism still fail us. The advent of “medical report cards” has not, as Muller pointed out, dramatically improved the standard of care as patients see it, but has instead become a PR management issue for hospitals that can actually draw away resources. While it’s true that entities are often driven by maintaining their image, the cost of doing so can make the “cure” worse than the problem, or if not worse, at least a far cry from a solution.
This also reminded me of a presentation I heard this summer, shown as an example of “sometimes research just doesn’t work out.” Essentially, metrics could be used to pretty accurately predict when a rare heart condition was occurring and would require pretty invasive treatment (and when the prediction was inaccurate, it tended to overestimate concern, not falsely say there was no condition). This was critical, because the condition often presented as less dangerous than it was, and was revealed to be the rare condition all too late. Thus, a system was set up where doctors could check the metric’s result before making care decisions. However, it turned out doctors didn’t love having their authority questioned, so after all the (kinda invasive) data had been collected, 5 years of research conducted, and a very expensive system designed, it didn’t even get used. This demonstrates metric-centrism, in that the researchers really felt this solution could outpace doctors (which it kind of could sometimes, though it also sometimes overshot the danger of a condition, and doctors had the nuance to handle that aspect a bit better). But it also illuminates the human resistance to being “replaced” by metrics. There is an interesting tension between the metric-centric desire to metricize everything we can, viewing metrics as largely positive even where that might not be deserved, and the fact that those who inhabit the spaces where metrics are being placed, especially in the medical field, where being there requires many long, hard years of training, may not be super welcoming to a heavy metric presence.
Rating systems, as a result of the corporate push for the highest possible satisfaction and performance, have completely lost all meaning in almost every context. When I give a movie five stars, it means I thought it was perfect and enjoyed it immensely. When I give an Uber five stars, it means the driver didn’t put my life directly in danger while I was in the car. The same goes for other similar services that you engage with on a personal level, like restaurants, Airbnbs, and whatever else. Anything less than a perfect or near-perfect score is considered a loss, even punishment-worthy, and employees suffer because of it. And, to avoid giving people a hard time, you simply feel pressured to turn a spectrum of opinion into a binary good-versus-bad system.
It’s not surprising that this sort of bizarre thinking, this desire to put everything in a box and sort those boxes around, has spread to medical systems and had a negative effect. I wasn’t aware of the issues with ranking health systems by the WHO’s criteria, and, because we have a natural tendency to trust numerical rankings, I never even gave them a second thought. I don’t think it’s unnatural to make these sorts of connections and try to find what makes some health systems perform better than others. But one must wonder what incorrect directions we have pointed ourselves in, and may point ourselves in going forward, if we simplify the interactions between complex, different systems and remain unaware of the tangible changes that can actually be made.
The book talks about metric fixation. The key concept involves three component beliefs: the belief that it is possible and desirable to replace judgment with numerical indicators; the belief that making metrics public assures the accountability of the institutions that use them; and the belief that the best way to motivate people is to attach rewards and penalties to their measured performance. However, the quantified system also gets distorted into more severe problems. Rankings tend to measure the things that are easy to measure and ignore others, and standardization also limits what counts as the “best” in the ranking system. Recently, U.S. News reported problems with their algorithms that led to ranking changes for more than 100 universities. That a change in the algorithm could produce such a huge swing in the rankings shows how fragile these metrics are, and it is interesting to learn about metrics on the healthcare side.
I think for a while we have talked about the idea that the data available to us is not always good, complete, or even useful. This reading, though, goes a step further into what trying to use that data to directly drive decisions can do. Before, we had often been talking about algorithms that use data to determine some potential outcome or recommendation, like VI-SPDAT and COMPAS. The examples Muller mentions in the chapter on medical uses of data, though, are human interactions in response to data. We are told that in the case of the UK P4P program, many of the health issues not accounted for by the data saw a decrease in positive outcomes. There was also a mention that the system for tracking tubing infections on a medical device was designed and implemented by doctors themselves. I believe Muller specifically calls attention to this because motivation needs to be internal to the doctors and practitioners, and a strategy devised internally is much more likely to drive that motivation. Generally speaking, I think the reading makes a very good point in showing that not all metrics are helpful in improving an industry like healthcare: while some can show promise and save lives, other data implementations can actually draw attention away from helpful practice and divert resources unnecessarily.
I think the author’s discussion of metric fixation, combined with the general techno-solutionism observed in most modern industrial and social issues (itself a result of what Cathy O’Neil refers to as math washing), reveals a curious paradox. As discussed in class before, we tend to offer technical fixes for increasingly complex problems. These fixes necessarily abstract away certain complexities due to the limited technological affordances of, for example, binary computers, yet we tout them as better solutions than those that do not involve technology.
The discussion of the gaming of metrics in healthcare also reminded me of the undergraduate college admissions process. While having clear-cut metrics for what could get a student into a certain college does allow people to gauge where their efforts and money would be best spent and which colleges to apply to, these same metrics allow students and parents with more power (i.e., more money) to ensure that their profile matches the metrics in ways that are not accessible to everyone. So, bad actors with power can and will exploit metrics. Because of this, others will feel compelled to engage in similar behavior to the best of their abilities in order to ensure that they are able to get a good education. Ultimately, metrics (barring the other shortfalls discussed) are only as good as the people and organizations willing to use them in the way they were intended. As the author says, metrics should not be imposed but created with the stakeholders they will be used to regulate.
“The Tyranny of Metrics” by Jerry Z. Muller offers a penetrating critique of our cultural obsession with quantifiable performance metrics. As a computer scientist, I am acutely aware that algorithms are increasingly used to measure and evaluate human behaviors and outcomes, especially in the health sector. From patient diagnosis to hospital efficiency, algorithms are often considered neutral tools that can enhance objectivity and decision-making. However, Muller’s work reminds us that an overreliance on quantifiable data can lead to the devaluation of qualitative judgment and the nuanced understanding of human needs. This concern resonates deeply when considering health disparities and the delivery of healthcare services. In health applications, algorithms on the one hand improve efficiency, personalize medicine, and generate better patient outcomes. On the other hand, they can inadvertently perpetuate biases or reduce complex human experiences to mere data points, potentially leading to the marginalization of those who do not fit neatly into the quantifiable boxes.
There were several points in the reading that reminded me of past discussions. I guess one example is the manipulation of performance indicators: focusing attention on what is measured or rewarded and putting fewer resources into the organization’s other goals (possibly because they’re less measurable using metrics). An example that came to mind was the college ranking system. US News factors specific variables into a college’s ranking; thus, colleges invest more resources into improving those metrics to attract more students and keep their business running. During high school, I basically worked at a hospital doing analytics on patient feedback. In this case, I did see how metrics can improve patient experience. When I first started working at the hospital, there weren’t many resources for patients whose primary languages were Spanish and Chinese. After the hospital began incorporating questions about communication and language in patient surveys, they found how crucial communication in these languages was to patient understanding, as a large portion of patients spoke Spanish or Chinese with more fluency than English. This led to investment in more resources for Spanish and Chinese speakers, as well as a push to hire translators and other medical workers who could speak these languages so that patients could better understand their health.
I really enjoyed these readings and learning about metric fixation, I feel like these chapters verbalized trends I was noticing. Healthcare is already an impenetrable/opaque system, and I think the more media I see dedicated to demystifying healthcare implicitly or explicitly addresses these ideas of metric fixation. We’ve discussed before in class that optimization and efficacy can sometimes neglect the needs of disadvantaged or marginalized groups, as there becomes more of a focus on whether the benefits outweigh the risks. It seems egregious that so much value is placed on metrics to the point where hospitals will manipulate how/if they serve patients to guarantee an appearance of efficacy, but it also makes sense when so many support structures are tied to this way of representing success.
In “The Tyranny of Metrics,” Jerry Z. Muller explores the shortcomings of relying on metrics in various domains. In Chapter 1, Muller introduces the concept of “metric fixation” and how it has become the primary method for measuring progress, with the intended benefits of promoting transparency and accountability. However, Muller’s overarching argument centers on how the use of metrics to gauge progress has, in fact, become detrimental due to its inherent flaws. In Chapter 2, Muller identifies several key issues with the metric system. For instance, metrics can incentivize individuals to manipulate the system, as the pursuit of success is often tied to achieving certain metric-based goals, leading to unethical behavior. Another significant flaw he highlights is that if people lower their standards or understanding of certain aspects, the metrics can still indicate good performance, even when the actual quality of performance has declined. In summary, Muller’s work underscores the idea that data derived from metrics can be unreliable, and institutions, including those in the medical field, should consider a broader range of factors beyond metrics when assessing performance and progress.
I think that the tyranny of metrics is a useful framework for discussing many of the societal issues we’ve already seen in this class. For example, the COMPAS algorithm’s existence suggests a tyranny of metrics in the justice system: it’s believed that having *some* metric for estimating the probability of recidivism is better than having no metric at all. The difficulty is that once a metric is put in place in a system, it’s difficult to get rid of. Case in point: even years after the ProPublica article exposed the racism inherent in the COMPAS algorithm, it remains in place to this day. Given this, it seems like at any institution, engineers and leaders should carefully evaluate the inherent biases in any metric before its introduction, since fixing or reverting the use of the metric will likely be difficult in the future.
The discussion of metrics in this reading was, as many others have pointed out, very reminiscent of our discussions in class of using algorithms for policing, or the phenomenon of LLMs parroting incorrect or prejudiced information. It evokes the same sort of blind trust in “data” as objective, as somehow not as vulnerable to human bias as decision-making based on human judgment. Over the past few months, we have clearly seen that this is untrue. The consequences of this uncritical view of data for the healthcare system are laid bare in chapter 9 of the reading. The public ranking of doctors and hospital systems is very similar to the system of college rankings we discussed earlier in class, but with less effect. Though college rankings do not necessarily push colleges to make changes that improve the lives of their students, it was clear that they do affect what colleges spend their money and time on, thus altering learning outcomes in some way. The hospital rankings, however, were said to have no effect on outcomes, a testament to the waste of time and energy they may be.
I really liked these readings, as I immediately understood the growing obsession with metrics Muller was talking about. It seems like in every aspect of major institutions, success is measured in numbers and not in any humanistic qualities. During a time when every major employer is looking to be as efficient and cost-effective as possible, on paper it makes sense to make everything quantifiable: it’s easier to understand at a glance and to implement in different data technology. But as the reading shows, metrics (in the way they’re often implemented) are often not as effective as qualitative data and can often be gamed or cheated. We previously read about the US News college rankings, which pushed colleges to put their money into easily quantifiable variables. There were obvious problems with this, as the improvements often only helped the reputation of the school and did not necessarily aid students and faculty. A very similar situation appears in chapter 9 of this book, but gaming metrics in the healthcare system directly affects the well-being of patients, as these numbers determine the choices of both patients and healthcare providers. I believe there is a place for metrics in these institutions, but there needs to be a higher level of transparency about what these numbers actually represent, and methods of exposing deceptive practices.
The readings concern the metrics system and how it is utilized in the medical field. This is problematic on so many levels because it incentivizes doctors and medical companies to “game” the system at the expense of people’s lives and physical well-being. We’ve already seen how this effect manifested in the college ranking system, where a set of metrics was defined with a narrow-minded perspective on the role of education. Over time, colleges figured out what they were being judged on and attempted to improve those metrics to raise their rankings. This created a further divide between top-ranking, prestigious colleges with the resources to manipulate their statistics and lower-ranking ones that struggled to climb the ladder. Similarly, agents in the medical industry will strive to manipulate their performance and improve their metrics on the surface while not actually honing their skills or bettering themselves professionally. This distracts them from their actual work: saving people’s lives and conducting research to improve patients’ physical well-being.