Reading for Monday October 9th

Watch: On the Dangers of Stochastic Parrots (https://www.turing.ac.uk/events/dangers-stochastic-parrots)

We read the paper that forced Timnit Gebru out of Google. Here’s what it says, by Karen Hao (https://www.technologyreview.com/2020/12/04/1013294/google-ai-ethics-research-paper-forced-out-timnit-gebru/)

Optional: On the Dangers of Stochastic Parrots, by Bender, Gebru, McMillan-Major, and Shmitchell. (https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)

22 thoughts on “Reading for Monday October 9th”

  1. As discussed in the MIT Technology Review article by Karen Hao, there are a number of risks in the training and deployment of Large Language Models, from environmental and financial costs to an inability to audit training data due to its sheer scale. One of the risks discussed is misinformation: people being fooled into believing the output of Large Language Models without hesitation. However, I believe this is an issue that can be explored further. Drawing on a number of in-class discussions, I believe that Large Language Models can be compared to vaccines in some ways. As discussed in class, computer scientists and engineers can understand the theory of these models, but this does not necessarily translate into public trust or understanding.

    In the case of vaccinations, there is a not-insignificant group that does not trust vaccines. For some, a strained relationship with health professionals, pharmaceutical companies, and politicians results in a lack of trust. At the same time, some of this distrust is associated with conspiracy theories, which can be fostered by a lack of transparency in governance and healthcare. Without accessible information on how something works, communities can form that question well-supported facts, like the efficacy of vaccines. Hao explains that, like vaccines, language models tend to require significant resources to research, develop, and deploy. Because of this requirement, those who do not trust these powerful organizations may then distrust all answers from these language models. People may end up divided over whether they trust LLMs as a source of information, further splitting society on which sources of information can be considered trustworthy.

  2. The readings for today analyze the dangers of Large Language Models, which seem to touch most aspects of our society. They contribute to climate change, they perpetuate harmful terms being used in “trusted” settings, they are capable of learning racist and biased viewpoints because their primary goal is to gather as much information from the internet as possible, and their training datasets are simply too large for all of the data to be verified as credible and valid.

    It is important to recognize these potential dangers of Large Language Models and to ensure that they are used responsibly and ethically. This can be achieved through careful training and validation of these models, by implementing safeguards to prevent biases and inaccuracies from being propagated, or by reassessing the “as-much-data-as-possible” training approach.

    Focusing on the environmental impact of Large Language Models, we know from the readings for today that the massive computational power they require has a significant carbon footprint: training one such model can emit as much carbon as five American cars do over their entire lifetimes. By using more efficient algorithms and hardware, we can help reduce the environmental impact of these models while still reaping their benefits.

    Overall, it is essential to address these risks and find ways to ensure that Large Language models are used responsibly and ethically. By doing so, we can harness the power of these models while minimizing their potential negative impact on our society and the environment.

  3. The lecture and the article discuss some of the potential problems that negatively affect the results of using large language models in social science studies. First, the cloud-computing costs and carbon emissions from training large models are significant, causing both environmental and economic problems. As industries and companies try to use LLMs to keep pace with a changing world, massive resources flow toward LLMs and big data, which sometimes results in wasted labor and computing. Moreover, with more and more models emerging, the data cannot be well validated. The lecture focuses on “how big is too big” for LLMs. As the authors mention, underrepresented groups or communities may lose visibility as samples grow larger and already-represented groups contribute ever more data. She therefore argued in the lecture that energy and compute efficiency need to be incorporated into planning and model evaluation. Careful selection of datasets and consideration of potential users’ information are important before doing machine learning. It is also important to identify stakeholders and design to support their values.

  4. In the realm of Natural Language Processing (NLP), the development of large language models has brought about both excitement and concerns. These models, with their ability to comprehend and generate human language, have found diverse applications across industries. However, it is crucial to approach their usage with caution and consider the potential risks and limitations associated with them.

    One important aspect to consider is the environmental impact of training and deploying large language models. The computational power required for training these models consumes significant amounts of electricity, resulting in a substantial carbon footprint. This raises concerns regarding the contribution of these models to climate change, particularly when marginalized communities are disproportionately affected.

    Another area of concern is the potential biases and harm that can be embedded in the models due to the vast amount of data they are trained on. It is crucial to ensure that the datasets used for training are carefully curated and audited to mitigate the presence of racist, sexist, or abusive language. Additionally, the issue of unrepresentative language models and the lack of diverse voices in their training data pose challenges in achieving inclusivity and fairness.

    Furthermore, the ability of language models to generate synthetic text raises the risk of misinformation and the spread of false information. As these models become increasingly sophisticated in mimicking human language, there is a need for robust risk management strategies to address the potential harms associated with their misuse.

    In conclusion, while large language models offer tremendous potential for advancing NLP applications, it is crucial to approach their development and deployment with caution. The environmental impact, potential biases, and risks of misinformation highlight the need for responsible development, thorough curation of training data, and ongoing research into risk mitigation strategies. By addressing these concerns, we can harness the power of language models while ensuring their ethical and responsible use.

  5. The readings and video for today analyze the true risks and costs of LLMs for the environment and society. Environmental costs are something that always catches my eye. Climate change is one of the biggest threats to the future of our society. Today’s cutting-edge technology is typically associated with carbon neutrality or environmental sustainability, so, in my opinion, the effects these technologies can have on the environment have been almost shielded from scrutiny. It is thus important for people to be educated on the true environmental costs LLMs and other machine learning models can have. The energy expended to train some of these models is astronomical and irreversible. In addition, the large datasets typically reflect biases, and as stated in On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?, the larger datasets used to train LLMs reflect hegemonic viewpoints. LLMs are often trained on older datasets that are even more prone to bias and do not reflect societal changes.

    Both the article and video today bring forward important concerns and present the threats LLMs pose to society and the environment. Professor Emily M. Bender offers a fantastic set of solutions in her presentation, including evaluating the input data against reference points. Considering that these models learn and make decisions based solely on data, it is crucial that there is transparency about, and evaluation of, the data from which these models learn.

  6. Representation is a primary concern when it comes to the expansion and implementation of LLMs, namely in that the data used as input for these models may be inherently culturally skewed or contain harmful sentiment. In either case, as computer models get better and better at replicating the texts they are fed, they may begin to disseminate these harmful ideas. One example given in the technology review article is that, since the internet is dominantly written in English, and even more so—as we examined last week—in dominant dialects of English, there may not be a sufficient amount of input from underrepresented languages and cultures reflected in the outputs of LLMs. This may further isolate already-marginalized groups from technology use and widen the technology gap. We saw in our discussion of voice assistants that accessibility was limited to certain socially dominant or financially incentivized groups, and a similar effect could emerge from LLMs. Not only are tech companies not incentivized to ensure that their data is equally representative, but the nature of the internet and of recorded history (think how many Black and brown authors in American history never had their works widely published or were prevented from writing at all) is dominantly white, Christian, etc.

  7. Both the MIT article and the video highlight how the physical aspects of computing, which are often abstracted away, and its digital aspects can work to reify existing global power structures. First, as the MIT article describes, training LLMs is an extremely resource-heavy task and a massive contributor to greenhouse gas emissions. At the same time, Professor Bender also talks about how the data design behind training these LLMs works to benefit the individuals who already hold hegemonic power. In conjunction, these two ideas show how these systems, through their contribution to climate change (which disproportionately affects the global South) and the centering of those with relatively greater global power (linguistically and, therefore, culturally) in their training design, unintentionally work to oppress disadvantaged groups and people in both the physical and virtual worlds.

  8. In her remarkable talk, ‘On the Dangers of Stochastic Parrots,’ Dr. Emily M. Bender explained the research paper of the same name, which she co-authored with several other scholars. The talk addressed the perils and risks posed by Large Language Models (LLMs), both current and anticipated. Dr. Bender and her colleagues raised a plethora of intriguing questions and made several noteworthy points. One aspect that particularly caught my attention was when Dr. Bender presented a graph depicting the size of databases. This graph aimed to help us answer the question of ‘how big is big.’ It was exciting not only to observe the current size of databases but also to witness the substantial growth they have experienced over the past four years. Naturally, one cannot help but wonder how much larger they will become in the upcoming years.

    Another thought-provoking aspect of the talk was the discussion on the environmental impact associated with training various models. It was amusing how Dr. Bender tackled this issue by presenting relevant statistics and highlighting that the first groups to be affected are likely those who won’t benefit as much from LLMs. Additionally, Dr. Bender emphasized the necessity of intentionally selecting datasets to mitigate preexisting biases and the misinterpretation of current language. This was exemplified by a quote from an article by Birhane and Prabhu: ‘Feeding AI systems on the world’s beauty, ugliness, and cruelty but only expecting them to reflect beauty is a fantasy.’ This sentence served as a reminder that we have the power to shape the outcomes we desire, but it requires dedicated effort.

    Turning to ‘We read the paper that forced Timnit Gebru out of Google. Here’s what it says,’ by Karen Hao, it was indeed captivating to witness the entire interaction between Gebru and Google. It revealed how, ultimately, corporations prioritize their own interests. In this case, it shows how Google fired an excellent professional in order to avoid letting people know about the perils of its AI systems. As Dr. Bender aptly pointed out in her talk, there exists a significant disparity between the goals of research conducted by industry and those pursued by academia.

  9. The stats on environmental impact are crazy; I have no idea what kind of computational capabilities must be necessary to train a model like that. I also wonder: if that kind of load were distributed across a virtualized compute pool, would distributing it like that increase the energy demands? I would have to imagine that it could, but I also wonder how many places might just have one machine powerful enough to handle the task. Interesting stuff.
    Moving on to another point, these language models are only intended to create language, specifically language that sounds human. I am wondering if we are potentially creating advanced liars by training bots that are specialized to produce language that sounds like what we want to hear. Is there any way we could incentivize a large language model to produce human-sounding language that also contains truth? I don’t know how we would train that. It’d probably take a pretty crazy dataset.

  10. I found the article about how Google forced out Gebru really scary, but also a good example of why tenure can be a protective force when it comes to balancing the interests of research and industry. I also thought the video was cool: the idea of a filter seems good at first glance, but you soon realize that words like “queer” have been reclaimed, so it quickly becomes unsustainable. The value-lock idea was also something I had been thinking about, so it’s good to have a name to put to it. The most interesting thing was some of Dr. Bender’s other work that was brought up in the video.
    My main issue was with the continued insistence that there’s no possible way that LLMs can “understand” anything. I even went and looked up the Bender and Koller (2020) paper that Professor Bender mentioned in her video, about how AI can’t understand. I was hoping to see proof that would help me shed light on this debate that I’ve been having with myself, something to tilt the scales. However, the paper is just “an argument on theoretical grounds that a system exposed only to form in its training cannot in principle learn meaning,” supported by some fascinating thought experiments that are ultimately resolved in a manner chosen to be consistent with their point. Plus, I think even some of their proposed scenarios are outdated, as GPT-4 has been able to consistently solve the Stacking Objects problem, which they claim it shouldn’t be able to do. That, along with the unicorn’s horn, another test developed for AI “understanding,” was supposed to show that models don’t have a deeper comprehension of the world… until they did. Now I don’t know if they got lucky and were able to pick some random meaning out of words to guess the right answer, or if they actually, in their training of trillions of parameters, were able to make new logical connections and learn about the world, but it doesn’t seem like anyone else knows either, and I’m skeptical of people who claim to. If Ramanujan could learn math through reading and Armagan could learn to paint through talking, then who’s to say that language isn’t a valid way to learn?

  11. I found both the talk and the article very insightful when it comes to the state of AI. We rarely discussed the environmental consequences of training these models, but it was fascinating to hear about what the carbon emissions looked like. At the same time, I found it really gross how Google’s AI head said the paper didn’t discuss “relevant research” about energy efficiency, which is why he wouldn’t let it be published. It feels very deceptive to say that they shouldn’t bring up these concerns because they didn’t acknowledge all research on the issue; the problem of environmental damage due to AI training would still be relevant with or without acknowledging other research. Both Dr. Bender and Hao did a very good job of showing that despite companies and academic institutions putting an unbelievable amount of resources into developing LLMs, there are still massive blind spots they are hesitant to investigate further. The echo chamber of hegemonic voices that these models draw from creates more biased, more dangerous products that will internalize controversial viewpoints while ignoring views from marginalized communities. Obviously, these models need to be trained on more diverse views, and companies should do a better job of filtering out harmful media, but the damage that has already been done within these models should be looked at. Narrowing the focus of LLMs by carefully picking what media they’re trained on is something I hadn’t really considered before, and it seems like many problems would come up when attempting that, but it still sounds promising. Still, the forcing out of Gebru makes me worry more about the future of AI. If these companies create ethics teams, and those ethics teams bring up concerns, they shouldn’t be able to just get rid of the people they disagree with.

  12. The article and video reveal some of the multifaceted challenges at the intersection of large language models and AI ethics, within the broader context of industry and academic research. The environmental footprint of training large AI models is alarmingly high. Models like Google’s BERT have carbon footprints comparable to significant human activities like round-trip flights. Such findings demand that we rethink our priorities in model training, balancing advancements with sustainability. As models get larger, the datasets they are trained on become expansive, often capturing the good, bad, and ugly of the internet. This leads to models inheriting and sometimes amplifying the biases present in these datasets. The researchers rightly point out that this has ethical implications and impacts how these models respond to cultural and linguistic shifts, potentially promoting a homogenized worldview. The argument that the industry’s focus on large language models diverts attention from other potentially beneficial areas is compelling. The focus on accuracy and fluency can overshadow understanding. This seems like an inflection point in the development trajectory of AI models, where we need to ask – are we striving for machines that mimic or understand?

  13. My first thought about Google removing Gebru over her work on bias and her demands for better inclusion is that Google is choosing business interests, which benefit it more, over what people actually want. This mindset leads companies to double down on their current successes by pouring more resources into them. In doing so, they release more CO2 emissions that continue to harm our environment, and they profit from it. We also have the problem of racism, sexism, and abusive language being present in LLMs because of their indiscriminate intake of resources and information, which ties back to the idea of too many resources leading to the deterioration of the environment. These risks are important and detrimental enough that there are suggestions for mitigating them, such as becoming more efficient so that fewer resources are used and CO2 emissions are reduced, and finding ways to do this work in a more environmentally friendly manner. These ideas are very interesting: now that we know about LLMs, we are learning how the drive to do better is affecting our surroundings, as well as what risks can affect us too.

  14. Today’s readings mainly discuss the dangers of Large Language models such as environmental risks, extremely high research costs, and generating misinformation. In my opinion, generating misinformation by using a large language model is the most dangerous because it has great political and societal effects all over the world.
    The Russian-Ukrainian war represents the first instance of a war incorporating information warfare through AI technology. Russia employed AI technology to generate and spread disinformation, which refers to deliberately spreading false information with the intent to deceive people. The effort to spread disinformation to alter people’s “narrative” has met with partial success. Regrettably, some individuals in Western countries trust that information and blame Ukraine.
    The trend of using AI technology to gain an advantage in narrative warfare is expected to continue and evolve. As the world faces these challenges, it becomes increasingly crucial to develop effective safeguards and systems to mitigate the potential danger of LLMs in shaping our collective reality.

  15. The diagrams of environmental impact from the MIT article felt familiar—perhaps they, or something like them, were shown during the class on NLP? Regardless, their impact was just as profound the second time around. It is very easy to abstract computing away from the physical—digital copies are more eco-friendly than paper, in-depth communication is possible without driving to meet one another, and so forth. Yet the impact of training these models alone, not to mention running them incessantly, cancels out any of these positive effects by miles.
    On a different note, the forcing out of Timnit Gebru is very telling of the priorities of big tech (read: $ over ethical practice, close to every time). She was one of Google’s top AI researchers, yet she was pushed out because she was unwilling to paint an apologist narrative, highlighting all the work being done to be more energy efficient or mitigate bias, instead of showing these problems’ ugly heads for what they were. Though this was not formally a firing, it is very clear that Google has very limited respect for her work and her concerns, and will continue to be the source of the problem rather than the solution while brushing off accountability, because it is making marginal changes that still don’t keep pace with the rate of harm done by these models’ increasing proliferation and exponential growth. Google has had a great fall from grace since the days of “don’t be evil,” and is now categorically one of the big bads it was founded with a dream of being an antithesis to.

  16. As others have said, the environmental impacts of training these language models are nearly unimaginable. I think this is a powerful side effect of the enormous offloading of computing and processing to anonymous computer warehouses that we don’t really implicate ourselves in. I am hopeful, though, for the future in these terms. Some of the largest tech companies, such as Apple, are already carbon neutral, and many others, like Meta and Amazon, have set goals to become carbon neutral fairly rapidly. As we’ve discussed previously, only a couple of companies will have the resources to build their own models, and we have seen larger companies choosing to use existing products over developing their own. Hopefully, as was said in the lecture, more efforts will be directed towards “distilling” and repurposing existing work rather than spending an immense amount of energy to make an equivalent product.

    It was interesting to read about someone who had faced consequences that stemmed directly from their work on this paper, after the presenter indicated that several people had not been included as coauthors due to demands from their employer. None of what Gebru said was unethical, untrue, or directly damaging to Google, as she was contributing fairly to a scholarly conversation. Google, though, simply didn’t like what she was saying, and pushed her out. One has to wonder what the point of an ethical AI team really is if a company can and will swiftly ax anyone who engages in ethical discussions or raises difficult questions that challenge its bottom line.

  17. I remember that this story was quite big back when Timnit Gebru was fired, and I vaguely remember that there were discussions about the research she was doing and the idea that she was fired for disagreeing with the path Google was taking. At the time, though, I did not really understand the implications of what the paper was about, and I think it took a little while before more was understood about its ideas. I think it is kind of shocking to see the majority of our talking points in class laid out in the paper, from carbon emissions to the embedding of sexist and racist language to cultural and classist questions about the input data. At the time she was fired, I thought it was a terrible and unethical company decision, but now that OpenAI has created something like ChatGPT, the implications feel so much more present and real. I no longer feel like we can avoid discussing the implications Gebru and the rest of the paper’s authors laid out. We need to put regulations in place that address and restrain this growth, first until our energy infrastructure is more sustainable, and second until we have a better grasp of the implications of this technology.

  18. I think the content for today’s class addressed environmental effects of LLMs that weren’t heavily discussed in previous readings. Marginalized communities who are most impacted by climate change do not reap the benefits of LLMs, since LLMs are built on data from “high-resource languages”; this, in combination with the overrepresentation of hegemonic viewpoints in these massive datasets, ultimately just reinforces systems of oppression through another medium. Furthermore, it was unsurprising to me that Google fired Gebru, but I also just found it strange, since doing this research and finding results like these is her job. If anything, the findings from the stochastic parrots paper align with Google’s AI principles and objectives.

  19. We have previously discussed the environmental impact of LLMs, so the more compelling aspect of this reading for me was the performance problems that accompany building LLMs. Of course with ChatGPT we have seen how effective LLMs can be, yet the models continue to have problems with formal or logical reasoning. I think it’s a compelling question of whether this problem is inherent to the architecture of the model or if it just requires more and more data.

    I also think it’s important to frame the problem of LLMs in terms of opportunity costs, as the researchers do. Clearly LLMs are effective, but what is the end goal? If we eventually want to create generally intelligent machines, then it seems like the current transformer architecture may be insufficient to that end. Compared to a machine learning model, humans need far less data to learn new concepts or a new language. Given the substantial economic and environmental costs, it’s worth considering whether we are spending too many resources on what may be a “dead end” on the way to AGI.

  20. The paper “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”, co-authored by Professor Emily M. Bender, views the costs and risks associated with large language models from perspectives that are less talked about by industry experts. After some of the readings last week, I had questions about the effort that goes into training the models as well as the data that get used for the algorithms, which are all discussed in further detail in the presentation. I particularly liked the slide about the unmanageability of data and the presence of embedded biases in even the largest datasets, in which she gave multiple examples of certain identity groups being disproportionately represented on some common online sites. This can lead to a lack of diversity not only in gender or race, but also in life experiences or styles of writing, given how it is understandably preferable to scrape data from more legitimate sources. Professor Bender mentioned how important it is to be intentional with the selection of training data and to document the process as well as the criteria for each dataset, so we can better track the sources of biases and come up with solutions. One question that I have about this is whether and how it is possible to document the training process if it is becoming automated and machines are being programmed to constantly learn from real-time data. Is there really a way to control for biases and input data if we are moving towards constantly updating what are considered facts or truths?

  21. The idea that Timnit Gebru was forced out of Google for her participation in the research surrounding the dangers of stochastic parrots really underscores the priorities of big tech companies. Any threat to the bottom line (even something that is not necessarily a threat, just something to look out for) is treated as hostile and must be ousted. Perhaps it is due to my relative unfamiliarity with the concept, but from what the video showed, I honestly don’t understand Google’s reaction. The paper did not feel like a substantial enough hit to the company or its AI prospects, so I am curious about what other factors contributed to the parting of ways between Google and Gebru.

  22. I thought the discussion of stochastic parrots was fascinating. I thought it was important how they talked about training and experimenting with LLMs producing around 284 tonnes of CO2 emissions, compared to the roughly 5 tonnes an average person is responsible for in a year, and how these emissions will disproportionately harm marginalized groups, in addition to the harm the software itself is doing with all of its bias. This all just makes me wonder if we really have a place on this earth in the long term. At this point, I feel as if this planet will only be habitable for computers at the rate we’re going, advancing technology while caring less for individuals.


