Reading for Wednesday November 22nd

Captcha if you can: how you’ve been training AI for years without realising it by James O’Malley (https://www.techradar.com/news/captcha-if-you-can-how-youve-been-training-ai-for-years-without-realising-it)

Underpaid Workers are being forced to train biased AI on mechanical turk by Aliide Naylor (https://www.vice.com/en/article/88apnv/underpaid-workers-are-being-forced-to-train-biased-ai-on-mechanical-turk)

The Trust Imperative: A Framework for Ethical Data Use (https://bigdata.fpf.org/wp-content/uploads/2015/11/Etlinger-The-Trust-Imperative.pdf)

21 thoughts on “Reading for Wednesday November 22nd”

  1. In the article discussing underpaid workers training AI, it is mentioned that many workers live in politically and economically restricted countries, making them dependent on the work. As mentioned by Aliide Naylor, not only are workers encouraged to conform to the majority of other workers labeling data to keep their position, but their work may be rejected by clients, resulting in no pay. Overall, this is a situation that capitalizes on the vulnerability of groups of low socioeconomic status and can definitely result in biased results for the information being labeled. However, even if these workers did not have economic pressures encouraging them to conform to the majority, there would still be pressures encouraging workers to confirm and reinforce biases. Particularly in politically restricted countries, there is a fair chance that speech may be restricted in certain ways to protect the current political organization of the country, like a monarchy or dictatorship. Of course, this is problematic in that it can reinforce these biases through the creation of algorithms trained on biased data. However, even when we consider the transcription of data that is less biased or controversial, these corporations assume that the associations made between data and their transcriptions today will remain valid. As the associations people make between images and emotions are shaped by culture, it is not necessarily the case that the data used to train algorithms today will remain valid as cultures change over time.

  2. Microemployment seems like another unregulated source of worker exploitation, with various disincentives that devalue the need for humans in the long run. In essence, microworkers are being asked to act as a machine might—devoid of emotion, instructed on what to do and how to respond to prompts. All this with the goal of training AI—like a teacher might a student—to do a client’s bidding. It is easy to see how bias within the company might seep into the products they create if there are no safeguards in place to ensure workers are not taken advantage of or told how to respond. This sort of environment is fertile ground for unfettered bias, and sets the tone for tech companies to control the narratives for their products in the future. As tempting as the promises of AI are, it is careless to train it with such a limited process, and the results will ultimately reflect this input.

  3. Reading Captcha if You Can: How You’ve Been Training AI for Years Without Realizing It by James O’Malley completely caught me off-guard. I do not know if it was complete negligence or idiocy on my part, but I had no clue as to why Recaptcha had changed its model from one of words to one of images. Almost all that O’Malley explains about Recaptcha violates much of what is stated in The Trust Imperative: A Framework for Ethical Data Use by Susan Etlinger with Jessica Groopman. I enjoyed this reading a lot in comparison to some other ‘framework’ readings we have had in the past, as it seemed to take on a large reality that has only been discussed vaguely in class: governments are slow. The framework offers alternate solutions and core questions. Contrary to the appeal of most things we read in class, the framework seemed to be aimed at CEOs and the average reader by giving them consumer feedback that will affect profits down the line. This change of pace made for a really interesting framework that posed, to me, real questions with aims at getting real solutions.

    The usage of microworkers as described in Underpaid Workers are being Forced to Train Biased AI on Mechanical Turk by Aliide Naylor was not as new to me as the recaptcha or framework articles, but I still found it extremely intriguing. The use of these workers seems to violate a lot of labor laws. Additionally, the incentivization of these workers to give the ‘desired’ or ‘popular’ solutions is prompting dangerous groupthink in the training and censoring of social media and complex algorithms. Again, this is another reading from this class that makes you cringe and truly feel uncomfortable about all that is being sacrificed for “technological advancement.”

  4. The reading by Susan Etlinger and Jessica Groopman reveals a deep concern for the ethical implications of data use in contemporary business practices. As a computer scientist, two key points stand out to me. First, the increasing ambiguity of privacy: the report underscores how privacy in the digital age is no longer binary but is increasingly contextual and fluid. This resonates with the computer science community’s ongoing struggle to define and enforce privacy in an era where data is omnipresent and often passively collected.
    Second, the connection between data use and consumer trust is critical. From a computer science perspective, this highlights the need for transparent algorithms and systems that respect user privacy and consent. There’s a growing need for systems that not only comply with legal standards but also align with ethical considerations.

  5. Today we explored how AI is being trained unethically by large numbers of people, who are either underpaid or are being forced to participate without any knowledge of what they are really doing. When I read what Captcha was really for, I was shocked and slightly angry: I never agreed to help train an AI, and yet in order to use the sites that require it, I had to fill out Captchas. Although it is helpful for transcription and for making things like automated cars safer, it would have been nice to know what my data was being used for; but from our readings in this class we know that this concept is not too far-fetched for Google. The people who do know they are training AI are underpaid, and at times they are denied pay altogether. To avoid this, workers are forced to give answers they may not agree with, which reinforces encoded bias. Some workers even impose their own biases outright, as in the YouTube example where reviewers were flagging LGBTQ+ videos simply based on their own homophobia. In this class we have talked about how AI cannot replicate human intuition. But from today’s readings we have learned that AI is actually being trained by it, albeit badly.

  6. The arms race between Captcha technology and AI is simultaneously entertaining (as an interested technologist) and incredibly frustrating (as a user of the Internet). As the article details, Captchas have become more complex in the past few years, asking users to solve increasingly difficult visual recognition puzzles. The cause for this increase in complexity seems to be two-fold: first, the data is used to train better AI models employed by Google and others; second, the prompts must be made more difficult to avoid being solved by state-of-the-art ML agents.

    The end result of this process is a severe degradation in the user experience of websites that rely on Captcha to verify human users. As these ML models continue to progress, I predict they will eventually surpass even the most difficult Captcha puzzles. Given this, it seems necessary that a new form of Captcha be developed to test whether a user is a real human without relying on image recognition. I would be interested in following research in this domain, such as whether it’s possible to use cryptography or even blockchain technology to uniquely identify a user as human.

  7. In our reading, Captcha if you can, the part that most caught my attention was how people were able to use one AI to pass another AI’s test, a test designed to determine whether you are human or not. It turns out this is done quite easily, as the attacking machine is most likely trained on the same data being collected. At the conclusion of the article, the author expresses concern that it is becoming “increasingly difficult to separate us humans from machines.” This raises a concern not only about people trusting AI more and more, but also about AI replacing us and pushing us out of our work. I feel like AI does have its benefits, but it is also concerning how far we are trying to push it. This leads us to our next reading, Underpaid Workers are being Forced to Train Biased AI on Mechanical Turk, where we see how people are being forced to work with the chance of not being paid. This is ridiculous, as companies are simply overworking people for the benefit of AI. It is also interesting that the workers are concentrated in the Global North, which would most definitely create bias, simply because those regions have the resources to improve AI. Overall, I find these readings quite interesting in that the interest in AI is significantly higher than the interest in people, and the drive is to improve technology rather than society.

  8. I feel like I have heard over and over how captchas are being used to enhance AI and read books, but it is interesting that the internet has used millions of people’s time and energy to accomplish this without compensation. I think it is partially a byproduct of the free business model of many internet companies, but the exploitation of consumers that large tech companies engage in seems unethical. Just as the people working for Clickworker are paid little to nothing because their work is viewed as convenient, task-oriented work, often resulting in wages much lower than the official minimum wage, companies use websites’ need to filter out bots to generate human-curated answers without proper compensation for the work. On the one hand, I think it is amazing that so many works of literature have been digitized, but I imagine there is a definite bias toward books in English and Roman characters. With Google’s dominance over information access, I think its contribution to language power hierarchies cannot be ignored. On many levels, there needs to be oversight of the process and direction of online AI training. The data collection processes often involve unpaid, underpaid, and biased work (as was mentioned in the discussion of Clickworker’s users), and the outcomes are highly lucrative and sought after without any of it given back to the people spending their time to train these machines.

  9. I think at this point in the course, it is becoming really apparent that regardless of our individual feelings towards AI, and therefore our willingness to engage with it, our seemingly negligible behaviors are somehow training AI systems and contributing to their advancement. The captcha article was shocking to read. It seems that there should be more provisions around whether or not users consent to having their answers to captchas used to benefit the parent companies. I had no idea about any of this, and I assume most people don’t. I think the lack of transparency feels very sinister, even if it’s for a “menial” task like transcribing books.

    The Vice article illuminates this universal push to categorize things so they can be easily consumed and used by AI models. Art, and most creative expression, is subjective, and most of our personal opinions are informed by our lived experiences and identities. So what does it mean when (underpaid) workers are making almost arbitrary decisions as to what certain figures mean? I feel like it is just as important to recognize when something is incomprehensible to the human mind as it is to recognize when it is meaningful.

  10. The VICE article was not only an example of labor exploitation, but also another example of bias in AI. How is data collection accurate if workers’ salaries are being held over their heads—if they’re being incentivized to input one answer even if they don’t agree with it because choosing the non-conforming answer may cost them their job? If people choose the answer that conforms to the majority, the model will just be fed more data supposedly affirming that opposing answers aren’t as “right,” when this really isn’t the case. If this happens continuously, models will progressively become more biased toward whatever organization they’re working for, affecting the content people can engage with (for example, flagging content as inappropriate when that’s not necessarily the case), which may ultimately affect their thoughts and actions.
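    The conformity loop described here can be illustrated with a toy simulation (the labels, counts, and function name are entirely hypothetical, chosen only to make the dynamic concrete): when some workers echo the perceived majority label to avoid having their work rejected, majority-vote aggregation can produce a “consensus” that contradicts what most workers honestly believe.

    ```python
    from collections import Counter

    def aggregate_label(honest_votes, conforming_workers, current_majority):
        """Majority-vote aggregation where some workers echo the
        perceived majority instead of giving their honest judgment."""
        votes = list(honest_votes) + [current_majority] * conforming_workers
        # The most common vote becomes the "ground truth" label.
        return Counter(votes).most_common(1)[0][0]

    # Hypothetical example: 4 honest workers think a video is "fine",
    # 3 think it is "inappropriate", but 5 conforming workers copy the
    # early majority label ("inappropriate") to avoid rejected work.
    honest = ["fine"] * 4 + ["inappropriate"] * 3
    label = aggregate_label(honest, conforming_workers=5,
                            current_majority="inappropriate")
    print(label)  # "inappropriate", despite the honest majority saying "fine"
    ```

    With no conformity pressure (zero conforming workers), the same honest votes would yield “fine”; the aggregated label flips purely because of the incentive to echo the majority.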

    With Captcha, there’s a lack of transparency as to where our data goes; though it’s quite clear Captcha is used to differentiate between humans and machines, information about where your input goes isn’t. And even if people consent to the former, they may not consent to contributing toward training a model. Though transcribing books seems harmless, the fact that internet users are unaware their data has been used for that is quite scary, since the same could happen while developing a model for something considered more harmful.

  11. The first two articles were both enlightening and disturbing to read. Part of me was really amazed at how Google was able to see the potential of the Captcha technology early on and use the data it collected to train its own machine learning algorithms, but the other part was also concerned about the lack of transparency in terms of data usage and the unethical practices carried out by some other companies.

    Just a few days ago, I was writing about the lack of human instincts in artificial intelligence, hence its inability to replace humans in intuitive tasks. I am prompted by Captcha to complete these small image recognition tests almost every day, and I was never made aware of their usage beyond detecting and filtering out bots. Although it is unethical not to let Internet users know how their responses are collected and what they are used for, this does seem like the best and most reliable way to train machine learning and AI algorithms to be more instinctive and flexible. Since the data is most likely representative and users are incentivized to be accurate with their answers, the algorithm is likely to be less biased and more precise. To me, the most concerning part about Captcha technology is not knowing which algorithms are being trained using the data and what those algorithms will end up being used for.

    The second reading about the exploitation of microworkers is deeply disturbing, not only because they are underpaid for time-consuming tasks that are likely to bring astronomical profits to companies, but also because there is no legal protection against their work being irrevocably rejected and compensation denied. The lack of such a system and of employment regulations speaks volumes about the ethics and exploitativeness of the companies carrying out these practices. Additionally, some sources of bias explained in the reading were more comprehensible than others. For example, I’m still confused as to why workers risk being fired for deviating from the majority and providing original, non-conforming answers. Is that how the companies’ algorithms are programmed, or are companies financially motivated to treat microworkers this way because they need their algorithms to be verified as accurate? On the other hand, the fact that these companies hire such an unrepresentative pool of workers, who bring their own prejudices based on their identities, beliefs, or backgrounds, is a breeding ground for algorithmic biases.

  12. So, I already knew about the purpose of CAPTCHAs and just kind of assumed others would have been curious about them and Googled, since they pop up everywhere. To me, they seem like a necessary evil. We get all this free stuff, services that would literally seem divine and magical to people from even 50 years ago, and we complain about having to click a few boxes or see a popup ad in the corner of the screen. It also seems to me that the internet would fundamentally be a more unequal place if every site was locked behind a paywall. I’m definitely not pro-data collection, and I think an important part of the puzzle missing here is government regulation enforcing honesty and transparency about the data actually collected, but I just want to acknowledge the cost of the things we often take for granted. Now, whether or not we’re being fairly compensated for the work we do on CAPTCHAs and the value of our eyes scanning ads is definitely a more complicated question. I’d guess not, given that the world’s richest people mostly got there by exploiting that gap, so it’s definitely something to consider.

  13. In the first article, James O’Malley describes how Google’s algorithm learns to identify pictures, with humans contributing by proving that we are human. Underpaid Workers are being forced to train biased AI on mechanical turk by Aliide Naylor talks about how workers are underpaid to train AI. This makes me rethink whether AI is necessary for daily life. On one hand, AI makes society more convenient; on the other hand, it creates underpaid work, and more people struggle with unemployment as AI takes over their jobs. Is this really why we want AI?
    In The Trust Imperative: A Framework for Ethical Data Use, the authors discuss how a trust crisis arises in companies using AI technologies. Trust in data usage and big data has become a large problem for tech companies. The downsides of these technologies may not be reason enough to stop them; rather, we need to find a balance between AI and human society.

  14. Once again, we encounter more articles that discuss the flaws of AI and how biases are present in these models. In the article Underpaid Workers are being forced to train biased AI on Mechanical Turk, author Aliide Naylor discusses how Amazon uses workers to train AI. However, this work turns out to be extremely underpaid, and the labor conditions workers face are quite bad. In addition, to keep their jobs, workers are forced to answer the same way the majority of workers answer. This means that even if they perceive bias, if their opinion doesn’t match the majority’s, they are at risk of losing their jobs. Workers also might not get paid if their work is rejected by clients, meaning that if they do not do exactly what the company and clients want, their work will not be taken into account. On the other hand, in the article Captcha if you can: how you’ve been training AI for years without realizing it, James O’Malley talks about how the captcha system implemented by Google first served as a way to transcribe books and then evolved into a way of training Google’s AI models. Although this approach seems far better than the one presented in the previous reading, concerns about consent and bias arise when reflecting upon the reality that millions of users are contributing to building something they might not want to build.

  15. The readings for today, especially the first two, focus mainly on exploitative practices in AI training. I happened to be aware of what ReCAPTCHA images are used for (training identification models), so that particular exploitative practice was not news to me, but it is very frustrating that participating in unpaid model training is now simply a cost of using the internet, even in cases where verifying humanity is not really necessary but an opportunity nonetheless exists to make money and feed the ever-growing body of AI. Regarding the second reading, I was aware of Mechanical Turk, but not very conscious of everything that goes into it. For one, I did not expect to find that the bulk of Mechanical Turk labor comes from the Global North, as that is not typically where global economies find the greatest concentration of underpaid labor to exploit, but the connection to the training then continuing to reinforce the perspectives of the Global North makes a lot of sense with this understanding. Second, I don’t know why it surprised me that workers were pressured to fall in line; classification (which seemed to be much of the work) of course relies on reaching consensus, but it is nonetheless deeply troubling that workers risk losing their employment and livelihoods if they don’t conform to a “consensus” that harbors institutional bias, especially because that virtually guarantees future systems will perpetuate it.
A lot of the AI landscape, from my perspective, is deeply troubling, not just this, and it makes sense that labor practices would be one of the major places for it to fail ethically. But every new thing I learn causes a further loss of hope, which is not the most effective headspace for eventually making change, and I’m left wondering how best to channel these feelings of societal shame into a better future, clinging to what little hope remains that it is not too late.

  16. It’s difficult, especially with our understanding of how AIs and tech companies operate, not to feel like a guinea pig. If I’m connected to the internet, my data is being absorbed and used to further tweak ad recommendations or train an experimental LLM. The Vice article was particularly disturbing to me, as these workers are being exploited for their human-ness, only to train models that are going to be deeply biased and problematic anyway. They’re paid minimum wage for countless hours of menial labor, in which a response contrary to what is wanted can cost them their jobs. On a much larger, albeit depressing, scale, every internet user is also helping train AI through Captcha, which is not widely known. (Side note: the term “microworker” feels incredibly demeaning and I don’t like it!)

    These readings further confirm for me that large tech companies have become so massive that they do not ever feel the need to be transparent to the consumer, and use them as test subjects. Google doesn’t feel the need to explain to users how their data is being used to train AI, just as Apple and Fitbit didn’t feel the need to give a warning to their smartwatch users that their PPG tech is very inconsistent for people of color. I hope companies implement ethical data usage as suggested in the Etlinger reading, but I’m unsure if/when it could happen.

  17. First of all, I knew that captchas were being used to train AIs to recognize images, but I did not know about how they were used to transcribe Google books. That’s actually super cool and interesting. It’s Google, so I wouldn’t be surprised if there were some insidious unethical or racist side effect (oh, maybe something about dialect/language bias…) but it honestly seems like a really ingenious idea that apparently had the desired outcome.

    My reaction to the second reading went something like this:
    “Oh, you can make money doing that? That’s kinda cool, might have to remember that…

    Ah, right. That makes sense.

    Oh…

    Oh, I see.

    Oh… oh no.

    OH.”

    It’s frustrating how corporations justify reaping enormous profits from workers who get insultingly low compensation by calling it “unskilled work” … or well, in this case, “microwork”… though maybe that term refers to some other aspect of the work, I’m not entirely sure. In any case, everything else the reading described suggests that it’s barely treated like real labour at all. It has been de-legitimized in every aspect. Despite how much money it generates.

  18. The article ‘Underpaid Workers are being forced to train biased AI on mechanical turk’ by Aliide Naylor highlights a worrying trend in technology-mediated labor structures. Beyond the narrative of the danger of AI and other smart technologies replacing jobs in the future, it shows us that these technologies are changing jobs now. As discussed in books such as ‘Worn Out’ by Madison van Oort and ‘Uberland: How Algorithms Are Rewriting The Rules Of Work’, the employment of such technologies in the management of workers has led to the rise of the gig economy, increasing technology-mediated job insecurity, and the use of refractive surveillance (wherein data collected about customers, for example, is used to determine the futures of workers). Furthermore, the jobs that are created by these new technologies seem themselves to be extremely precarious (and harmful) or not paid at all (as in the case of recaptcha). Increasingly, we see that these new jobs, and the jobs modified by the technology, treat workers like machines, and there are obviously systematic differences in who is left most precarious. Finally, while it is all well and good to talk about the ethics of AI systems and how we can make them better, we would be remiss to forget the real, current material ramifications that the development and deployment of such technologies have for people and the environment.

  19. Happy Thanksgiving! Posting late. I was struck most by the captcha reading. It doesn’t surprise me that captcha is slowly becoming more and more irrelevant as AI advances further and further. Our role in the training of these AIs is surprising to me, however. I don’t really have any awareness of my role in training these systems from day to day; I just know that I am in fact doing it. I kind of wonder how we’ll get out of this one, so to speak: what test won’t eventually be replicable by AI? Even if we were to have a finger-prick test to determine whether one is a biological human being, I could see a world in which AI finds a way to replicate the data produced by whatever tech we use to read the blood. A never-ending arms race, I suppose. Maybe eventually we won’t be as concerned about it, and we’ll just have to emphasize cybersecurity to the point that, AI or not, only the people we want getting access to things are the ones getting access.

  20. The article on underpaid workers contributing to AI training highlights a troubling reality where many individuals, predominantly in politically and economically constrained countries, find themselves reliant on this work. Aliide Naylor emphasizes the pressure on workers to conform to prevailing labeling norms, risking rejection by clients and subsequent non-payment. This exploitation capitalizes on the vulnerability of those in low socioeconomic groups, potentially introducing biases into the labeled information.

    Even without economic coercion, workers face pressures to align with prevailing biases. In politically restricted countries, limitations on speech may further reinforce existing biases, impacting algorithmic outcomes. This becomes problematic as algorithms trained on biased data perpetuate and potentially amplify societal prejudices.

    Microemployment emerges as an unregulated avenue for worker exploitation, resembling machine-like behavior devoid of emotion, strictly adhering to instructions. Microworkers, tasked with training AI, may inadvertently introduce company biases into AI products. Without safeguards, this environment becomes a breeding ground for unchecked bias, paving the way for tech companies to influence product narratives. The process, though promising for AI advancements, lacks diligence, and the outcomes are likely to mirror the limitations of the input.

  21. I have read before about the poor conditions for workers performing microtasks and data annotation in support of AI systems, but mostly about the dangers these workers face, for example when labelling dangerous or offensive content or acting as moderators. I did not realize that the systems were set up in such a way that a simple majority would determine the correctness of even subjective answers. It is clear that higher wages and a more in-depth annotation process could mitigate some of the normalizing effects, but because profit is the bottom line, there is no way these large companies will choose an option that cuts into it. When it comes to the captcha article, I find the exchange of a short transcription task for use of a website or service to be somewhat logical and, overall, beneficial to humanity in general. It is not lost on me that when the task feels beneficial to “the greater good” I feel more comfortable with this kind of data collection. Intent is very important, but more than that, it should be down to the individual to decide whether they are willing to participate in data collection processes. However, I am not sure there is a good way to do that yet. Terms and conditions are typically obfuscated and assumed to be skipped by most users, so is there a way to be more transparent yet understandable for the average person?



The views and opinions expressed on individual web pages are strictly those of their authors and are not official statements of Grinnell College. Copyright Statement.