Talk to Me: how voice computing will transform the way we live, work, and think, by James Vlahos. Chapter 1 (https://grinnell.ares.atlas-sys.com/areslms/ares.dll?Action=10&Type=10&Value=40175)
Talk to Me: how voice computing will transform the way we live, work, and think, by James Vlahos. Chapter 9 (https://grinnell.ares.atlas-sys.com/areslms/ares.dll?Action=10&Type=10&Value=40176)
Your Computer is on Fire, chapter 8: Siri Disciplines (https://direct.mit.edu/books/edited-volume/5044/Your-Computer-Is-on-Fire)
In Your Computer is on Fire Chapter 8, Halcyon M. Lawrence analyzes how technology companies have prioritized certain accents of English in the creation of voice technology tools like Amazon’s Alexa and Apple’s Siri. These companies have not focused on emerging non-standard accents of English used by groups of people learning the language. At the same time, they have not prioritized long-standing non-standard accents of English, like Scottish, Irish, and Caribbean accents. As Lawrence explains, these groups have been and continue to be discriminated against, partly because of their accents.
These voice recognition technologies are typically written for and by native English speakers, resulting in these biases. However, I believe this is part of a much larger trend in software development. Software development has been dominated by native English speakers: programming languages often imitate English grammar, and comments tend to be written in English. In some instances, knowledge of English can suffice for understanding snippets of code. Python in particular can resemble written English more than a programming language. For example, the following function is valid Python:
def check(x, y, z):
    if x is y:
        return True
    elif x is not z:
        return 39
    else:
        return False
A programmer who is already familiar with English is thus already partway familiar with the syntax and keywords of many programming languages, making it easier to develop technology. This lowers the barrier to entry for English speakers in software development, entrenching differences between English speakers and non-English speakers.
A question I had when reading “Your Computer is on Fire” that I had never considered is: how do CS students in countries like China and Russia learn the languages that they code in? Are languages like Java and C adapted to have their libraries and syntax in a language other than English? I was surprised that I had never considered this, given that the languages I have learned are typically convertible to comprehensible forms without too much effort. But for a non-native speaker, it may be more difficult to associate the results of code that, say, sorts a list, because the coding-language vocabulary must be translated an additional time: first from source to English, and then from English to their native language. It is certainly a nuanced consideration that I’m sure many people have not acknowledged, since it is so easy to take for granted the ease with which we speak a global language. Reading works such as this is a reminder that native English speakers, especially those of dominant dialects and accents, can have an advantage when it comes to “universal” technology. If I hadn’t thought about these language issues until reading this article, I have to assume that many developers share the same shortsightedness.
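One detail that sharpens this point: Python 3 actually permits non-ASCII identifiers, so variable and function names can be written in a programmer's own language, but the keywords themselves (def, for, if, return) remain English and cannot be translated. A small sketch (the Russian identifiers here are hypothetical, purely for illustration):

```python
# Identifiers in Russian are legal in Python 3, yet every keyword
# (def, for, in, if, return) stays English -- a non-native speaker
# still reads the control flow through an English lens.

def найти_максимум(числа):
    """Return the largest item in a list."""
    максимум = числа[0]
    for число in числа:
        if число > максимум:
            максимум = число
    return максимум

print(найти_максимум([3, 1, 4, 1, 5]))  # prints 5
```

Even with every name localized, the structural vocabulary of the program is still English, which is exactly the extra translation step described above.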
The chapters analyze the impact and implications of voice computing technologies and generative AI systems like ChatGPT. They discuss the historical context, technological advancements, and social challenges associated with these technologies.
The discussion starts with the significance of early chatbots like ELIZA, which showed the potential for AI to elicit emotional responses and challenge our perceptions of machines. This led to the development of more sophisticated conversational agents like ChatGPT.
However, these advancements raise concerns about the deskilling of humans and the ethical implications of generative AI. The ability of AI to fabricate false information and change its answers over time poses challenges for discerning truth and maintaining trust. The readings emphasize the need for responsible human-machine interaction, comprehensive challenges to AI systems, and moral leadership to mitigate risks.
Voice computing technologies in healthcare and elder care settings have potential in assisting individuals with impaired mobility, cognition, or sight. However, there are privacy concerns and ethical considerations surrounding their use in sensitive healthcare contexts.
The readings also highlight social biases and limitations of voice technologies, emphasizing the need for inclusivity and linguistic diversity in design. Accent bias and discrimination against nonstandard and foreign-accented speech demonstrate inherent biases in speech technology.
To address these challenges, the articles call for new safeguards, ethics, and educational approaches. Preserving human judgment, cultivating moral leadership, and equipping individuals with AI skills are crucial. This will help harness the transformative potential of these technologies while ensuring well-being and inclusivity.
In conclusion, the articles explore the philosophical, societal, and ethical dimensions of voice computing technologies and generative AI systems. They emphasize the need for responsible development, comprehensive understanding, and proactive adaptation to ensure that these technologies promote human advancement and avoid harm or inequality.
As AI becomes more deeply involved in society, many concerns arise. Heavy reliance on AI in decisions about social issues could create a serious discrimination problem: because data is processed mainly through machine learning, a system can learn from biased samples in its database and perform even worse. Voice computing is the trend that lets people communicate with AI through talk. However, in order to create a user-friendly interface, these systems make assumptions about a person’s gender, race, and intelligence from voice alone. This can bias the machine, which might effectively use a scoring system to treat people differently based on protected characteristics. Different people would then receive different information simply because of their voice characteristics.
The discussion on voice was interesting; it took me a second to really see the importance of what the author was saying. I was lost when it started talking about losing the device, because I don’t see how voice technology can exist without a device. That’s not a very important point, though; I think it may have just been talking about screens and visual interfaces. The point about SEO and competing to be the first option listed by the voice tech was an interesting one. I hadn’t thought about how biased I am toward the first search results, but I never leave the first page. This will only get worse. Still, I feel like I can scan through a bunch of search results faster than I can ask a voice AI to read them to me, so I don’t think visual interfaces will go away.
I loved the line from Hicks that said “any technology that reinforces or reinscribes bias is not, in fact, revolutionary but oppressive” which really makes total sense. I think there’s an association that we have with anything that’s new being revolutionary, groundbreaking, or good, none of which are probably true for most new things. I mean you can have something that’s new, but that’s just a new more efficient way of stopping change, which I guess might be groundbreaking in a sense, but is definitely not the other two. Even something like autocorrect is essentially punishing non-standard behavior. Honestly, even Googling (touched on briefly in Vlahos’ chapter 9)(and yeah, I’m using “Googling” to help break down Google’s IP strength, no need to thank me, just doing my part) is a skill and something that would be harder if you don’t already think in the words and phrases that the system assumes you do. This puts the ease of access to information behind a lock where your accent, and by extension, identity, is the key.
Pivoting to Vlahos’ work, I think that the idea of knowledge graphs really appeals to how I want to view the world: complex and interconnected but ultimately logical. I don’t think anyone can really disprove that, but I also don’t think it’s always the most useful lens. So while I think it’s cool, I am trying to be honest with myself and say that maybe it’s not the coolest thing, and that the data it collects is likely incomplete, biased, and filled with gaps and errors that it’s going to end up perpetuating. (Ok, but imagine all of humanity’s knowledge mapped out! It would be so cool to explore!)
Chapter 8 of “Your Computer is on Fire” by Halcyon M. Lawrence is an interesting read. Lawrence argues that voice technologies like Siri and Alexa are “unresponsive and frustrating” for speakers with nonstandard accents and that these technologies have “left them behind”. Furthermore, the author argues that voice technologies that “reinforce or reinscribe” bias are “not revolutionary, but oppressive”.
Lawrence argues that the number of non-native English speakers is increasing, but foreign-accented and nonstandard-accented speech is not recognized as much as native-speaking English by voice technologies. The author further argues about “accent bias”, highlighting how individuals with accents may face negative perceptions. Then the discussion is extended to voice technologies and the “accent bias” is replicated in the voice technology itself.
Being an international student, I often find using voice technologies frustrating due to their occasional failure to recognize my “non-native English”. In my view, these technologies do not necessarily “reinforce or reinscribe” bias as the author suggests. This is primarily because developing voice technology demands significant resources. Prioritizing the primary target audience, native English speakers, is a logical commercial decision. Nevertheless, it would be beneficial if technology companies started focusing more on supporting non-native English speakers.
Both readings talk about voice technology, which resonates a lot with me. In my daily life, voice assistants like Siri and Alexa have become indispensable tools: I use Siri to set a timer or an alarm, and Alexa to turn music on and off. But while voice increases the overall efficiency of our daily lives, some problems remain. One example is accent bias. One of my friends, originally from China, often mentions her struggles with getting voice assistants to understand her accent. It’s a reminder that while the technology has advanced by leaps and bounds, it often remains tailored to Western, native-English accents. This inadvertent “accent bias” can be frustrating for non-native speakers or those with distinct regional accents, as it seems to suggest that the technology is not truly “universal”. Another issue relates to privacy concerns. With smart speakers constantly “listening”, there is always a question of what data is being recorded, stored, or potentially shared. For many, it feels like a trade-off: the convenience of voice technology in exchange for a piece of our privacy.
I really enjoyed these readings. When we previously talked about LLMs, I thought about how they would react to different languages and accents. Different dialects and accents are perceived as “less intelligent, less loyal, less competent, poor speakers of the language, and as having weak political skill.” I also remember that in our education class, we discussed similar ideas about teachers getting criticized for their accents. We are so attuned to believing that our way of speaking English is the right one that, when creating products such as Amazon’s Alexa, we tend to forget the need to include different accents and languages, creating “accent bias”. These thoughts hadn’t really crossed my mind until I came into this class, so I am quite glad to be able to broaden my thinking and consider different things we should focus on. With these thoughts in mind, I am quite curious: how long do you think it will take for these tools to finally be inclusive, so that people can communicate with bots in different languages and accents without worrying about not being recognized or understood? I feel like this doesn’t just apply to technology but to general conversation as well; it is also a problem with people who choose not to consider the thoughts of others and only want what is best and most comfortable for themselves.
Looking at the overarching theme of today’s readings, it is clear that the voice technologies currently being developed are primarily driven by monetary gain. Companies fight for the top search spot online to secure a much better chance of being shown on a voice tool user’s screen for a given search. The languages that are supported are the ones most commonly used in business. What modern voice technologies prioritize is not the broadest possible compatibility; rather, they prioritize compatibility with the languages and people who will bring the largest profit.
People are being forced to modify how they speak for these technologies as well, which can be frustrating even if you have what they consider a “standard” accent. Because voice technology relies on what it thinks is “correct” pronunciation, even a few words that you pronounce unconventionally will give you a more difficult time. This is tedious enough for native speakers, let alone for non-native English speakers who cannot change their accents and therefore struggle to interact with these bots. This is a prime example of coded bias: although the intention was not to ostracize non-native English speakers, voice technology in English was coded with a certain speech pattern in mind, which leaves out billions of people.
I found both readings to be extremely interesting. Although both of them presented different perspectives on voice technology, I believe that both are worth reflecting on. In the two chapters of ‘Talk to Me,’ James Vlahos argues that voice technology represents the ultimate step technology can achieve. He suggests that when technology can emulate human communication effectively, there may be limited room for further innovation. However, he also acknowledges that traditional input methods like keyboards and mice will still have a role in the digital era, albeit a smaller one. Two intriguing points from this reading are Vlahos’s idea of immortalizing people through voice technology and the potential impacts explored in the chapter ‘Game Changers.’ While the concept of immortalizing individuals through their voices may seem like science fiction, we’ve already witnessed instances of recreating the voices of famous people for various purposes, which Vlahos alludes to in his text. The danger lies in how this technology can be misused to spread misinformation and fuel political discord. Another thought-provoking aspect is the power search engines wield over media creators and the increasing influence of AI on them. In the chapter ‘Siri Disciplines’ from the book ‘Your Computer is on Fire,’ Halcyon M. Lawrence discusses how speech technology design inherently carries biases, including those related to accents.
Black Mirror, Her, Ex Machina, and hundreds of sci-fi television shows and movies have depicted the perils and fears of AI that we read about for class today. For AI to become more human, humans are making decisions that reflect what in their mind is the ideal human voice or accent. As the readings stated, Siri began a voice-tech revolution that is currently being capitalized on via AI language models. Siri, to me, was just a voice-based search bar that could barely pick up anything I said. Instead of enduring the long load times and the frustrations of incorrect search results, I went back to manual typing. Steadily, however, my reliance on Siri has increased with the evolution of voice-backed technology, and I now use Siri more than ever. Siri not only broke down a wall for tech giants in using LLMs through voice-based tech, but it also opened the door to acceptance by the general populace. In turn, AI chat models are a mere evolution of something that we have already accepted.
The readings today outline several perils of these models. Technology has historically been designed by white, Western society. As a result, Siri and other voice-based technologies use this language and understand it significantly better than other languages or accents. The neo-imperialism that develops as a result forces people to modify their voices, language, or approach to fit this design. Conforming millions of people to this specific type of language can come at the cost of identity. How we talk, and our various accents, carry familial, cultural, ethnic, and geographic history that is beyond valuable. The centralization of language can thus eliminate these sources of identity and directly declare these diverse forms of language ‘wrong’. A quick Google search of voice-driven LLMs yields some interesting results that evidence this danger: there are mostly adverts and blurbs discussing the new technology and AI on offer, but rarely any discussion of the new dangers introduced.
It’s evident from chapters 1 and 9 of Talk to Me that voice technology is revolutionary in how we go about our daily lives, as well as in how many industries might function in the future. In many ways, voice assistants offer us the same benefits as generative AI technologies like ChatGPT: the ability to provide an immediate answer to a question instead of a few hundred thousand resources or links that might not all be relevant. Therefore, they also raise the same concerns about overuse. One concern specific to voice technologies, discussed in depth in chapter 8 of Your Computer is on Fire, is accent bias and the far-reaching ethical, cultural, and linguistic implications it may bear. The fact that most prototypes or early stages of these technologies were developed based on a dominant language and a “standard” accent poses many problems, creating a cultural and linguistic hierarchy where some languages and accents are deemed superior and treated as the norm over others. This hinders non-native speakers as well as speakers with dialects or heavy accents, because their commands are often misinterpreted, preventing them from utilizing these technologies to their full potential. Since the way we speak is an important aspect of our identity and bears cultural significance tied to where we grow up, this is a huge ethical concern: voice technologies can be non-inclusive and inadvertently perpetuate historical biases.
“Siri Disciplines,” as others have said, shows the pitfalls of profit-motivated technological development. It’s great that these technologies have improved rapidly in recent years, as this has greatly enhanced the accessibility and ease of use of many products, but people at the intersection of needing these features and speaking a nonstandard English accent are left in the dust. I think the idea of using accented speech in certain contexts is an interesting one, but the execution would be key. I grew up hearing a lot of nonstandard English accents, and I appreciate that I now don’t have as much difficulty understanding some accents as others do. But there have also been times when I was unable to understand the standard English accents in “information systems,” which only made me frustrated. Adding another layer to that could be harmful and could further negative opinions of accents if implemented incorrectly. I believe that in media, such as movies like the referenced “Rogue One,” there is a large and open stage for nonstandard accents to be heard more.
“Talk to Me” was also an interesting read. It points out directly what has been at the back of my mind at the last couple classes: that the jigsaw puzzle pieces of technology needed to realize the far-out concepts of androids in “Alien” and “Blade Runner” are falling into place and may be present in our lifetimes. Automated speech recognition, natural language processing, formulation of lifelike responses, and even humanlike, independent, audio animatronics. These technologies exist in separate boxes now, but it will only be a matter of time before these technological threads merge and form something completely new. Our interactions with these types of creations will be a watershed moment for human-computer interaction.
As was true the last time we read from Your Computer is On Fire, I was really drawn to the chapter assigned from it, Siri Disciplines. When I consider the ways I am privileged, speaking “non-accented” (or better put, non-regionally accented, since to speak is to have an accent of some kind) English does not often come to mind. It is a way in which it is very easy to consider myself a default and not think too critically about it further; it is almost entirely non-salient, but this is only because my experiences haven’t been shaped by the social realities of being othered by my voice. A quote from very early in the chapter stuck with me: “[W]hat is the experience of accented speakers for whom speech is the primary or singular mode of communication? This so-called ‘revolution’ has left them behind. In fact, Mar Hicks pushes us to consider that any technology that reinforces or reinscribes bias is not, in fact, revolutionary but oppressive.” The tasks I use Siri for are pretty simple: setting timers, making calls, turning on low-battery mode, and so on, all things that are easy enough to do without Siri. But consider how this builds on systemic accent oppression, which takes away access to jobs, forces assimilation sometimes to the degree of abandonment of mother tongues and associated cultures, and attaches a whole host of negative stereotypes. The lack of broad support for non-native accents, along with the selection of which few non-native accents do receive support from these tools (such as Singaporean and Hinglish), signals something much more insidious. It is a sign that in an increasingly digital age, the choices being made are to uphold systems of oppression rather than take the opportunity, in seemingly small but compounding ways, to level the playing field.
It also occurs to me that many of the day-to-day tasks I use Siri for may not, from an accessibility angle, be so easy for everyone. Accents are typically a regional thing, but disability can also produce “accented” speech, and many of these conditions (mainly those related to muscle tone and motor control) raise major access concerns with the pre-voice-assistant ways of accomplishing these tasks (navigating through apps or pages of settings and either sliding switches or pressing buttons). Voice assistants could have offered a pathway to greater accessibility by allowing alternative mechanisms to accomplish these tasks, but instead, through accent bias and tools that recognize only normative speech, yet another barrier is built.
I wanted to focus on the idea of voice computing as a form of coloniality that was brought up in “Siri Disciplines” from Your Computer Is On Fire. Lawrence frames voice technology as not necessarily revolutionary, in contrast with Vlahos’s conception of it. While I do agree with Vlahos that the ease of learning and using this particular modality (for some) is a little unprecedented in computing, I also agree with Lawrence that most (if not all) technologies readily adopted by society at large necessarily conserve unspoken societal norms and the power structures held within them (because that is where the money assuredly lies). Particularly, as Lawrence describes, Siri conserves coloniality. Coloniality refers to the way in which colonialism has mutated to continue operating today, wherein beliefs and attitudes about different areas of life are used, unwittingly or with full knowledge, to rationalize the continuing global dominance of Western countries and cultures. Voice technologies being easy to use for people with standard accents (because everyone has an accent in a colonial language), while ostensibly resulting from a concern for greatest profit, preserves colonial beliefs about who actually speaks English and who speaks it “illegitimately” and, through this, lends even greater legitimacy to these accents over the “illegitimate” ones. Furthermore, this furthers the idea of assimilating to a standard accent, a practice that many have to adopt when immigrating, but extends that assimilation even beyond the borders of countries that speak with this standard accent. We also see that when non-standard accents or dialects are included, they are included in Siri only as a result of an opportunity for greater profit (as with Hinglish).
It’s hard for me not to be cynical about the state of voice technology. On the one hand, I entirely understand the desire to just speak a question aloud and get a single, reliable, concise answer. When it takes more than 30 seconds to find exactly what I want after a search, it can get frustrating. Still, a number of very important issues were brought up in both readings. In Talk to Me, specifically the second chapter, Vlahos brought up the struggle for the top of the search bar and how companies fight to get heard. This has the simultaneous effect of worsening searches and creating a system where only the most successful businesses can survive. With the effects of AI-generated media and the lack of regulation around poor or dangerous sources, these “one-shot answers” can often be wrong or misleading. Not to mention that many questions can’t simply be answered in one sentence from a single source. After a number of misleading or just false answers from Google snippets over the years, I try not to use them, and instead scroll through until I find a source I can trust more. Still, the convenience of that system is hard to deny.
When it comes to the use of the actual voice technologies though, I’m still cynical. Even as an American-born native English speaker, I have never found much use in voice technology. A family member won an echo mini in a raffle once, and after fiddling with it for less than a week we couldn’t find a use for it. I don’t think I’ve ever used Siri except for curiosity’s sake, to see how it’s improved after an update or to see what it does when I ask it a difficult question. If I don’t like these technologies, I can’t imagine the frustration second-language English speakers or speakers from different regions and countries have, especially if they rely on them as assistive technology. I think Lawrence is correct when she brought up how it’s not enough to simply celebrate the achievements of these technologies without acknowledging the shortcomings. If this tech can’t work consistently for most English speakers, then it’s not the achievement we make it out to be. A number of people have brought up how our perceptions of AI and voice tech in sci-fi media are becoming more of a reality, and there’s an interesting discussion to be had there. But I wonder, even if we perfected this tech so it works just like in Star Trek or Blade Runner, how much more efficient would it be than an optimized search engine?
It is very interesting to see the different attitudes towards voice AI technology in the two texts. I think in some respects this tech is very revolutionary and will certainly change many people’s lives for the better, as both chapters in Talk to Me describe. The ability to communicate and interact with voice assistants has been very helpful for people who are unable or uncertain how to interact with modern technology. There are, of course, issues of privacy that are briefly discussed in chapter 1. Ultimately, though, I think Talk to Me lacks a lot of the foresight and nuance that is apparent in chapter 8 of Your Computer is on Fire. I think our discussions in this class have done a good job of bringing our focus outside the privileged benefits of this kind of technology by looking at some of the impacts on people who are not included in these advancements. This chapter brings in a lot of that perspective on how different accents and varieties of languages are left out because of a lack of market, making access more difficult. Currently there is no incentive for these companies to pursue creating voice AIs that represent people more broadly. I have, however, noticed that Apple has introduced a personalized voice feature where you can use your own voice to train an AI voice. I wonder how good the transcription features are from voice to text. I would also like to know how many languages Apple and other companies currently support for voice-to-text capabilities.
Vlahos states that with voice technology, we will not be the ones adapting to our inventions; rather, it will be the opposite. While I think voice technology does provide more accessibility, it also fails to consider a large portion of its users: those with nonstandard accents. In order for voice technologies like Siri to understand commands, you must speak with a “standard” English accent; otherwise, its services are inaccessible. Thus, people must still adapt to the invention by changing their accent; additionally, people must alter other aspects of their speech (for example, sentence structure) for these voice technologies to respond successfully. This further imposes the feeling of having to assimilate speech-wise and, as pointed out by the “Siri Disciplines” reading, results in alienation. I also dislike the “one answer” approach brought up in “Talk to Me”; in addition to putting large tech companies in control of information and which brands and products are discovered, I find it uncomfortable to be given a single answer by voice without really having the option to manually (and visually) go through a multitude of sources, perspectives, or options. This would really limit the diversity and quantity of information people consume.
I honestly feel something like suspicion when I read Vlahos’s pieces. I think something about his reverence for these voice computing technologies makes him a bit difficult to take seriously. Additionally, his lack of engagement with the various critiques presented in Lawrence’s piece in Your Computer is on Fire is glaringly obvious. How can you boast and marvel about these supposed possibilities, going so far as to call them “oracles” (honestly, that just gives me the heebie-jeebies; what is this techno-mysticism/orientalism/etc.?), without any sort of sustained engagement with the questions of colonialism, anglo-centrism, and so on? Overall it was interesting to read two perspectives with such differing approaches to the topic. It really highlighted the way I think companies and business-minded programmers look at technology: with an optimism left unchecked due to the allure of profit.
In the Talk to Me reading, James Vlahos explains how voice technology has been integrated into society. I thought it was interesting that Google once doubted the possibility that a question could be answered in one shot; it is pretty incredible that technology is able to do this now. He also mentioned something I never really considered: with the influx of more synthetic voices, there could be a problem in public settings where human voices need to be valued.
The second article reminded me of the time I tried to type my French paper by speaking into the microphone on Google Docs, but it didn’t understand anything I was trying to say. It was so frustrating that I honestly think it’s broken, because I would be pronouncing a word right, but the slight pauses I made while saying it would change the word. By contrast, when I speak my native language, English, into my Google Doc, it mostly understands me. So I can only imagine how hard it is for non-native English speakers to reap the benefits of voice technology.
I like how Lawrence unpacks a lot of the issues with voice technology.
Vlahos writes a bit passively: "The payoff is increased efficiency. The trade-off is diminished independence." I don't think our independence is merely diminished; I think we are quickly entering a future where technology and voice assistants immobilize our cultural and social competency. Lawrence presents a fairly comprehensive data table showing that people with non-Western accents are perceived as less intelligent in Western contexts, which has real, palpable effects on housing, employment, and the courts. If voice assistants are meant to mimic human-to-human interactions, and most voice assistants are used for devices and appliances that serve their users, what does it mean that (1) most of the voices are female-coded and (2) most of the voices have American or British accents? Lawrence ends the chapter with "I, for one, look forward to the day when Siri doesn't discipline my speech but instead recognizes and responds to my Trinidadian accent." It's not enough to just acknowledge the diversity of human experience; we need to account for its breadth and depth in our technological practices.
I'm generally skeptical of Vlahos's thesis that voice computing is the next frontier of technological advancement. Despite the proliferation of voice assistants like Siri, Amazon Alexa, and Google Assistant, they have so far failed to seriously displace or replace other forms of computing. Moreover, their economic success has been fraught: in 2022, Ars Technica [0] reported that Amazon was losing over $10 billion on Alexa after attempting to sell devices at a loss and recoup profits through digital services. That venture has mostly been unsuccessful, since most people use Alexa to ask for the weather or play music; they don't trust it to order a product on Amazon correctly.
So far, what voice assistants have generally demonstrated is that they can make interacting with some user interfaces easier. Instead of clicking through a menu to play music, it's easier to ask Siri (but only for users with standard English accents, as Lawrence importantly notes). Beyond interpreting and performing pre-set simple tasks, however, voice assistants have so far failed to become general-purpose assistants that can fully replace a traditional UI. It's possible that advances in LLMs like ChatGPT, which can more easily understand and perform unstructured and novel tasks, could eventually achieve this goal. But given the current state of the technology, I don't see how traditional computer UIs will easily be replaced.
[0]: https://arstechnica.com/gadgets/2022/11/amazon-alexa-is-a-colossal-failure-on-pace-to-lose-10-billion-this-year/
I am skeptical of James Vlahos's assessment of voice technology in chapter 1, mostly because I am not convinced that voice technology is, or ever will be, fundamentally easier to use than other non-voice modalities, as he suggests. He describes the operation of technologies like keyboards, mice, and smartphone screens as unnatural and uncomfortable because they must be learned, as opposed to voice, which we all supposedly know how to use and which therefore has the potential to 'hardly feel like an interface at all'. The thing is, plenty of humans don't learn how to speak (speech is not a universal modality for human communication), and more and more people these days are growing up with keyboards, mice, and smartphone screens, so they can say they've been using those technologies their whole lives too. Human brains are incredibly plastic, and speech is incredibly complicated, not always the most convenient, and not without significant challenges. People still have to learn how to interface with voice AIs, and even if the AIs become more human-like, people still have to learn how to interface with other humans as well. I am not convinced that voice is going to supplant other means of human-computer interaction as the dominant modality.