Our gadgets finally speak human, and tech will never be the same

Source: Eliot Wyatt/WSJ
Summary

Generative AI is allowing devices to better hear and understand what we’re saying. As voice interfaces accelerate this year, we’ll soon wonder why we ever typed so much.

All over the world, millions of people have been doing something in private that’s now spilling into public spaces, on our sidewalks and in our open-plan offices alike.

They’re…talking to their gadgets. And not just a little—constantly.

These aren’t the old voice assistants we’ve come to resent. Billions of devices are equipped with microphones and internet connections, and more of them are getting generative artificial intelligence, making them radically better at both hearing and understanding us.

A revamped Siri, powered by Google, is coming to the iPhone. Hundreds of millions of Alexa-capable devices from Amazon already support the generative-AI Alexa+. Google is rolling out an AI model to its smart speakers and the Gemini app that understands spoken audio without first transcribing it into text.

ChatGPT, Claude and Gemini are approaching human-conversation-level frictionlessness. Then there’s OpenAI’s forthcoming hardware, designed by none other than Steve Jobs’ former collaborator Jony Ive. The unveiling is expected late this year. And you couldn’t swing a dead cat at CES without hitting an AI-powered gadget that promises to listen and interpret your every utterance.

This is shaping up to be the year that AI makes talking as powerful as tapping and swiping. The shift could be as transformative for the tech industry as the introduction of the Mac, Windows or the iPhone.

Some call this new lifestyle being “voice-pilled.” Reid Hoffman, co-founder of LinkedIn, recently wrote about it: “For many everyday purposes, voice is simply faster, more natural and more flexible than typing. And what’s changed now is that state-of-the-art AI models can genuinely process what we say.”

Speaking > typing

Today’s voice-transcription AIs have crossed an accuracy threshold: It’s now more convenient to dictate a message than to type it.

Leland Rechis heads voice experiments at Google’s Gemini division. He says that since Google added natural-language voice interactions to Gemini, total usage of the chatbot quintupled. And since October, Gemini has had the “native audio” model, innately understanding speech and generating responses without any cumbersome transcription. People are now having long conversations with the bot, as opposed to simply asking it quick questions, he adds.

The new Google-powered Siri will introduce the world’s billion-plus iPhone users to better AI. Google’s tech might even give iPhones a power that Android users have long enjoyed: near-perfect voice transcription.

In the meantime, iPhone users can taste the future with an app called Wispr Flow. It replaces Apple’s built-in dictation with a cloud-based transcription model that’s scary good. Imagine a voice-dictation AI that knows when to automatically insert semicolons; be still my writerly heart. It’s also great at identifying proper nouns.

I’ve also begun dictating all my emails, Slack messages and everything else using the built-in dictation features on my Lenovo Chromebook Plus. Windows and macOS computers can do something similar, although the feature is buried in their respective Accessibility settings.

Talking = the new touch screen

If you’re driving your car and inspiration strikes, you don’t pull out a laptop and start pounding away. At least, I hope not. Talking to devices makes those moments of inspiration easier to capture.

Because of their vastly improved comprehension, chatbot-powered interfaces are far more forgiving than the old Siri or Alexa, and are better at simulating intelligence. And since they comb the web for the things they don’t “know” offhand, they really can make you smarter.

My colleague Joanna Stern regularly talks with an AI: In her car, she has conversations with ChatGPT about whatever is on her mind. Another columnist colleague, Nicole Nguyen, uses it to practice her French, allowing her to have actual conversations instead of just repeating stock phrases.

An OpenAI spokeswoman says that the company has seen a big uptick in adoption of dictation and conversation mode in the ChatGPT apps in the past year. Recently, the company directly integrated voice into the app so it’s easier to use solely with your voice.

My editor, Wilson Rothman, has taken to chatting up the Alexa+ in his kitchen about cooking times and temperatures, substitute ingredients and other on-the-fly culinary insights that he doesn’t want to grab his phone to look up.

Recently, I took Gemini on a long walk with me, during which we had a Socratic dialogue about the history of the Byzantine Empire. (Did you know Rome never really fell?)

Doing + organizing

What’s coming next is hardware dedicated to making the experience of conversing with our tech that much easier.

Mina Fahmi is chief executive and co-founder of Sandbar, a company currently testing a ring with a built-in microphone. To use it, you lift your hand to your mouth and speak softly to your AI assistant. The idea is that you can chat with it comfortably, even in public.

With products like Sandbar’s ring, conversations build on themselves, a true dialogue in which we clarify our thoughts as much to ourselves as to the machines, says Fahmi. Last year, Joanna experimented with a similar product, a wearable pin from Plaud, which allows you to record and analyze all of your meetings.

I’ve spent time talking through column ideas with ChatGPT and Gemini, then asked them to organize those thoughts in notes that I can revisit later.

While OpenAI declined to comment about the device it’s cooking up with Ive, the former Apple design boss, one of OpenAI’s leaders recently suggested that it is focused on dialogue.

Meanwhile, Meta has had surprising success with its smart glasses. These have microphones and tiny ear speakers, so that you can chat with Meta’s AI assistant while you’re busy doing other things. And Apple is said to be working on its own smart glasses as well as expanded AirPods capabilities, with much of this same interaction in mind.

But what do we lose?

One of the primary dangers of voice-based interfaces is that they become too frictionless. In a process known as “cognitive offloading,” we might become less capable of doing the stuff our AI can handle. Why learn anything when the answer will always be one mumbled request away? This is a very real concern, one that I intend to revisit as AI adoption expands and its impacts become more apparent.

On the other hand, technology has already overburdened us with too many stressors and microtasks. AI promises to minimize at least some of the unwanted byproducts of progress. There’s a world in which AI might even help us push against the always-on connectivity that has made a farce of “work-life balance.”

I, for one, welcome a future in which I talk to my AI assistants throughout the day, and they handle my correspondence, calendars and to-do lists, while also serving as my coach, tutor and confidant.

Write to Christopher Mims at christopher.mims@wsj.com
