2025-05-21

A new generation of customer-service voice bots is here, spurred by advances in artificial intelligence and a flood of cash.

Automated voice programs are being upgraded from old-school systems with little or no AI to newer speech-to-text and text-to-speech models combined with large language models.

If the technology lives up to its promise, the shift might improve the customer experience at a range of companies and reduce their costs in the process. But there are questions, too, about consumers’ comfort level with the technology and how best to keep the AI from spitting out false information.

Insurance marketplace eHealth uses AI voice agents to handle its initial screening for potential customers when its human staff can’t keep up with call volume, as well as after hours.

The company slowly became more comfortable with using AI voice agents as the underlying technology improved, said Ketan Babaria, chief digital officer at eHealth. “Suddenly, we noticed these agents become very humanlike,” Babaria said. “It’s getting to a point where our customers are not able to differentiate between the two.”

The transition is happening faster than many expected.

“You have AI voice agents that you can interrupt, that proactively make logical suggestions, and there’s very little or no latency in the conversation. That’s a change that I thought was going to happen a year and a half or two years from now,” said Tom Coshow, an analyst at market research and information-technology consulting firm Gartner.

That rapid improvement, combined with more venture-capital dollars for startups building voice AI tech, is prompting more businesses to roll it out in their call centers and to automate sales and appointment-making phone calls in areas like home services and healthcare. Gartner predicts that generative AI capabilities, from voice to chat, will be present in 75% of new contact centers by 2028.

Venture-capital investment in voice AI startups increased from $315 million in 2022 to $2.1 billion in 2024, according to data from CB Insights.

Some leading AI models for voice applications come from AI labs like OpenAI and Anthropic, startup founders and venture capitalists say, as well as smaller players like Deepgram and Assembly AI, which have improved their speech-to-text or text-to-speech models over the past few years. For instance, OpenAI’s Whisper model is a dedicated speech-to-text model, and its GPT-4o model can interact with people by voice in real time.

(News Corp, owner of The Wall Street Journal, has a content-licensing partnership with OpenAI.)

Many of the existing phone systems, called interactive voice response, or IVR, date back decades, are rigid and often don’t understand the intent of the human on the other end of the line. They also lack the ability to say unscripted things that are grounded in the context of the conversation, analysts say.

The newer, AI-infused models can understand a wider range of words, said Mike Droesch, a partner at Bessemer Venture Partners who invests in voice AI technology.

Technological progress also means some models are voice-native, meaning they don’t necessarily need to spend time and energy converting speech to text, using a large language model to process it and then turning the output back into speech.

“It’s really in the last 12 to 18 months that we’ve seen AI voice agents performing as well or better than humans,” said Alex Levin, co-founder and chief executive of voice AI company Regal.

Other companies like ElevenLabs and Cartesia have also dramatically improved the humanlike characteristics of their AI voices, making a human and AI voice nearly indistinguishable, startup founders and businesses say.

EHealth’s AI voice agents tell customers they are “virtual agents” at the beginning of each call, the company said, adding that it is a best practice to inform people as soon as possible.

Fertitta Entertainment, the parent company of Golden Nugget casinos and Landry’s restaurants, uses AI for its customer service calls. However, given generative AI’s tendency to produce false information, called hallucinations, the company’s voice agents are strictly controlled so they don’t veer away from their predetermined area of knowledge, said Brian Jeppesen, director of contact center operations for Fertitta.

“It’s our knowledge base that it uses, so it doesn’t go off on tangents and hallucinate,” Jeppesen said.

The next step is AI voice agents that can use the phone to independently perform tasks such as making restaurant reservations, closing sales and placing orders, said Nikola Mrksic, co-founder and chief executive of voice AI company PolyAI.

But the challenge for enterprises, especially within call centers, is determining whether to stick with their existing call automation providers or switch to newer, more innovative voice AI players, said Gartner’s Coshow.

Those concerns shouldn’t stop enterprises from experimenting with AI voice agents now, especially as their rate of improvement is so rapid, he added. For now, many enterprises are focused on chat-based AI agents and on encouraging customers to resolve their own questions online rather than speaking to human or digital agents.

But there will always be a need for a human touch, especially for “high-value” interactions that shouldn’t be passed off to a bot, companies say.

“It doesn’t necessarily mean the end of the contact center,” Coshow said.

Write to Belle Lin at belle.lin@wsj.com