OpenAI launches voice assistant inspired by Hollywood vision of AI

OpenAI technology chief Mira Murati with CEO Sam Altman last year. She says the new AI model can interact with people by voice in real time. (WSJ)

Summary

The feature is part of a faster, more capable version of the company’s flagship artificial-intelligence model.

OpenAI unveiled a less expensive version of its flagship artificial-intelligence system that includes a new voice assistant designed to make it easier to use, as the company races rival tech firms to roll out products and features that attract users.

The new AI model, dubbed GPT-4o, can better digest images and video in addition to text, and can interact with people by voice in real time, said Mira Murati, OpenAI’s chief technology officer, on Monday. 

Unlike current voice assistants, the new voice feature can be interrupted mid-conversation, and the model is capable of responding almost instantaneously, the company said.

In a livestreamed demonstration, OpenAI executives showed how the model could analyze code, translate a conversation between two speakers and guide users through a basic algebra problem written on a piece of paper, all seemingly in real time.

The launch of GPT-4o reflects how OpenAI, along with other startups and tech giants, is increasingly seeking to expand its user base and bring in revenue from generative artificial intelligence after pouring enormous sums into the computing power and energy needed to develop these systems.

The OpenAI announcement comes a day before the start of Google’s annual developer conference on Tuesday, where the search giant is expected to announce new products of its own. Google, an AI pioneer, has been vying with OpenAI and its partner and backer, Microsoft, for leadership in generative AI. Microsoft wasn’t involved in making GPT-4o.

Chief Executive Sam Altman likened the new product to the kind of AI tools typically seen in movies. In a speech last year, he said he and other OpenAI executives found inspiration in the 2013 film “Her,” about a man who falls in love with a voice assistant. He and other OpenAI employees posted references to the movie on X just after Monday’s announcement.

“The original ChatGPT showed a hint of what was possible with language interfaces; this new thing feels viscerally different,” Altman wrote Monday on his personal blog.

The new model can also detect a person’s emotions in their tone of voice or facial expression, OpenAI said, and it can switch more rapidly among emotional registers, from a dramatic voice to a robotic tone to singing. This feature will roll out in the coming weeks to users who pay for ChatGPT Plus, the company’s $20-a-month subscription.

GPT-4o will also be offered to companies. Murati said the model would be twice as fast as, and half the cost of, the company’s current top-of-the-line offering, GPT-4 Turbo. The “o” in GPT-4o stands for “omni,” the company said. People who use the free version of ChatGPT will have access to GPT-4o’s image and vision features starting Monday.

OpenAI already offers a feature called “voice mode” that combines three separate models to respond to users by voice, but it can be confused by multiple speakers or background noise, and it is slow. The “chains of models” approach “doesn’t cut it when you’re trying to deliver at this speed,” said Mark Chen, head of frontiers research at OpenAI, in an interview.

By contrast, GPT-4o was built as a single model trained on text, vision and audio material, and can respond more quickly and accurately to cues.
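The difference is easiest to see as a rough sketch. The Python below is purely illustrative; every function name is a hypothetical stand-in rather than OpenAI’s actual API, and the stubs simply mark where real models would sit in each design.

```python
# Illustrative only: all names here are hypothetical stand-ins,
# not OpenAI's actual API. The stubs mark where models would sit.

def transcribe(audio: bytes) -> str:
    # Pipeline stage 1: speech-to-text. Tone, emphasis and overlapping
    # speakers get flattened into plain words at this seam.
    return "what is the weather like"

def generate_reply(text: str) -> str:
    # Pipeline stage 2: a text-only language model sees just the
    # transcript, so it cannot react to how something was said.
    return "It looks sunny today."

def synthesize_speech(text: str) -> bytes:
    # Pipeline stage 3: text-to-speech. The latency of all three
    # stages adds up before the user hears anything.
    return b"<synthesized audio>"

def voice_mode_pipeline(audio: bytes) -> bytes:
    # The "chains of models" approach: three sequential hops, with
    # information about tone and context lost at each hand-off.
    return synthesize_speech(generate_reply(transcribe(audio)))

def omni_model(audio: bytes) -> bytes:
    # The GPT-4o-style design as described: a single model trained on
    # text, vision and audio consumes the waveform directly and emits
    # audio, so there is one round trip and paralinguistic cues survive.
    return b"<audio reply>"

if __name__ == "__main__":
    request = b"<user audio>"
    voice_mode_pipeline(request)  # slower: three sequential model calls
    omni_model(request)           # faster: one end-to-end model call
```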

OpenAI executives declined to describe what kind of data was used to train the model. They also declined to say whether OpenAI was able to train it with less computational power. OpenAI is also working on an entirely new AI model, called GPT-5, that is expected to be a large leap beyond current technology.

Murati said on Monday that the OpenAI team wasn’t inspired so much by the movie “Her” as by human conversation. “When you stop talking, I’m about to jump in. I can kind of read your tone and respond to that. And it’s really natural, rich and interactive,” she said.
