Google has raised the stakes in the contest to develop the world’s fastest and most efficient artificial-intelligence chips.
The Alphabet unit has developed a new computer processor that is customized for the type of computing operations involved in querying, rather than training, AI models. Demand for that type of computing, known as inference, is exploding as businesses embrace AI agents capable of writing software and performing other tasks.
The company plans to unveil its eighth generation of Tensor Processing Units, or TPUs, this week at a company event in Las Vegas. Google has been developing an inference-specific chip for several years, people familiar with the matter said, and in recent months enlisted a small group of AI companies to test the chip’s capabilities.
Google is also introducing a separate chip customized for training, the process by which AI models are fed data and taught how to respond to queries.
“If you don’t have inference, you cannot cover the cost of your training,” said Thomas Kurian, chief executive of Google’s Cloud unit. “So eventually inference is going to be at least as big, if not bigger, than the training market.”
The development sharpens the rivalry between Google’s custom semiconductor business and market leader Nvidia as growing use of agentic AI brings a surge in demand for faster, more energy-efficient chips. Nvidia last month unveiled its own inference solution, a server composed of its signature graphics processing units, or GPUs, paired with chips from the startup Groq, whose technology it licensed as part of a $20 billion deal last year.
Another chip startup focused on inference, Cerebras, recently struck a major deal with Amazon Web Services to offer AI companies faster computing to run their models. Last week, Cerebras filed to go public with plans to launch an IPO as soon as mid-May.
Nvidia became the world’s largest publicly listed company by selling millions of GPUs to major customers including OpenAI, Microsoft and Oracle. While fast and powerful, GPUs, which can perform billions of simple tasks simultaneously, are most useful for training.
Inference requires less raw computing power than GPUs typically provide but more memory. Without enough of it, customers running lots of queries or agents can hit the “memory wall”—the bottleneck caused when a chip can’t access data quickly enough, forcing users to wait longer for models to respond.
“What the customers really care about is how can you drive down the latency,” said Mark Lohmeyer, vice president of AI and computing infrastructure for Google Cloud, referring to the time it takes for a model to produce a response.
A single request to an AI agent generates between 20 and 50 times as many “inference transactions” as a query to a chatbot, Lohmeyer said, because the agent will undertake numerous actions on its own rather than simply generating a response.
Google has been designing its in-house, custom chips for over a decade. The TPU was initially used in data-center cloud servers, but as AI became more widespread, the chips were used to train and run Google’s generative AI models, including its Gemini chatbots and the Nano Banana image generator.
Each new generation of TPU has been designed by Google in partnership with Broadcom, which specializes in custom chips. Last year, Google began selling its seventh generation of TPUs, known as Ironwood.
Last week, Google and Broadcom announced an expanded partnership to design more AI chips for Anthropic, but the companies did not say whether those chips would be used for training, for inference, or for both.
Industry observers have for years wondered whether Google will commercialize its TPUs for widespread use. It has signed two major deals with outsiders: one to give Claude-maker Anthropic access to about 1 million of its chips, and another with Meta Platforms.
The company late last year promoted Amin Vahdat, one of its top engineers, from the position of vice president of AI infrastructure inside the Cloud unit to a new position, chief technologist of AI infrastructure, reporting directly to Alphabet CEO Sundar Pichai. In his new role, Vahdat oversees the TPU business as well as the development of new generations of Gemini and other AI models.
Vahdat said the company is fielding interest from customers that may want to use its TPUs outside of the Cloud platform.
“As of late, we have been seeing more and more demand for having the compute closer and closer to the workloads,” he said.
Write to Robbie Whelan at robbie.whelan@wsj.com and Katherine Blunt at katherine.blunt@wsj.com
