Google claims its custom-built AI supercomputer outperforms Nvidia's systems
2023-04-05, 06:12 AM IST
On Tuesday, Google released new information about the supercomputers it uses to train its AI models, stating that they are faster and more power-efficient than comparable systems from Nvidia Corporation.
Google has developed a custom chip called the Tensor Processing Unit (TPU), which it uses for more than 90% of its AI training work. AI training involves feeding data through models to make them effective at tasks such as generating images or responding to queries with human-like text.
Google's TPU is now in its fourth generation. In a scientific paper released on Tuesday, Google revealed how it has connected over 4,000 TPUs to create a supercomputer. The company has developed custom optical switches to help connect individual machines.
Improving these connections has become a key point of competition among companies that build AI supercomputers. The so-called large language models that power technologies like Google's Bard or OpenAI's ChatGPT have exploded in size and are now far too large to store on a single chip.
Google's largest publicly disclosed language model to date, called PaLM, was trained by splitting it across two of the company's 4,000-chip supercomputers over a period of 50 days. The company has developed a system that allows for easy reconfiguration of connections between chips on the fly, which can help to avoid problems and improve performance.
"Circuit switching makes it easy to route around failed components," Google Fellow Norm Jouppi and Google Distinguished Engineer David Patterson wrote in a blog post about the system. "This flexibility even allows us to change the topology of the supercomputer interconnect to accelerate the performance of an ML (machine learning) model."
While Google is only now releasing details about its supercomputer, it has been online inside the company since 2020 in a data center in Mayes County, Oklahoma. Google said that startup Midjourney used the system to train its model, which generates fresh images after being fed a few words of text.
In the paper, Google said that for comparably sized systems, its supercomputer is up to 1.7 times faster and 1.9 times more power-efficient than a system based on Nvidia's A100 chip, which was on the market at the same time as the fourth-generation TPU.
Google said it did not compare its fourth-generation TPU with Nvidia's current flagship H100 chip because the H100 came to market after Google's chip and is made using newer technology.
Google hinted that it might be working on a new TPU that would compete with the Nvidia H100 but provided no details, with Jouppi telling Reuters that Google has "a healthy pipeline of future chips."
