Novel ideas to cool data centers: Liquid in pipes or a dunking bath
Summary
Nvidia is trying a novel method of keeping its cutting-edge servers cool and limiting the use of power-hungry air conditioning.
One of the latest innovations at artificial-intelligence chip maker Nvidia has nothing to do with bits and bytes. It involves liquid.
Nvidia’s coming GB200 server racks, which contain its next-generation Blackwell chips, will mainly be cooled with liquid circulated in tubes snaking through the hardware rather than by air. An Nvidia spokesman said the company was also working with suppliers on additional cooling technologies, including dunking entire drawer-sized computers in a nonconductive liquid that absorbs and dissipates heat.
Cooling is suddenly a hot business as engineers try to tame one of the world’s biggest electricity hogs. Data centers, the big computer farms that handle AI calculations, are expected to gobble up 8% of total U.S. power demand by 2030, compared with about 3% currently, according to Goldman Sachs research.
The Nvidia GB200 series is likely to be sought-after as technology companies race to deploy AI in content creation, autonomous driving and more.
Nvidia’s stock took a hit early this month as investors reacted to potential delays in its Blackwell-powered products. Although the company said it was on track to ramp up production in the second half of this year, Chief Executive Charles Liang of Super Micro Computer, which makes server racks with Nvidia chips, said the timeline had been “pushed out a little bit.” Liang said he anticipated significant volumes would be ready in the first quarter of next year.
Data centers, which can house tens of thousands of servers, tend to be cacophonous and chilly places. At older facilities that use fans and air conditioning, cooling accounts for up to 40% of power consumption, a proportion that could be reduced to 10% or less with more advanced technology, according to Shaolei Ren, associate professor of electrical and computer engineering at the University of California, Riverside.
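As a rough illustration of what that shift could mean for a facility's total draw, the short sketch below works through the arithmetic. It rests on two simplifying assumptions that are not Ren's: the servers draw the same power under either cooling method, and everything other than cooling counts as computing load. The 1,000-kilowatt load is a hypothetical figure chosen only to make the numbers concrete.

```python
def total_power(it_load_kw, cooling_share):
    """Total facility power when cooling takes `cooling_share` of that total.

    Assumption (not from the article): all non-cooling power is computing load.
    """
    return it_load_kw / (1.0 - cooling_share)


it_load_kw = 1000.0  # hypothetical computing load, in kilowatts

older = total_power(it_load_kw, 0.40)  # cooling is 40% of total power
newer = total_power(it_load_kw, 0.10)  # cooling is 10% of total power
saving = 1.0 - newer / older           # fractional drop in total power

print(f"older: {older:.0f} kW, newer: {newer:.0f} kW, saving: {saving:.0%}")
# prints "older: 1667 kW, newer: 1111 kW, saving: 33%"
```

Under those assumptions, cutting cooling's share from 40% to 10% trims total power use by about a third. The saving would be smaller at facilities that start below the 40% ceiling.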
Liquid cooling has become a common feature of high-end gaming computers, but on a larger scale has traditionally been limited to the hardest challenges, such as nuclear power plants. The upfront cost of circulating liquid through delicate electronics can be many times the cost of installing AC and fans. Some parts are in short supply.
Leakage is the biggest risk.
“If a single drop of water falls onto a server, such as the million-dollar GB200, it could cause catastrophic damage,” said Oliver Lien, general manager of Forcecon Technology, which works with semiconductor makers on cooling.
More than 95% of current data centers use air cooling because of its mature design and reliability, according to a recent Morgan Stanley report.
Super Micro Computer, commonly known as Supermicro, will use liquid cooling in about 30% of the racks it ships next year, said Liang. In June and July, the company delivered more than 1,000 liquid-cooled AI racks, representing more than 15% of new global data-center deployments, he said.
Nvidia both makes its own servers and supplies chips to other server makers that build devices for tech giants working on AI applications. Decisions on cooling tend to be made jointly by those companies.
Manufacturing challenge
Taiwan-based contract manufacturer Foxconn is taking a leading role in manufacturing the Nvidia GB200 series in Taiwan and Mexico, according to people involved in the plans.
The sensitivity of the cooling issue was highlighted in late July when shares in Foxconn and two suppliers of cooling components fell more than 5% following social-media posts suggesting the GB200’s cooling system had leaks.
People familiar with the production said suppliers were working through normal issues that arise in preproduction testing. They said the cooling system issues weren’t likely to significantly affect the GB200’s shipping schedule. Shares of Foxconn and the suppliers quickly recovered. Nvidia declined to comment and Foxconn didn’t respond to a request for comment.
Supermicro said its liquid cooling systems enabled data centers to reduce power consumption by 30% to 40%. Nvidia has said liquid-cooled data centers can pack twice as much computing power into the same space because the air-cooled chips require more room in a server.
If only air cooling is used, high-performance computers require server-room temperatures below 50 degrees Fahrenheit, said Lien of Forcecon Technology. Aside from the heavy electricity use, the fans produce dust that can hinder performance and 24-hour whirring that can annoy the neighbors.
“Liquid cooling is definitely inevitable for higher-end AI applications from firms such as Nvidia, AMD or Google,” Lien said. The liquid-cooled machines whisper instead of whirring and kick up virtually no dust.
“You would feel a slight tremor if you place your hand on the machines, a gentle reassurance that they are diligently at work,” Lien said.
The liquid cooling systems for Nvidia’s GB200 high-end rack cost more than $80,000, about 15 to 20 times the cost of an air-cooling system for an existing rack with Nvidia’s H100 chips, according to Morgan Stanley estimates. The bank projected that the market for those systems would more than double to $4.8 billion by 2027.
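The figures those estimates imply can be backed out with simple arithmetic, sketched below. The sketch uses only the numbers reported here, treating $80,000 as the liquid system's cost and $4.8 billion as exactly double the starting market. Because the estimates say "more than" in both cases, the results are rough bounds rather than Morgan Stanley's own figures.

```python
liquid_cost = 80_000             # dollars; the estimate says "more than" this
ratio_low, ratio_high = 15, 20   # liquid cost as a multiple of air-cooling cost

# Dividing by the larger multiple gives the smaller implied air-cooling cost.
air_cost_low = liquid_cost / ratio_high
air_cost_high = liquid_cost / ratio_low

market_2027 = 4.8e9                   # projected market in dollars
market_now_max = market_2027 / 2      # "more than double" implies a smaller base

print(f"air cooling: ${air_cost_low:,.0f} to ${air_cost_high:,.0f} per rack")
print(f"current market: below ${market_now_max / 1e9:.1f} billion")
# prints "air cooling: $4,000 to $5,333 per rack"
# prints "current market: below $2.4 billion"
```

On those numbers, air cooling for an H100 rack would run roughly $4,000 to $5,300, and the current market for liquid systems would be under $2.4 billion.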
Growing pains
In the systems, pumps circulate coolant to microchannels inside cold plates on top of the chips and withdraw the liquid when it heats up.
One sign of growing pains in the industry is a shortage of specialized parts. Supermicro said it had to delay shipments representing some $800 million in revenue because of component shortages primarily related to liquid cooling.
Executives said one part in tight supply is the universal quick disconnect, an item that prevents leaks when parts of the piping system are disconnected. That part is mostly made by American and European companies, but more than half of the global cooling system business is concentrated in Taiwan-based companies, according to Edward Kung, who leads Intel’s liquid cooling projects and is chairman of the Taiwan Thermal Management Association.
The Taiwanese companies are benefiting from their experience cooling gaming computers, much as Nvidia started as a maker of chips for games and moved into AI.
Many in the business think the next step could be total immersion in heat-absorbing fluid, although the technology faces skepticism because the fluid and custom tanks are costly and maintenance is messier.
Taiwanese companies including Cooler Master, a longtime Nvidia collaborator known to videogame enthusiasts for its high-end computer cooling hardware, are working on immersion technology for potential future Nvidia products, people familiar with those products said.
Last year, Nvidia Chief Executive Jensen Huang stopped by a trade-show display in which Taiwan’s Gigabyte Technology showed off its immersion cooling tank.
“Good job,” Huang told people at the display. “This is the future.”
Write to Yang Jie at jie.yang@wsj.com