To keep its lead in AI chips, Nvidia is banking on the idea that bigger is better.
Bigger is also turning out to be more difficult.
The digital brains of the company’s newest artificial-intelligence chips, roughly the size of four Scrabble tiles arranged in a square, are about twice as big as the ones whose sales propelled an explosion in Nvidia’s business since early last year. The new chips, dubbed Blackwell, boast an even bigger increase in performance—they house 2.6 times the number of transistors—and Chief Executive Jensen Huang has said demand is red hot.
But Nvidia, when it reported strong quarterly sales and profit on Wednesday, also said challenges manufacturing the new chips were largely responsible for narrower profit margins and a $908 million provision it booked in the most recent period. Those factors pushed its highflying stock down 6.4% on Thursday.
Nvidia hasn’t detailed the nature of the issue. But analysts and industry executives say its engineering challenges stem mostly from the size of the Blackwell chips, which require a significant departure in design. Instead of one big piece of silicon, Blackwell consists of two advanced new Nvidia processors and numerous memory components joined in a single, delicate mesh of silicon, metal and plastic.
The manufacturing of each chip has to be close to perfect: Serious defects in any one part can spell disaster, and with more components involved, there is a greater chance of that happening. What’s more, the heat generated by all those pieces risks warping different materials in the package at different rates.
It is a wonky-sounding set of challenges involving microscopic circuitry that can nonetheless have a sizable impact on the bottom line. Any significant defect can render a $40,000 Blackwell chip useless and damage the overall manufacturing “yield,” a critical industry measurement of the usable percentage of a chip maker’s output.
“The issue has been getting the chips to work together and the yields,” said G. Dan Hutcheson, vice chair of TechInsights, an industry analysis firm. When the yields on individual parts of chips aren’t high enough, he said, “you find that everything goes south very quick.”
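The compounding effect Hutcheson describes can be sketched in a few lines of Python. This is a rough illustration only: it assumes defects in different parts are independent, and the per-part yields and part counts are hypothetical placeholders, not figures from Nvidia, TSMC or the analysts quoted here.

```python
from math import prod


def package_yield(part_yields):
    """Share of packages in which every part is good.

    Assumes defects in different parts are independent, which is a
    simplification; real packaging losses (warpage, bonding) also matter.
    """
    return prod(part_yields)


# Hypothetical yields for illustration only; not real manufacturing data.
one_processor = package_yield([0.90])
# Two processors at 90% each plus eight memory components at 99% each.
multi_part_package = package_yield([0.90] * 2 + [0.99] * 8)

print(f"single part:        {one_processor:.1%}")
print(f"ten-part package:   {multi_part_package:.1%}")
```

Under these made-up numbers, a package built from ten individually high-yielding parts comes out usable noticeably less often than any single part does, which is why small losses on individual components add up quickly.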
Blackwell’s complexity
Nvidia said on Wednesday it made a change in Blackwell’s design to improve its yield. No “functional changes” to the chips were necessary, Huang said on a call with analysts.
Chief Financial Officer Colette Kress said Nvidia was on track to ramp up Blackwell production and expects it to contribute several billion dollars to revenue in the quarter ending in January.
In a report earlier this month, analysts at UBS said Nvidia’s main issue with Blackwell was the complexity of using a new way of joining chips together that is offered by Taiwan Semiconductor Manufacturing Co., the contract chip maker that produces most of Nvidia’s chips. TSMC declined to comment.
The new approach, necessary because of Blackwell’s size, came with hurdles including increased manufacturing complexity and warpage that affected reliability and performance, the analysts said. Those were the primary factors challenging the Blackwell rollout, they said, although rising production yields over time should allow Nvidia to produce the chips as planned next year.
Nvidia’s recent shift to releasing a new generation of chips every year, rather than every other year as before, has heightened the pressure to sort out manufacturing issues quickly.
The company acknowledged that dynamic in a securities filing Wednesday, saying the “increased frequency and complexity of newly introduced products could result in quality or production issues” that could raise costs or cause delays.
‘One giant chip’
Such problems aren’t unique to Nvidia, and industry insiders say they are likely to emerge more as chip makers look to add processing power by increasing chip size. Chip-design changes to iron out flaws or improve yields are also commonplace in the industry.
Complexity will increase in the future as companies look to squeeze out more performance by stacking chips on top of each other and using more silicon, said Lisa Su, CEO of Advanced Micro Devices, Nvidia’s nearest chip-making competitor.
“It’s a lot of technology to make work,” she said. “Is it just going to get more complicated and bigger? Absolutely. That’s the world we live in.”
There also are benefits in next-generation chips that are more energy-efficient and need less power, she said—a growing concern as AI data centers gobble up grid capacity.
Huang has used Blackwell’s size as a selling point. “It’s just one giant chip,” he said at an Nvidia conference in March. “When we were told Blackwell’s ambitions were beyond the limits of physics, the engineers said, ‘So what?’”
With Hopper, Nvidia’s current-generation AI chip, Nvidia had already reached a size limit in chip-making. The most advanced lithography machines, which imprint minute circuitry on silicon, are only able to make a chip up to around 800 square millimeters, or a square with sides of about 1.1 inches.
To push the envelope with Blackwell, Nvidia opted to knit two maximum-size chips together into one, something that hadn’t been done in commercial graphics chips like the ones it makes.
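The size figures above can be checked with some quick arithmetic. The short Python sketch below takes the roughly 800-square-millimeter limit cited here, converts it to the side of an equivalent square in inches, and doubles the area for a two-chip design. The variable names are illustrative, and the doubled figure ignores memory and packaging, so it is only a rough lower bound on silicon area.

```python
import math

MM_PER_INCH = 25.4  # exact by definition of the inch

# Approximate single-chip limit cited in the article.
max_chip_area_mm2 = 800

# Side of a square with that area, in millimeters and inches.
side_mm = math.sqrt(max_chip_area_mm2)
side_in = side_mm / MM_PER_INCH

# Two maximum-size chips joined together (processor silicon only).
two_chip_area_mm2 = 2 * max_chip_area_mm2

print(f"side of max-size chip: {side_mm:.1f} mm ({side_in:.2f} in)")
print(f"two max-size chips:    {two_chip_area_mm2} mm^2")
```

The result, a bit over 28 millimeters per side, matches the article's figure of about 1.1 inches.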
“To do meaningful work in AI, you need a huge amount of compute, and that converts to many transistors, many more than can fit on a single chip,” said Andrew Feldman, a founder of Cerebras Systems, a chip-making startup that competes with Nvidia. “The technology to get to two [chips] is difficult to develop, the technology to get to four is harder, and to get to eight, even harder.”
Cerebras, whose investors include OpenAI CEO Sam Altman, has attacked the problem by developing the largest chips ever made—platters of silicon that are usually diced up into smaller chips but that Cerebras figured out how to connect and operate as one huge chip.
The company, which this week launched a cloud-computing service for AI deployments to challenge Nvidia’s dominance, has gathered customers including AstraZeneca and the Mayo Clinic. It recently confidentially filed for an initial public offering in the U.S.
