
It was hardly surprising that this week’s headlines were dominated by announcements from Nvidia GTC. After all, Nvidia Corp. is the world’s most valuable company, with a market capitalisation hovering at $4.4 trillion.
Taking the stage in his trademark black leather jacket, co-founder and CEO Jensen Huang laid out an audacious vision: Nvidia could generate as much as $1 trillion in annual revenue by 2027 from AI chip sales alone, doubling his earlier $500 billion forecast for 2026.
Huang’s confidence rests on a staggering premise: that demand for compute has surged a million-fold over the past two years. That explosion, he argues, will drive cumulative purchase orders for Nvidia’s current Blackwell chips and its next-gen Vera Rubin architecture to at least $1 trillion.
If realised, the milestone would be historic: no company has ever generated $1 trillion in annual revenue. For context, Nvidia reported $215.9 billion in revenue in 2025, up from $130.5 billion a year earlier. Huang is signalling that this is just the beginning of a far steeper curve.
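To get a feel for how steep that curve would have to be, here is a rough back-of-the-envelope sketch using only the revenue figures quoted above. It assumes the $1 trillion figure is read as annual revenue in 2027, two years on from the 2025 number. The function name and the two-year horizon are illustrative assumptions, not anything Nvidia has published.

```python
# Figures quoted in the article, in billions of US dollars.
revenue_prior_year = 130.5
revenue_2025 = 215.9
target_revenue = 1_000.0  # the $1 trillion figure
years_to_target = 2       # assumption: 2025 -> 2027


def growth_rate(start, end, years=1):
    """Compound annual growth rate needed to go from start to end."""
    return (end / start) ** (1 / years) - 1


last_year_growth = growth_rate(revenue_prior_year, revenue_2025)
required_growth = growth_rate(revenue_2025, target_revenue, years_to_target)

print(f"Growth over the past year: {last_year_growth:.1%}")
print(f"Annual growth needed to hit $1 trillion by 2027: {required_growth:.1%}")
```

On these assumptions, revenue grew by about 65% over the past year, and it would need to more than double in each of the next two years (roughly 115% a year) to reach the target.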
For more than a decade, Nvidia has been the undisputed kingmaker of the AI boom. Its graphics processing units are the backbone of modern AI, but as the industry pivots from training models to running them at scale—also known as inference—the competitive landscape could widen, creating openings for a new set of challengers.
In the training stage, giant models ingest vast datasets and learn complex patterns. During inference, the trained models actually answer queries, recommend products, generate images, or power AI agents. Every user interaction requires inference, and as AI applications proliferate across industries, the volume of these tasks will grow exponentially.
Inference workloads also prioritise different characteristics than training. Instead of raw computational muscle, they emphasise latency, power efficiency, and cost per query. GPUs excel at highly parallel training tasks, but inference opens the door to specialised chips designed for narrower, more efficient workloads.
Suddenly, you don’t need Nvidia GPUs anymore.
Simultaneously, Nvidia’s own customers, from Alphabet Inc. and Microsoft Corp. to Meta Platforms Inc., are investing heavily in custom AI silicon. Alphabet has long relied on its proprietary Tensor Processing Units (TPUs), Microsoft is building its Maia accelerator, and Meta is designing in-house inference hardware for its giga-scale data centres.
Startups such as Cerebras Systems, too, are betting on specialised architectures to outperform general-purpose GPUs in inference tasks.
“Cerebras chose to use SRAM (GPUs use DRAM or HBM) so that we could move data from memory to compute faster (also known as inference). Not a little bit faster but more than 2,600 times faster than NVIDIA Blackwell GPUs. As a result, we can generate tokens faster 15 times faster.” — Andrew Feldman, CEO and founder of Cerebras, on X.
While its market share remains small, Cerebras’ technology highlights how the AI hardware landscape could fragment as workloads diversify, even though Nvidia’s software ecosystem, particularly CUDA, remains deeply embedded across AI development.
Besides, Nvidia has begun its own inference pivot, with Huang unveiling the Nvidia Groq 3 Language Processing Unit, or LPU. Likely to ship in the third quarter, the LPU is integrated into the Vera Rubin platform, where it works alongside GPUs to speed up inference. This is a space we’ll be watching keenly.
The AI tool we unlocked today is: Google Pomelli
What problem does it solve? You run a growing D2C brand: skincare, food, fashion. Product launch in two weeks. You need Instagram posts, Facebook ads, product photos, a Reel. A professional shoot costs ₹50,000 and 10 days you don’t have. So you delay. Or post something off-brand. Or spend three evenings in Canva, and it still doesn’t look right.
To be sure, inconsistent, low-quality marketing content costs you sales, trust, and visibility. For small businesses, the gap between what you want to say and what you can afford to produce is one of the most persistent, expensive problems in the building.
Google Pomelli closes that gap. Enter your website URL: it reads your brand, extracts your colours, fonts, and tone, and builds your “Business DNA”. Everything it creates from that point is anchored to who you actually are.
How to access: https://labs.google.com/pomelli
Google Pomelli can help you:
A food brand is launching a new variant ahead of the festive season. Here’s what they do with Pomelli:
Note: The tools and analysis featured in this section demonstrated clear value based on our internal testing. Our recommendations are entirely independent and not influenced by the tool creators.
Jensen Huang wants us to believe that tokens will become the currency of the AI economy. He posits that in the near future, engineers won’t just earn salaries but also receive annual “token budgets”—a pool of AI compute they can spend to amplify their productivity. The basic assumption is that the need for tokens will keep growing.
But what is a token? Simply put, a token is a small chunk of text, a word or part of a word, that an AI model reads or generates. Every prompt and response is broken into these units, and companies are billed based on how many tokens are processed.
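The idea of billing by the token can be made concrete with a small sketch. Everything in it is an assumption made for illustration: `toy_tokenize` is a crude stand-in that chops words into fixed-length pieces (real tokenizers learn their subword vocabulary from data), and the price is a hypothetical number, not any vendor's actual rate.

```python
import re

# Hypothetical price, for illustration only. Real vendors publish their own
# rates, often with different prices for input and output tokens.
PRICE_PER_MILLION_TOKENS_USD = 2.00


def toy_tokenize(text, max_piece_len=4):
    """Split text into words and punctuation, then chop long words into pieces.

    A toy stand-in for a real subword tokenizer, which learns its vocabulary
    from data instead of cutting at fixed lengths.
    """
    tokens = []
    for piece in re.findall(r"\w+|[^\w\s]", text):
        for i in range(0, len(piece), max_piece_len):
            tokens.append(piece[i:i + max_piece_len])
    return tokens


def bill_usd(prompt, response):
    """Charge for every token read (the prompt) and generated (the response)."""
    total_tokens = len(toy_tokenize(prompt)) + len(toy_tokenize(response))
    return total_tokens * PRICE_PER_MILLION_TOKENS_USD / 1_000_000


prompt = "Summarise the Nvidia announcements."
response = "Nvidia unveiled new inference hardware."
print(toy_tokenize(prompt))
print(f"Cost of this exchange: ${bill_usd(prompt, response):.6f}")
```

A single exchange costs a tiny fraction of a cent. The bill becomes meaningful only when multiplied across millions of prompts and responses.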
But as systems get more efficient, models will require fewer tokens per task, thanks to better architectures, caching, and specialised systems. If that trend continues, tokens become cheaper, more abundant, and ultimately less meaningful as a unit of value.
Besides, as pointed out above, more money is now flowing into AI inference than into training new LLMs from scratch. That complicates the token narrative. Inference is highly competitive, margins are thinner, and efficiency gains are relentless. If companies optimise for fewer tokens per task, the very metric Huang wants to monetise starts shrinking.
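A quick sketch shows how that shrinkage could play out. It treats token spend as the number of tasks multiplied by tokens per task and by price per token. All the figures are hypothetical, chosen only to illustrate the mechanism and not drawn from any company's data.

```python
def annual_token_spend(tasks, tokens_per_task, price_per_million_usd):
    """Spend = number of tasks x tokens per task x price per token."""
    return tasks * tokens_per_task * price_per_million_usd / 1_000_000


# Hypothetical baseline: one million tasks a year.
before = annual_token_spend(
    tasks=1_000_000, tokens_per_task=4_000, price_per_million_usd=2.00
)

# Hypothetical follow-up year: usage triples, but each task needs half the
# tokens and the price per token halves.
after = annual_token_spend(
    tasks=3_000_000, tokens_per_task=2_000, price_per_million_usd=1.00
)

print(f"Spend before: ${before:,.0f}")
print(f"Spend after:  ${after:,.0f}")
```

In this made-up scenario, total spend falls by a quarter even though the number of tasks triples. For spend to keep rising, usage has to grow faster than efficiency gains and price cuts combined.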
Further, token budgets are not like salaries; they are tied to specific models, vendors, and pricing structures. If the underlying AI stack shifts—say from proprietary systems to open-source alternatives, or from large general models to smaller task-specific ones—the “value” of those tokens can erode quickly.
Huang’s idea also assumes that tokens will remain the primary interface to AI. That may already be changing. The industry is moving from “generate text” to “complete tasks”. Agentic AI systems do not just consume tokens; they call APIs, execute workflows, and deliver outcomes. In such a world, the number of tokens used may not matter. What matters is whether the job gets done.
AI BITS & BYTES
A techie made headlines when he said that he used ChatGPT and Google DeepMind’s AlphaFold to make a vaccine for his dog, which had just months to live. Elon Musk later reposted that “Paul Conyngham, a Sydney-based technology entrepreneur and AI consultant, used Grok to finalise an mRNA vaccine construct for his dog Rosie’s mast cell cancer”.
That said, Emilia Javorsky, MD, MPH and Director of the Futures Program at Future of Life Institute, cautions that “...current AI capabilities—thoughtfully applied by people who understand biomedical problems deeply—can help...but realising this potential at scale requires abandoning the illusion of intelligence and embracing the complex, unglamorous work of fixing systems that actually stand between discovery and cure”.
She emphasises that today’s “...AI capabilities already deliver real medical value, not through Big Tech’s pursuit of god-like machines, but through targeted solutions for specific problems. Even DeepMind’s AlphaFold succeeded by focusing on one well-defined challenge: protein folding...”
MiroFish, touted as a “next-generation AI prediction engine” by its creators, extracts real-world information as ‘seed’ data (like breaking news, policy drafts and financial signals) and converts it into a knowledge graph.
The tool could be used to predict how traders and retail investors might react to specific news, forecast public reactions and opinions prior to policy announcements, test the effectiveness of marketing strategies by gauging user feedback in a virtual social environment, or build character worlds in order to deduce plot developments or write lost story endings.
Tech Talk is a weekly newsletter by Leslie D'Monte on everything happening in the world of technology and AI. Want this delivered straight into your inbox? Subscribe here.
Leslie D'Monte, author of "AI Rising", is a tech and science writer with stints at top media houses. An MIT-Knight Fellow and TEDx speaker, he covers AI, deeptech, and digital policy, curates tech events, and hosts podcasts and Mint's Tech Talk newsletter.